
When Is a Trading Edge Real? The Statistics of Proof

Test enough strategies and one will look brilliant by pure chance. Separating skill from luck is a statistics problem with known — and unforgiving — answers.

Niro Research · May 30, 2026 · 9 min read

Run a thousand random strategies and the best one will have a spectacular track record by luck alone. This is the central trap of quantitative trading, and it has a precise statistical shape: when you test many hypotheses, the chance that at least one false “winner” clears a fixed significance threshold climbs toward certainty^[1].
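The trap is easy to reproduce. The sketch below is illustrative only: it assumes daily returns drawn from a normal distribution with zero mean (no edge at all), 252 trading days per year, and independent tests. The function names are hypothetical.

```python
import random
import statistics

def sharpe(returns):
    """Per-period Sharpe ratio: mean divided by sample standard deviation."""
    return statistics.mean(returns) / statistics.stdev(returns)

def best_of_random(n_strategies, n_days, seed=0):
    """Best annualized Sharpe among strategies that have zero true edge."""
    rng = random.Random(seed)
    best = float("-inf")
    for _ in range(n_strategies):
        # Assumption: i.i.d. normal daily returns, mean 0, volatility 1%.
        daily = [rng.gauss(0.0, 0.01) for _ in range(n_days)]
        best = max(best, sharpe(daily) * 252 ** 0.5)
    return best

def prob_any_false_positive(n_tests, alpha=0.05):
    """Chance that at least one of n independent null tests passes at level alpha."""
    return 1.0 - (1.0 - alpha) ** n_tests

if __name__ == "__main__":
    print("best Sharpe of 1 random strategy:   ", round(best_of_random(1, 252), 2))
    print("best Sharpe of 1000 random strategies:", round(best_of_random(1000, 252), 2))
    print("P(at least one false winner in 100 tests):", round(prob_any_false_positive(100), 3))
```

Nothing in the simulated strategies is skilled. The gap between the single-strategy result and the best-of-a-thousand result is produced entirely by the search.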

Harvey, Liu, and Zhu showed that a large fraction of published return “factors” fail once you correct for how many were tried^[1]. The same logic indicts most strategy backtests: the more variants searched, the higher the bar a genuine edge must clear.

Figure 1. More trials raise the bar (illustrative). The significance threshold rises with the number of strategies tested^[1].
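One simple way to see the bar rise is a Bonferroni correction, which divides the acceptable error rate by the number of trials. This is a minimal sketch under that one assumption. Bonferroni is only one of several multiple-testing corrections, and the code is not meant to reproduce the thresholds reported in [1]. The function name is hypothetical.

```python
from statistics import NormalDist

def bonferroni_t_threshold(n_trials, alpha=0.05):
    """Two-sided threshold on a t-statistic (normal approximation) that keeps
    the family-wise error rate at alpha across n_trials tests."""
    per_test_alpha = alpha / n_trials
    return NormalDist().inv_cdf(1.0 - per_test_alpha / 2.0)

if __name__ == "__main__":
    for n in (1, 10, 100, 1000):
        print(f"{n:>5} trials -> |t| must exceed {bonferroni_t_threshold(n):.2f}")
```

With a single trial this reduces to the familiar threshold of about 1.96, and every additional batch of trials pushes the requirement higher.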

The deflated Sharpe ratio

The fix is to deflate a Sharpe ratio for the number of trials behind it and for the non-normality (skewness and fat tails) of returns, and to compute the minimum track-record length needed to trust it at a chosen confidence level^[2]. A high Sharpe from one of five hundred attempts is not the same as a high Sharpe from one honest test, and the math says so.
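A minimal sketch of both calculations follows, based on the formulas as commonly stated from [2]. The assumptions should be checked against the primary source: Sharpe ratios are per period (not annualized), `kurt` is raw kurtosis (3 for a normal distribution), `sharpe_variance` is the variance of Sharpe ratios across the trials, and the benchmark for a single trial is taken as zero. The function names and example inputs are hypothetical.

```python
from math import e, sqrt
from statistics import NormalDist

EULER_GAMMA = 0.5772156649015329  # Euler-Mascheroni constant
N01 = NormalDist()

def expected_max_sharpe(n_trials, sharpe_variance):
    """Approximate best Sharpe expected from n_trials strategies with no true edge."""
    if n_trials < 2:
        return 0.0  # one honest test: nothing to deflate for
    z1 = N01.inv_cdf(1.0 - 1.0 / n_trials)
    z2 = N01.inv_cdf(1.0 - 1.0 / (n_trials * e))
    return sqrt(sharpe_variance) * ((1.0 - EULER_GAMMA) * z1 + EULER_GAMMA * z2)

def non_normality_scale(sr, skew, kurt):
    """Squared scale factor that widens the Sharpe estimate's error for skew and fat tails."""
    return 1.0 - skew * sr + (kurt - 1.0) / 4.0 * sr ** 2

def deflated_sharpe(sr, n_obs, n_trials, sharpe_variance, skew=0.0, kurt=3.0):
    """Probability that the true Sharpe exceeds the best-of-n_trials luck benchmark."""
    sr0 = expected_max_sharpe(n_trials, sharpe_variance)
    z = (sr - sr0) * sqrt(n_obs - 1) / sqrt(non_normality_scale(sr, skew, kurt))
    return N01.cdf(z)

def min_track_record_length(sr, sr_benchmark=0.0, skew=0.0, kurt=3.0, confidence=0.95):
    """Observations needed before sr beats sr_benchmark at the given confidence."""
    if sr <= sr_benchmark:
        raise ValueError("observed Sharpe must exceed the benchmark")
    z = N01.inv_cdf(confidence)
    return 1.0 + non_normality_scale(sr, skew, kurt) * (z / (sr - sr_benchmark)) ** 2

if __name__ == "__main__":
    # Hypothetical inputs: daily Sharpe 0.10, 500 observations.
    print("one honest test:   ", round(deflated_sharpe(0.10, 500, 1, 0.0025), 3))
    print("best of 500 trials:", round(deflated_sharpe(0.10, 500, 500, 0.0025), 3))
    print("min track record (days):", round(min_track_record_length(0.10)))
```

The same observed Sharpe is fed in twice. Only the trial count changes, and with it the verdict.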

The more strategies you try, the higher the bar a winner must clear before it deserves to be believed.

Significance as a gate, not a footnote

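The probability of backtest overfitting^[3] estimates how likely it is that the strategy that looks best in-sample performs below the median of its peers out-of-sample, which is a direct measure of how likely an in-sample star is to disappoint live. The disciplined response is to treat significance as a gate: small samples are labeled as gathering data, never sold as proof.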
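The estimate can be sketched with a combinatorially symmetric cross-validation in the spirit of [3]. The history is cut into blocks, every half-and-half split of those blocks serves once as in-sample and out-of-sample, and the estimate is the fraction of splits in which the in-sample winner lands at or below the out-of-sample median. This is a simplified illustration, not the authors' reference implementation. It assumes the Sharpe ratio as the performance measure and equal-length return series, and the function names are hypothetical.

```python
import random
from itertools import combinations
from math import log
from statistics import mean, stdev

def sharpe(returns):
    s = stdev(returns)
    return mean(returns) / s if s > 0 else 0.0

def pbo_cscv(returns_by_strategy, n_blocks=8):
    """Estimate the probability of backtest overfitting.

    returns_by_strategy: list of equal-length return series, one per strategy.
    n_blocks: even number of time blocks; trailing observations that do not
    fill a whole block are dropped.
    """
    n_strats = len(returns_by_strategy)
    block = len(returns_by_strategy[0]) // n_blocks
    logits = []
    for in_blocks in combinations(range(n_blocks), n_blocks // 2):
        in_set = set(in_blocks)
        is_sr, oos_sr = [], []
        for series in returns_by_strategy:
            ins, oos = [], []
            for b in range(n_blocks):
                chunk = series[b * block:(b + 1) * block]
                (ins if b in in_set else oos).extend(chunk)
            is_sr.append(sharpe(ins))
            oos_sr.append(sharpe(oos))
        winner = max(range(n_strats), key=is_sr.__getitem__)
        # Relative out-of-sample rank of the in-sample winner, strictly inside (0, 1).
        rank = sum(1 for v in oos_sr if v <= oos_sr[winner])
        omega = rank / (n_strats + 1)
        logits.append(log(omega / (1.0 - omega)))
    # A logit at or below zero means the winner fell to the median or worse.
    return sum(1 for x in logits if x <= 0) / len(logits)

if __name__ == "__main__":
    rng = random.Random(0)
    # Ten strategies of pure noise: any in-sample "winner" is luck.
    noise = [[rng.gauss(0.0, 0.01) for _ in range(240)] for _ in range(10)]
    print("PBO for pure-noise strategies:", round(pbo_cscv(noise), 2))
```

A value near zero suggests the in-sample ranking carries over out-of-sample. A value near one half or above suggests the selection process is mostly picking noise.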

Figure 2. False-discovery risk grows with trials (illustrative). Multiple-testing inflation as the number of trials increases^[1].

Niro’s Proof Engine bakes this in: results are tested on real data, reported net of costs, and withheld from any claim until they clear a significance threshold. Statistics first, marketing never.

References

[1] Harvey, C. R., Liu, Y., & Zhu, H. (2016). … and the Cross-Section of Expected Returns. Review of Financial Studies, 29(1).
[2] Bailey, D. H., & López de Prado, M. (2014). The Deflated Sharpe Ratio. The Journal of Portfolio Management, 40(5).
[3] Bailey, D. H., Borwein, J. M., López de Prado, M., & Zhu, Q. J. (2017). The Probability of Backtest Overfitting. Journal of Computational Finance, 20(4).

Educational research, not investment advice or a recommendation to buy or sell any instrument. Figures labeled illustrative are conceptual and do not represent actual results. Verify all primary sources before relying on them.
