Hypothetical vs. Live: The Performance Gap

Backtests are easy to make beautiful and easy to make meaningless. Here is why most published results overstate reality — and what an honest track record requires.

Niro Research | May 27, 2026 | 10 min read

Almost every trading product advertises a backtest. Almost none survive contact with live markets unchanged. The gap is not bad luck — it is a predictable consequence of how backtests are built and selected.

Overfitting is the default, not the exception

When you try many strategy variants and keep the best, you are selecting on noise. Formal work on the probability of backtest overfitting^[1] shows that with enough trials, an outstanding in-sample result is expected even when the true edge is zero — and that such strategies tend to underperform out of sample.
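The selection effect described above can be reproduced with a small simulation. This is a minimal sketch under stated assumptions, not the procedure from the cited paper: every "strategy" is pure Gaussian noise with zero mean, and the trial count, period count, volatility, and function names are hypothetical choices made for illustration.

```python
import random
import statistics


def sharpe(returns):
    """Per-period Sharpe ratio (mean / sample standard deviation), not annualized."""
    sd = statistics.stdev(returns)
    return statistics.mean(returns) / sd if sd > 0 else 0.0


def best_of_n_trials(n_trials=200, n_periods=250, seed=0):
    """Select the best in-sample variant among strategies with zero true edge.

    Each trial draws independent in-sample and out-of-sample returns from
    the same zero-mean distribution, so any in-sample Sharpe ratio is luck
    by construction.
    """
    rng = random.Random(seed)
    trials = []
    for _ in range(n_trials):
        in_sample = [rng.gauss(0.0, 0.01) for _ in range(n_periods)]
        out_sample = [rng.gauss(0.0, 0.01) for _ in range(n_periods)]
        trials.append((sharpe(in_sample), sharpe(out_sample)))
    # Select on in-sample performance only, as a naive backtest search would.
    best_is, best_oos = max(trials, key=lambda t: t[0])
    mean_is = statistics.mean(t[0] for t in trials)
    return best_is, best_oos, mean_is


best_is, best_oos, mean_is = best_of_n_trials()
print(f"best in-sample Sharpe:        {best_is:.3f}")
print(f"same strategy, out of sample: {best_oos:.3f}")
print(f"average in-sample Sharpe:     {mean_is:.3f}")
```

The winner's in-sample Sharpe ratio sits well above the average of all trials, even though no trial has any edge. Its out-of-sample figure is just another draw from the same zero-mean distribution. Raising `n_trials` pushes the best in-sample number higher without changing that.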

Figure 1. The in-sample / out-of-sample divergence (illustrative) — Typical shape of an overfit strategy^[1]; values illustrative.

Why regulators single this out

Regulators understand this gap well. CFTC Regulation 4.41^[2] requires that hypothetical performance carry a prominent disclaimer, precisely because such results are “generally prepared with the benefit of hindsight” and do not reflect real execution, liquidity, or the financial and emotional pressure of live trading.

A backtest tells you what would have worked. Only a live, cost-adjusted record tells you what does.

What an honest record requires

The academic standard for asset-return claims is strict about out-of-sample evidence and multiple-testing corrections^[3]. Translated into product terms, an honest track record has to do four things:

Test out of sample. Judge a strategy on data it never saw in design.
Cost-adjust. Report results net of modeled commissions and slippage.
Gate on significance. Make no claim until the sample is large enough to mean something.
Reconcile to live. Compare hypothetical results to realized fills.
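The cost-adjustment, significance-gate, and reconciliation steps above can be sketched in a few lines. This is an illustrative sketch, not Niro's implementation: the function names, the cost rate, the minimum observation count, and the t-statistic threshold of 3.0 are all assumptions chosen for the example.

```python
import math
import statistics


def net_returns(gross_returns, turnover, cost_per_unit_turnover=0.0005):
    """Subtract a modeled cost (commission plus slippage) proportional to turnover."""
    return [r - t * cost_per_unit_turnover for r, t in zip(gross_returns, turnover)]


def t_statistic(returns):
    """t-statistic of the mean return against a null hypothesis of zero edge."""
    n = len(returns)
    return statistics.mean(returns) / (statistics.stdev(returns) / math.sqrt(n))


def clears_gate(returns, min_obs=250, min_t=3.0):
    """Permit a claim only with enough observations and a high enough t-statistic.

    Both thresholds are hypothetical; pick them before looking at results.
    """
    return len(returns) >= min_obs and t_statistic(returns) >= min_t


def tracking_gap(hypothetical, realized):
    """Mean per-period shortfall of realized returns versus the hypothetical record."""
    return statistics.mean(h - r for h, r in zip(hypothetical, realized))


# Toy data: 300 periods alternating +0.2% and -0.1%, with full turnover each period.
gross = [0.002, -0.001] * 150
net = net_returns(gross, turnover=[1.0] * len(gross))

print("gross clears gate:", clears_gate(gross))
print("net clears gate:  ", clears_gate(net))
print("gap vs. net fills: ", tracking_gap(gross, net))
```

In this toy data the modeled cost equals the average gross return per period, so a record that looks significant before costs has no measurable edge after them. The gate is evaluated on the net series for that reason.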

Figure 2. How presentation inflates a number (illustrative) — Same strategy, three ways of reporting it; conceptual.

This is the methodology behind Niro’s Proof Engine: it tests strategies on real data, reports results net of costs, and withholds any marketing claim until a strategy clears a significance threshold — so what reaches you is evidence, not an overfit highlight reel.

References

[1] Bailey, D. H., Borwein, J. M., López de Prado, M., & Zhu, Q. J. (2017). The Probability of Backtest Overfitting. Journal of Computational Finance, 20(4).
[2] U.S. Commodity Futures Trading Commission. Regulation 4.41: Advertising by commodity pool operators, commodity trading advisors, and the principals thereof. (Hypothetical performance disclosure.)
[3] Campbell, J. Y., Lo, A. W., & MacKinlay, A. C. (1997). The Econometrics of Financial Markets. Princeton University Press.

Educational research, not investment advice or a recommendation to buy or sell any instrument. Figures labeled illustrative are conceptual and do not represent actual results. Verify all primary sources before relying on them.
