
Why your crypto backtest showed +45% and the bot lost money


A typical retail backtest shows a strategy returning +45% a year, and that same strategy loses money in live trading. The backtest isn't lying in one obvious way. It's lying in six subtle ones, each of which has a specific fix. Here they are.

1. Zero fees or ludicrously low fees

TradingView's strategy tester defaults to zero commission, and most Pine scripts people copy online don't set one either. A strategy that turns over 50 trades a year at Binance taker fees (0.1% per side, 0.2% round-trip) pays 10% of its notional just to touch the book. If your backtest shows +45%, your real strategy returns +35% after fees, and that is before spread and slippage.

The fix is obvious: set the fee to match the exchange you'll actually trade on. Binance spot is around 0.1% per side. Binance perp taker is 0.04%. Kraken maker is 0.16%. Coinbase Advanced is 0.6%. Your backtest needs to assume the worst case you'll actually experience, not the best case.
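The fee arithmetic above is simple enough to sanity-check in a few lines. This is a minimal sketch, not any exchange's API; the function names are made up, and the simple subtraction ignores compounding, spread, and slippage:

```python
# Sketch: estimate annual fee drag from trade count and per-side fee.
# Fee rates are inputs you look up on your exchange's current schedule.

def fee_drag(trades_per_year: int, fee_per_side: float) -> float:
    """Fraction of notional paid in fees per year (round-trip = 2 sides)."""
    return trades_per_year * fee_per_side * 2

def net_annual_return(gross_return: float, trades_per_year: int,
                      fee_per_side: float) -> float:
    """Gross backtest return minus fee drag (simple, non-compounding)."""
    return gross_return - fee_drag(trades_per_year, fee_per_side)

# The article's example: 50 trades/year at 0.1% per side
drag = fee_drag(50, 0.001)                 # 10% of notional per year
net = net_annual_return(0.45, 50, 0.001)   # +45% gross becomes +35% net
```

Plug in the taker rate for the exchange you'll actually route orders to, not the maker rate you hope to get.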

2. Same-candle execution

A lot of backtest engines enter and exit within the same candle. If the 4h candle opens at 100 and closes at 104 and your strategy's entry signal fires at the open, the backtest records a fill at 100 and the exit checks if the target was hit before the close. If the high of that candle was 106, the backtest counts a win.

In live markets you can't enter at the open of a candle that hasn't closed yet. Your signal fires when the candle closes. By the time your order arrives, the price has moved. Most backtest-to-live divergence I see is this single bug: the backtest is filling at prices that weren't available when the signal was actually generated.

Fix: your signal engine should only evaluate on confirmed, closed candles. Entry price should be the next candle's open, not the current candle's close. Stops and targets should check high and low of subsequent candles in chronological order, not assume best-case fill.
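The fill logic described above can be sketched in a few lines. This is a toy simulator, assuming OHLC bars as plain dicts; `simulate_long` is a hypothetical helper, not any engine's API, and when a stop and a target both fall inside one bar it assumes the worst case (stop first):

```python
# Sketch: signal confirmed at close of bars[signal_idx], fill at the
# NEXT bar's open, then walk forward bar by bar in chronological order.

def simulate_long(bars, signal_idx, stop, target):
    """Return (entry_price, exit_price) for a long trade, or None if
    there is no bar left to fill on after the signal bar."""
    if signal_idx + 1 >= len(bars):
        return None
    entry = bars[signal_idx + 1]["open"]   # next bar's open, not this close
    for bar in bars[signal_idx + 1:]:
        if bar["low"] <= stop:             # worst case checked first
            return entry, stop
        if bar["high"] >= target:
            return entry, target
    return entry, bars[-1]["close"]        # still open at end of data

bars = [
    {"open": 100, "high": 101, "low": 99,  "close": 100},  # signal bar
    {"open": 101, "high": 103, "low": 100, "close": 102},  # entry at 101
    {"open": 102, "high": 106, "low": 101, "close": 104},  # target hit
]
print(simulate_long(bars, 0, stop=98, target=105))  # (101, 105)
```

Note the entry fills at 101, the open of the bar after the signal, not at the 100 that a same-candle backtest would have recorded.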

3. Lookahead bias

This is the subtle one. Your indicator uses future data to compute its value at a historical bar. The classic example: you backtest an indicator that uses a smoothing function like a centered moving average. At bar 100 the centered MA depends on bars 95 to 105. Your backtest sees the value before bar 105 has happened. In live trading, you can't compute it until bar 105 closes, by which point the trade is already over.

TradingView's own support docs have an entire article titled “Strategy produces unrealistically good results by peeking into the future.” If your Pine script uses security() with a non-trivial lookback, or a function that references values with negative offsets, this can happen silently.

Fix: walk every indicator and confirm it only reads from past bars. If a value at bar i depends on anything after bar i, it's broken.
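The centered-MA example is easy to see in code. A minimal illustration in plain Python (no libraries, function names are mine): the centered version slices past bar i, the trailing version cannot.

```python
# Sketch: a centered moving average leaks future data; a trailing one doesn't.

def centered_ma(prices, i, half_window):
    """Value at bar i uses bars i-half_window .. i+half_window — LOOKAHEAD."""
    window = prices[i - half_window : i + half_window + 1]
    return sum(window) / len(window)

def trailing_ma(prices, i, window):
    """Value at bar i uses only bars i-window+1 .. i — safe."""
    w = prices[i - window + 1 : i + 1]
    return sum(w) / len(w)

prices = [100, 102, 101, 105, 110, 108, 112]
# At bar 3, the centered MA (half_window=2) reads bars 1..5,
# two of which haven't happened yet in live trading:
print(centered_ma(prices, 3, 2))  # averages 102, 101, 105, 110, 108
print(trailing_ma(prices, 3, 3))  # averages 102, 101, 105 only
```

The mechanical check is the one in the fix above: for every indicator, confirm its slice at bar i ends at bar i.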

4. Selection bias on the window

Three years of BTC data contains a massive bull run and one or two structural corrections. If your strategy was tested only on 2023-2024, it will look great because the asset went up and almost every long strategy worked. If it was tested only on 2022, it will look terrible because BTC dropped 65%.

The tempting thing to do is pick the window that makes the strategy look best. The honest thing is to test across multiple full cycles and report the worst-case performance, not the best. If your strategy requires a specific market regime to work, it isn't a strategy, it's a bet on that regime continuing.
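"Report the worst case" is a one-liner once you have per-window results. A trivial sketch, with illustrative numbers only (not real backtest results):

```python
# Sketch: given per-window returns from the same frozen strategy,
# report the worst window, not the best.

def worst_window(returns_by_window: dict) -> tuple:
    """Return (window_label, return) for the worst-performing window."""
    return min(returns_by_window.items(), key=lambda kv: kv[1])

results = {"2021": 0.30, "2022": -0.25, "2023": 0.40, "2024": 0.20}
print(worst_window(results))  # ('2022', -0.25)
```

If the worst window is unacceptable, the average across windows doesn't matter; you would have had to live through that window.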

5. Curve-fitting

You adjust your RSI threshold from 30 to 28, the backtest goes from +40% to +52%. You adjust again to 27, it becomes +58%. Eventually you land on 26.3 and the strategy shows +87%. You have not improved the strategy. You have fit it to the noise in your specific training data.

Any strategy that has more than two or three tunable parameters, or that was “optimized” by trying many parameter combinations, is at serious risk of curve-fitting. The more parameters, the more shape the strategy has to mold itself to historical noise that won't repeat.

The only real defense is holding out data the strategy has never seen during tuning. Which brings us to the next one.

6. No out-of-sample test

If you tune your strategy on the same data you test it on, you learn nothing about whether it generalizes. You learn only that you can fit the past.

Walk-forward validation is the fix. In its simplest form: split your data in half. Tune everything (entry thresholds, exit rules, filter conditions) on the first half. Then run the fully frozen strategy on the second half. If it still works, you have evidence of an actual edge. If it doesn't, you were curve-fitting.
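The split-tune-freeze loop can be sketched end to end. This is a deliberately toy setup: `backtest` is a stand-in for your real engine (it just sums the return of the bar after any bar whose return exceeds a threshold), and the return series is invented for illustration.

```python
# Sketch: tune one parameter on the first half, freeze it,
# evaluate on the unseen second half.

def backtest(returns, threshold):
    """Toy rule: capture the NEXT bar's return whenever this bar's
    return exceeds `threshold`. Returns total captured return."""
    total = 0.0
    for i in range(len(returns) - 1):
        if returns[i] > threshold:
            total += returns[i + 1]
    return total

def walk_forward(returns, candidates):
    half = len(returns) // 2
    train, test = returns[:half], returns[half:]
    # Tune ONLY on the first half...
    best = max(candidates, key=lambda t: backtest(train, t))
    # ...then run the frozen parameter on the second half.
    return best, backtest(test, best)

rets = [0.01, -0.02, 0.03, 0.01, -0.01, 0.02, -0.03, 0.01, 0.02, -0.01]
best_t, oos = walk_forward(rets, [0.0, 0.01, 0.02])
print(best_t, oos)  # the in-sample winner goes negative out of sample
```

On this made-up series the tuned threshold earns a positive in-sample return and a negative out-of-sample one, which is exactly the signature of fitting noise. A fuller walk-forward rolls the split window through time rather than cutting once, but the half-split version already catches most curve-fit strategies.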

I wrote a separate piece on walk-forward validation explained for non-coders if you've never done one.

What “honest” looks like in practice

A backtest that survives the above checklist will almost always show a smaller number than one that skips it. If your clean backtest shows +0.3% per trade with walk-forward validation and real fees, that is often a real edge. If your pretty backtest showed +45% per year and your clean one shows -0.2% per trade, the 45% was noise.

The uncomfortable truth is that most retail strategies don't survive. In our own research lab we've tested over a thousand candidates and fewer than 5% pass basic honesty gates. That isn't because our gates are too strict. It's because simple crypto strategies, by and large, don't have a real edge after costs. The pretty backtests you see on YouTube are almost all somewhere in the list of six above.

What to do about it

Before you commit capital to any strategy, make sure it has been tested with:

  • Actual fees matching your exchange
  • Realistic slippage (for crypto, 0.05-0.3% per side depending on coin and size)
  • Signal evaluation on confirmed candles only
  • Entry at next bar's open, not current bar's close
  • Multiple market regimes in the test window
  • No more tunable parameters than you can justify on economic grounds, rather than by fit
  • A held-out validation period the strategy hasn't seen during tuning

If your TradingView strategy tester doesn't give you these guarantees, and it mostly doesn't, you either need to code them yourself or use a tool that does them by default.

Want to run your strategy through all six checks at once?

StratProof does every one of the above by default. Walk-forward, real fees, confirmed candles, the works.

Test my strategy →

Free. No signup. Verdict in under 2 minutes.