Calibration loop

How wrong is our backtest? We measure it.

Every night, the adaptive engine compares its backtest predictions against actual forward-test results for every strategy with 8 or more live trades. When a gap shows up, we diagnose it (slippage, entry timing, exit execution, regime mismatch, stale data) and update the model. Every adjustment is logged here.

Most backtesters do not publish this. The published numbers tend to come from the backtest, not from how the backtest compares to live execution. We do both and publish the gap.

Live calibration state

4 gap measurements logged · tracking since 4/23/2026

Entry delay (bars)

0.0 bars stable

Number of bars added to entry timing in backtests to simulate the cron-cycle delay between signal and real-money execution. 0 = no delay needed.

Sample size: 0 · Confidence: 0% · Updated 4/19/2026, 8:12:07 AM

Slippage multiplier

0.80x (was 0.81x) drifting down

How much real fill prices differ from the prices the backtest assumed. 1.0 = backtest matches reality. >1.0 = real slippage is worse than the backtest assumed (the backtest was too optimistic).

Sample size: 2 · Confidence: 10% · Updated 4/25/2026, 12:00:02 AM

Stop overshoot %

15.00% stable

Average distance by which real stop fills miss the stop level, as a % of price. Used to calibrate stop-loss assumptions in backtests so reported P&L matches reality.

Sample size: 200 · Confidence: 80% · Updated 4/19/2026, 1:30:22 AM

TP tightening factor

1.00x stable

How much we tighten the take-profit threshold in backtests to match real-world execution. 1.0 = no adjustment needed. >1.0 = real fills happen later than backtest assumed.

Sample size: 0 · Confidence: 0% · Updated 4/19/2026, 8:12:07 AM

Win-rate offset

2.00% (was 4.00%) drifting down

Forward-test win rate minus backtest predicted win rate, averaged across calibrated genes. Positive = forward beats backtest. Negative = forward underperforms backtest predictions.

Sample size: 2 · Confidence: 10% · Updated 4/25/2026, 12:00:02 AM
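
The win-rate offset definition above can be sketched as follows. This is a minimal illustration, assuming an unweighted mean across genes; the `GeneResult` shape, the `winRateOffset` name, and the sample numbers are hypothetical and not taken from the engine.

```typescript
// Hypothetical shape: one record per calibrated gene.
interface GeneResult {
  gene: string;
  forwardWinRatePct: number;  // observed in the forward test
  backtestWinRatePct: number; // predicted by the backtest
}

// Offset = mean over genes of (forward - backtest), in percentage points.
// Positive means forward beats backtest. Returns null when nothing is calibrated.
function winRateOffset(genes: GeneResult[]): number | null {
  if (genes.length === 0) return null;
  const total = genes.reduce(
    (sum, g) => sum + (g.forwardWinRatePct - g.backtestWinRatePct),
    0,
  );
  return total / genes.length;
}

// Illustrative data only: gaps of +10pp and -6pp average to +2pp.
const exampleOffset = winRateOffset([
  { gene: "gene_a", forwardWinRatePct: 60, backtestWinRatePct: 50 },
  { gene: "gene_b", forwardWinRatePct: 44, backtestWinRatePct: 50 },
]);
console.log(exampleOffset); // 2
```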

Performance by context

Where the engine actually performs vs where it does not. Sliced from 35 live v2 trades (overall avg +0.65% per trade). The CI-low column is the pessimistic edge of a 90% confidence interval. Small slices with high variance show a negative CI low even when the average is positive, which is the correct read: the sample is too small to rule out a loss. As the dataset grows, per-slice gap-vs-backtest calibration becomes available.
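
To make the CI-low column concrete, here is a minimal sketch of one common way to compute such a bound. It assumes a normal approximation (z = 1.645 for a two-sided 90% interval) and the sample standard deviation; the page does not state which interval method the engine uses, and the function name `ciLower90` is hypothetical.

```typescript
// Lower edge of a two-sided 90% confidence interval for the mean return.
// Assumption: normal approximation with z = 1.645. For very small slices a
// t-quantile would give a wider, more conservative interval.
function ciLower90(returnsPct: number[]): number | null {
  const n = returnsPct.length;
  if (n < 2) return null; // spread cannot be estimated from a single trade
  const mean = returnsPct.reduce((a, b) => a + b, 0) / n;
  const sumSq = returnsPct.reduce((a, r) => a + (r - mean) * (r - mean), 0);
  const variance = sumSq / (n - 1); // sample variance
  const stdErr = Math.sqrt(variance / n);
  return mean - 1.645 * stdErr;
}

// Illustrative data only: the average is +0.5%, but the bound is negative
// because four noisy trades cannot rule out a loss.
const noisySlice = [3, -1, 2, -2];
console.log(ciLower90(noisySlice)); // about -1.46
```

In this sketch a slice needs at least two trades; how the engine treats single-trade slices is not described here.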

By coin

Slice  N   Avg     CI low
ETH    16  +0.19%  -0.33%
BTC    7   +1.23%  +0.35%
XRP    7   +0.32%  -0.11%
BNB    3   +1.70%  +0.45%
SOL    2   +1.85%  -0.79%

By 1h regime

Slice                 N   Avg     CI low
ranging high vol      12  +1.25%  +0.63%
strong trend med vol  6   -0.12%  -0.28%
ranging low vol       5   +0.70%  +0.20%
unlabeled             4   +2.02%  +0.77%
weak trend med vol    3   +0.70%  +0.70%
trending high vol     2   -1.35%  -1.54%
weak trend low vol    2   -0.37%  -0.54%
trending low vol      1   -1.83%  -1.83%

By timeframe

Slice  N   Avg     CI low
5m     19  +1.23%  +0.89%
1H     9   +0.03%  -1.04%
15m    7   -0.14%  -0.28%

Predictions tracked across the product

The calibration loop above is one self-correction system. We are extending the same pattern (claim, observe, gap, diagnose, act) to every system that makes a prediction or claim: the /prove verdict, the AI verdict text, the reasoning narratives, the reliability score, and so on. Each row is a domain we are now instrumenting publicly. Resolution takes time: most loops have a 30+ day window between claim and outcome, so most rows below will show 0 resolved for the first month.

Domain           Total  Resolved  Pending  Earliest pending  Next resolves
prove_verdict    2      0         2        4/25/2026         5/25/2026
ai_verdict_text  1      0         1        4/25/2026

The schema, helper library, and full domain list live at src/lib/predictions.ts in the codebase. Adding a new self-correction loop is a single recordPrediction() call.
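
As an illustration of the claim-then-resolve pattern, here is a minimal in-memory sketch. The real signature of recordPrediction() is not shown on this page, so the function below is a stand-in with the same name; the `Prediction` fields, the `summarize` helper, and the fixed 30-day default window are all assumptions.

```typescript
// Hypothetical record shape; the real schema lives in src/lib/predictions.ts.
interface Prediction {
  domain: string; // e.g. "prove_verdict"
  claim: string;
  createdAt: Date;
  resolvesAt: Date;
  resolved: boolean;
}

const DAY_MS = 24 * 60 * 60 * 1000;
const ledger: Prediction[] = [];

// Stand-in for the real helper: stores a claim and when it becomes checkable.
function recordPrediction(
  domain: string,
  claim: string,
  createdAt: Date,
  windowDays: number = 30, // assumed default resolution window
): Prediction {
  const prediction: Prediction = {
    domain,
    claim,
    createdAt,
    resolvesAt: new Date(createdAt.getTime() + windowDays * DAY_MS),
    resolved: false,
  };
  ledger.push(prediction);
  return prediction;
}

// Counts per domain, mirroring the Total / Resolved / Pending columns.
function summarize(domain: string): { total: number; resolved: number; pending: number } {
  const rows = ledger.filter((p) => p.domain === domain);
  const resolved = rows.filter((p) => p.resolved).length;
  return { total: rows.length, resolved, pending: rows.length - resolved };
}

const firstClaim = recordPrediction(
  "prove_verdict",
  "example claim",
  new Date("2026-04-25T00:00:00Z"),
);
console.log(firstClaim.resolvesAt.toISOString()); // 2026-05-25T00:00:00.000Z
```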

Recent gap measurements

slippage drift · F_price_cross_sma_200_bull_tf60 · 4/25/2026

Across 9 forward trades the live P&L came in -0.704% vs backtest, with win rate 33% (vs 89% predicted). Diagnosed as slippage drift: backtest was too optimistic on fill prices. Action: adjust_backtest.

minor gap · E_A_rsi_7_below_25_b150_tp0.5_b800_adx30 · 4/25/2026

10 forward trades show a small gap (+0.964% P&L, +13.4pp win rate). Within noise; no calibration adjustment needed.

minor gap · E_A_rsi_7_below_25_b150_tp0.5_b800_adx30 · 4/24/2026

10 forward trades show a small gap (+0.964% P&L, +13.4pp win rate). Within noise; no calibration adjustment needed.

minor gap · E_A_rsi_7_below_25_b150_tp0.5_b800_adx30 · 4/23/2026

8 forward trades show a small gap (+0.966% P&L, +13.4pp win rate). Within noise; no calibration adjustment needed.

Methodology

Trigger. The calibrator runs at 02:00 UTC daily. For every active gene with 8 or more forward-test trades, it pulls the original backtest result and the live trade history and computes the gap on five dimensions: average P&L per trade, win rate, take-profit hit rate, time-exit rate, and average bars held.
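
The trigger step can be sketched as a pure function over two stat summaries. This is a minimal illustration, assuming each gap is simply forward minus backtest; the `StrategyStats` and `Gap` shapes and the `computeGap` name are hypothetical, and the sample numbers are illustrative.

```typescript
// Hypothetical summary of one gene's results, from either backtest or forward test.
interface StrategyStats {
  trades: number;
  avgPnlPct: number;
  winRatePct: number;
  tpHitRatePct: number;
  timeExitRatePct: number;
  avgBarsHeld: number;
}

// The five gap dimensions, each computed as forward minus backtest.
interface Gap {
  avgPnlPct: number;
  winRatePct: number;
  tpHitRatePct: number;
  timeExitRatePct: number;
  avgBarsHeld: number;
}

const MIN_FORWARD_TRADES = 8; // eligibility threshold stated in the text

function computeGap(backtest: StrategyStats, forward: StrategyStats): Gap | null {
  if (forward.trades < MIN_FORWARD_TRADES) return null; // not enough live data yet
  return {
    avgPnlPct: forward.avgPnlPct - backtest.avgPnlPct,
    winRatePct: forward.winRatePct - backtest.winRatePct,
    tpHitRatePct: forward.tpHitRatePct - backtest.tpHitRatePct,
    timeExitRatePct: forward.timeExitRatePct - backtest.timeExitRatePct,
    avgBarsHeld: forward.avgBarsHeld - backtest.avgBarsHeld,
  };
}

// Illustrative numbers only.
const backtestStats: StrategyStats = {
  trades: 120, avgPnlPct: 1, winRatePct: 89,
  tpHitRatePct: 70, timeExitRatePct: 20, avgBarsHeld: 12,
};
const forwardStats: StrategyStats = {
  trades: 9, avgPnlPct: 0.5, winRatePct: 33,
  tpHitRatePct: 40, timeExitRatePct: 50, avgBarsHeld: 15,
};
console.log(computeGap(backtestStats, forwardStats));
```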

Diagnosis. Gaps are categorized by signature. A P&L drop with a similar win rate suggests slippage drift. A win-rate drop suggests entry timing problems (cron delay, gate misfire). A lower TP hit rate suggests exit execution lag. A pattern mismatch with the regime suggests the backtest tested a different market structure than the current one.
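
The signature rules above can be sketched as a small classifier. The tolerances and the order in which rules are checked are assumptions made for illustration; the page does not give the engine's actual thresholds or precedence, and all names here are hypothetical.

```typescript
type Diagnosis =
  | "slippage_drift"
  | "entry_timing"
  | "exit_execution"
  | "regime_mismatch"
  | "minor_gap";

// Hypothetical gap signature: forward minus backtest on each dimension.
interface GapSignature {
  pnlGapPct: number;      // average P&L per trade
  winRateGapPp: number;   // win rate, percentage points
  tpHitRateGapPp: number; // take-profit hit rate, percentage points
  regimeMatches: boolean; // does the live regime match the backtested one?
}

// Assumed tolerances; gaps smaller than these are treated as noise.
const PNL_TOLERANCE_PCT = 0.25;
const RATE_TOLERANCE_PP = 5;

function diagnose(g: GapSignature): Diagnosis {
  if (!g.regimeMatches) return "regime_mismatch";
  if (g.winRateGapPp < -RATE_TOLERANCE_PP) return "entry_timing";
  if (g.tpHitRateGapPp < -RATE_TOLERANCE_PP) return "exit_execution";
  // P&L dropped while the win rate stayed similar.
  if (g.pnlGapPct < -PNL_TOLERANCE_PCT) return "slippage_drift";
  // Positive or negligible gaps are logged without adjustment.
  return "minor_gap";
}

console.log(
  diagnose({ pnlGapPct: -0.7, winRateGapPp: -1, tpHitRateGapPp: 0, regimeMatches: true }),
); // slippage_drift
```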

Action. Slippage drift updates the global slippage multiplier, which feeds the next round of backtests. Entry timing problems add entry-delay bars to backtest assumptions. Regime mismatches narrow the gene's activation window rather than adjusting the global model.
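
A minimal sketch of how these actions might be applied, to show the split between global and per-gene changes. The update rule for the multiplier (a simple blend toward the observed value), the `BLEND` constant, and every type and name below are assumptions, not the engine's actual logic.

```typescript
// Hypothetical global calibration state shared by all backtests.
interface CalibrationState {
  slippageMultiplier: number;
  entryDelayBars: number;
}

// Hypothetical per-gene settings.
interface GeneConfig {
  name: string;
  activeRegimes: string[];
}

type Action =
  | { kind: "slippage_drift"; observedMultiplier: number }
  | { kind: "entry_timing"; extraBars: number }
  | { kind: "regime_mismatch"; failingRegime: string };

const BLEND = 0.5; // assumed step size toward the observed multiplier

function applyAction(
  state: CalibrationState,
  gene: GeneConfig,
  action: Action,
): { state: CalibrationState; gene: GeneConfig } {
  switch (action.kind) {
    case "slippage_drift": {
      // Global change: move the multiplier part of the way to the observation.
      const next =
        state.slippageMultiplier +
        BLEND * (action.observedMultiplier - state.slippageMultiplier);
      return {
        state: { slippageMultiplier: next, entryDelayBars: state.entryDelayBars },
        gene,
      };
    }
    case "entry_timing":
      // Global change: backtests assume a later entry.
      return {
        state: {
          slippageMultiplier: state.slippageMultiplier,
          entryDelayBars: state.entryDelayBars + action.extraBars,
        },
        gene,
      };
    case "regime_mismatch":
      // Per-gene change only: the global model is left untouched.
      return {
        state,
        gene: {
          name: gene.name,
          activeRegimes: gene.activeRegimes.filter((r) => r !== action.failingRegime),
        },
      };
  }
}
```

Returning new objects instead of mutating keeps each nightly adjustment easy to log and compare against the previous state.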

Honesty principle. When the gap is positive (forward beats backtest, as in the recent RSI(7) entries), we log it as a minor gap and take no action. Being right by accident is still being miscalibrated. Same scrutiny in both directions.
