Backtesting strategies on historical tick data
1. Where to get the data
Three quality tiers for backtest data, in order from cheap-and-rough to gold-standard:
- Public minute candles (free). Available from CoinGecko, exchange APIs (rate-limited), or a CSV dump. Fine for sanity-checking a directional thesis. Useless for strategies that are sensitive to execution cost.
- Trade-tick data (cheap or free). Every executed trade with timestamp, price, size, side. Exchange-published in chunked files. Captures price formation but not liquidity (you don't know what didn't trade).
- Full L2 order-book reconstruction (expensive). Every order add, modify, cancel, fill. Lets you simulate placing your own orders against the real book. Required if your strategy is sensitive to slippage. bitexasia publishes free L2 archives down to a small monthly volume threshold; commercial vendors charge $50–500/month for this.
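To make the relationship between the first two tiers concrete, here is a minimal sketch of how minute candles can be derived from trade ticks. The tuple layout `(timestamp_seconds, price, size)` and the function name `ticks_to_candles` are assumptions for illustration, not the format of any particular exchange file.

```python
def ticks_to_candles(ticks, bucket_seconds=60):
    """Aggregate trades into OHLCV candles keyed by bucket start time.

    ticks: iterable of (timestamp_seconds, price, size), sorted by time.
    The tuple layout is an assumption made for this sketch.
    """
    candles = {}
    for ts, price, size in ticks:
        bucket = int(ts // bucket_seconds) * bucket_seconds
        candle = candles.get(bucket)
        if candle is None:
            # First trade in the bucket sets every price field.
            candles[bucket] = {"open": price, "high": price, "low": price,
                               "close": price, "volume": size}
        else:
            candle["high"] = max(candle["high"], price)
            candle["low"] = min(candle["low"], price)
            candle["close"] = price  # last trade seen so far
            candle["volume"] += size
    return candles


example_ticks = [(0, 100.0, 1.0), (30, 102.0, 0.5), (59, 101.0, 2.0), (60, 103.0, 1.0)]
minute_candles = ticks_to_candles(example_ticks)
```

The reverse is not possible: once ticks are collapsed into candles, the order and size of the individual trades are gone, which is why candles are the roughest tier.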
2. Look-ahead bias
The single most common mistake in backtests. Look-ahead happens when your strategy uses information that wouldn't have been available at the time of the decision.
The obvious version
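Computing a rolling 20-period mean and acting on it in the same row. The mean includes the current row's value, which is not final until the bar closes, so a decision taken inside that bar is using data from its own future. Compute the mean from the preceding 20 rows only.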
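A minimal sketch of the difference, in plain Python. Both function names are hypothetical; the first reproduces the biased calculation and the second lags the window by one row so that only already-closed bars are used.

```python
def rolling_mean_same_row(values, window=20):
    """Biased: the window ending at index i includes values[i] itself."""
    out = []
    for i in range(len(values)):
        if i + 1 < window:
            out.append(None)  # not enough history yet
        else:
            out.append(sum(values[i + 1 - window:i + 1]) / window)
    return out


def rolling_mean_lagged(values, window=20):
    """Unbiased: the window covers the `window` rows strictly before index i."""
    out = []
    for i in range(len(values)):
        if i < window:
            out.append(None)  # not enough history yet
        else:
            out.append(sum(values[i - window:i]) / window)
    return out


closes = [1.0, 2.0, 3.0, 4.0, 10.0]
biased = rolling_mean_same_row(closes, window=3)
safe = rolling_mean_lagged(closes, window=3)
```

On the last row the biased mean already reflects the jump to 10.0, while the lagged mean still only knows about 2.0, 3.0 and 4.0.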
The subtle version
Using the exchange's published "open" price for a 1-minute candle as your entry price. A real entry would have happened somewhere inside that minute, not at the open. If your strategy is "buy at open if some condition," your real fills are systematically worse than the backtest assumes.
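With candle data alone the true fill price is unknown, but it can be bracketed. The sketch below is one possible approximation, not a standard model: the function name `bracket_buy_fill` and the `pessimism` parameter are assumptions introduced here. It places a buy fill somewhere between the bar's open (optimistic) and its high (worst case).

```python
def bracket_buy_fill(candle, pessimism=0.5):
    """Estimate a buy fill inside a candle instead of at its open.

    pessimism=0.0 reproduces the optimistic open-price fill;
    pessimism=1.0 assumes the worst case, the bar's high.
    This interpolation is an illustrative assumption.
    """
    if not 0.0 <= pessimism <= 1.0:
        raise ValueError("pessimism must be between 0 and 1")
    return candle["open"] + pessimism * (candle["high"] - candle["open"])


bar = {"open": 100.0, "high": 104.0, "low": 99.0, "close": 101.0}
optimistic = bracket_buy_fill(bar, pessimism=0.0)
worst_case = bracket_buy_fill(bar, pessimism=1.0)
```

Re-running a backtest at several pessimism levels shows how much of the reported return depends on the open-price assumption.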
The deeply hidden version
Survivorship bias. You backtest on the current top-100 coin list, which excludes everything that has since been delisted. Strategies that pick from "the top 100" historically benefit from never holding the next FTX. Paid vendors like Kaiko or CryptoCompare include delisted assets in their archives; pay for that rather than faking it.
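One way to guard against this is to build the tradable universe as it stood on each backtest date. The sketch below assumes a hypothetical `listings` table mapping each symbol to its listing and delisting dates; the symbols and dates are made up for illustration.

```python
from datetime import date


def universe_on(listings, as_of):
    """Return symbols that were actually tradable on `as_of`.

    listings: {symbol: (listed_date, delisted_date_or_None)}
    A symbol counts if it was listed on or before `as_of`
    and had not yet been delisted.
    """
    return sorted(
        symbol
        for symbol, (listed, delisted) in listings.items()
        if listed <= as_of and (delisted is None or as_of < delisted)
    )


# Hypothetical data, for illustration only.
listings = {
    "AAA": (date(2019, 1, 1), None),
    "BBB": (date(2020, 6, 1), date(2022, 11, 15)),
    "CCC": (date(2023, 3, 1), None),
}
universe_2021 = universe_on(listings, date(2021, 1, 1))
universe_2023 = universe_on(listings, date(2023, 6, 1))
```

A backtest that uses today's list for 2021 would silently drop "BBB" and include "CCC" before it existed.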
3. Execution simulation
Even with perfect data and no look-ahead, your simulated fills are optimistic if you assume you would have been filled at the touch (the best bid or ask) every time. Two corrections:
- Slippage model. For market orders, sweep through the book as it stood when the order arrived. For limit orders, fill only when the market trades at or through your price and enough volume trades there to clear the queue ahead of you and then fill your size.
- Latency model. Assume your order arrives 50–250 ms after you decided to send it. The book has moved by then. If your strategy edge dies inside that window, the strategy doesn't have edge.
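The two corrections above can be sketched as follows. This is a simplified illustration under stated assumptions: the book is a list of `(price, quantity)` ask levels sorted best-first, snapshots are `(timestamp_ms, asks)` pairs sorted by time, and all function names are hypothetical.

```python
from bisect import bisect_right


def sweep_asks(asks, size):
    """Average price paid by a market buy of `size` walking up the asks."""
    remaining = size
    cost = 0.0
    for price, qty in asks:
        take = min(remaining, qty)
        cost += take * price
        remaining -= take
        if remaining <= 0:
            break
    if remaining > 1e-12:
        raise ValueError("not enough depth to fill the order")
    return cost / size


def limit_buy_fill(my_size, queue_ahead, traded_volume):
    """Filled quantity of a resting limit buy.

    traded_volume is what traded at or through our price after we joined;
    it must first clear the queue ahead of us before we receive any fill.
    """
    return max(0.0, min(my_size, traded_volume - queue_ahead))


def book_at_arrival(snapshots, decision_ts_ms, latency_ms=100):
    """Return the asks from the latest snapshot at or before order arrival."""
    arrival = decision_ts_ms + latency_ms
    times = [ts for ts, _ in snapshots]
    idx = bisect_right(times, arrival) - 1
    if idx < 0:
        raise ValueError("no snapshot at or before arrival time")
    return snapshots[idx][1]


snapshots = [
    (0, [(100.0, 1.0), (101.0, 2.0)]),
    (80, [(100.5, 1.0), (101.5, 3.0)]),
    (300, [(99.0, 5.0)]),
]
price_no_latency = sweep_asks(book_at_arrival(snapshots, 0, latency_ms=0), 2.0)
price_with_latency = sweep_asks(book_at_arrival(snapshots, 0, latency_ms=100), 2.0)
```

Comparing the two prices for the same decision gives a direct estimate of what latency costs per trade; running the backtest across the 50–250 ms range shows whether the edge survives.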
4. Metrics that matter for crypto
Sharpe ratio is fine, but four others matter more for crypto in particular:
- Maximum drawdown. How deep is the worst peak-to-trough decline? Crypto strategies routinely show 60–70% drawdowns even when their long-run Sharpe looks good. If you can't psychologically tolerate the drawdown, the strategy isn't yours.
- Time-in-drawdown. Worse than the depth is the duration. A strategy that recovers in 30 days from −40% is liveable; one that takes 18 months is not.
- Calmar ratio. Annualised return / max drawdown. A 30% return with a 60% max drawdown (Calmar 0.5) is half as good as a 20% return with a 20% max drawdown (Calmar 1.0).
- Tail risk vs. funding cost. Holding leveraged positions costs you the funding rate plus the occasional tail-event liquidation. Judge the strategy on net return after those costs, not gross.
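The first three metrics are straightforward to compute from an equity curve. The sketch below assumes the curve is a list of account values sampled at a fixed interval; the function names are illustrative, and drawdown duration is measured in number of samples rather than calendar days.

```python
def max_drawdown(equity):
    """Largest peak-to-trough decline, as a positive fraction of the peak."""
    peak = equity[0]
    worst = 0.0
    for value in equity:
        peak = max(peak, value)
        worst = max(worst, (peak - value) / peak)
    return worst


def longest_drawdown(equity):
    """Longest run of consecutive samples spent below a previous peak."""
    peak = equity[0]
    current = 0
    longest = 0
    for value in equity:
        if value >= peak:
            peak = value
            current = 0  # back at or above the high-water mark
        else:
            current += 1
            longest = max(longest, current)
    return longest


def calmar(annual_return, max_dd):
    """Annualised return divided by maximum drawdown (both as fractions)."""
    return annual_return / max_dd


curve = [100.0, 120.0, 90.0, 60.0, 80.0, 125.0, 110.0]
depth = max_drawdown(curve)         # fall from 120 to 60
duration = longest_drawdown(curve)  # samples below the 120 peak
ratio_a = calmar(0.30, 0.60)
ratio_b = calmar(0.20, 0.20)
```

For net rather than gross figures, build the equity curve after subtracting funding payments and liquidation losses, so that every metric above already reflects them.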
5. Out-of-sample validation
Split your data chronologically: train on the first 70%, validate on the held-out final 30%. Tune parameters on train only. Test on validation exactly once. If validation performance is markedly worse than train, you have overfit. Re-think and start over; don't tune on validation.
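A minimal sketch of such a split, assuming the rows are already sorted by time. The function name is hypothetical. The important detail is that the split is by position and never shuffled, so no validation row precedes a training row.

```python
def chronological_split(rows, train_fraction=0.7):
    """Split time-ordered rows into (train, validation) without shuffling."""
    if not 0.0 < train_fraction < 1.0:
        raise ValueError("train_fraction must be between 0 and 1")
    cut = int(len(rows) * train_fraction)
    return rows[:cut], rows[cut:]


bars = list(range(10))  # stand-in for ten time-ordered bars
train, validation = chronological_split(bars)
```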
Walk-forward analysis is even better: roll the train/validation window forward through time, retraining at each step. Lets you see how strategy performance evolves as market regimes change.
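The rolling windows can be generated with a few lines. This sketch assumes fixed-size train and test windows that advance by one test window per step; other schemes, such as an expanding train window, are equally valid. The function name is illustrative.

```python
def walk_forward_windows(n_rows, train_size, test_size):
    """Yield (train_range, test_range) index pairs rolling forward in time.

    Each step retrains on `train_size` rows and evaluates on the
    `test_size` rows that immediately follow, then advances by `test_size`.
    """
    start = 0
    while start + train_size + test_size <= n_rows:
        train = range(start, start + train_size)
        test = range(start + train_size, start + train_size + test_size)
        yield train, test
        start += test_size


for train_idx, test_idx in walk_forward_windows(n_rows=10, train_size=4, test_size=2):
    # Fit parameters on rows train_idx, then score once on rows test_idx.
    print(list(train_idx), list(test_idx))
```

Plotting the per-window test results in time order shows whether performance is stable or concentrated in a single regime.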
6. Paper-trade before live
Before risking capital, run the strategy in our sandbox API environment for at least a month. The sandbox uses live price feeds with simulated fills, which narrows the gap between backtest and reality without risking real money.
For order types and execution mechanics see the Spot Trading and Margin Trading pages.