Backtesting a strategy

Run a strategy against historical data in the Backtest Studio — pick a strategy, symbol, and dates, read the 70/30 result, and learn what the numbers can and cannot tell you.

A backtest runs one strategy against historical price data and reports how it would have behaved. It is the first real test in Forven's pipeline, and every strategy must pass it before any later stage looks at the strategy. This page covers the Backtest Studio: how to set up a run, what the engine does, and how to read the result without fooling yourself.

The Studio is for users building and screening strategies. When you want to harden a survivor with the full robustness battery, that happens in the strategy lab; a single backtest is the entry point to it.

Forven is a research tool. A backtest describes past behaviour on historical data. The numbers are illustrative, they are not predictive, and nothing here is financial advice.

What a backtest is

The engine (backtest_strategy) loads historical OHLCV candles for your symbol and timeframe, runs the strategy's signal logic bar by bar (or vectorized, where supported), simulates the resulting trades with realistic costs, and computes a block of metrics from the trade ledger. Every run is persisted to the backtest_results table with its full per-trade ledger, so results are auditable later.
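Here is a minimal, hypothetical sketch of a bar-by-bar simulation that makes the engine's loop concrete. It is not Forven's backtest_strategy. The names run_bar_by_bar and Trade are invented for illustration. The sketch trades long-only, assumes fills at the close of the signalling bar, and charges a single flat round-trip cost. It does show the property that matters: the signal at bar i sees only closes up to and including bar i, and every closed trade lands in a ledger that metrics can be computed from.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Sequence


@dataclass
class Trade:
    entry_bar: int
    exit_bar: int
    entry_price: float
    exit_price: float
    net_return: float  # fractional return after the flat round-trip cost


def _close_trade(closes: Sequence[float], entry_bar: int, exit_bar: int, cost_bps: float) -> Trade:
    entry, exit_price = closes[entry_bar], closes[exit_bar]
    gross = exit_price / entry - 1.0
    return Trade(entry_bar, exit_bar, entry, exit_price, gross - cost_bps / 10_000)


def run_bar_by_bar(
    closes: Sequence[float],
    signal: Callable[[Sequence[float]], int],
    round_trip_cost_bps: float = 0.0,
) -> List[Trade]:
    """Hypothetical long-only loop. signal(history) returns 1 to be long, 0 to be flat.

    history is closes[: i + 1], so the signal can never see a future bar.
    Fills are assumed at the close of the bar that produced the signal.
    """
    ledger: List[Trade] = []
    entry_bar: Optional[int] = None
    for i in range(len(closes)):
        want_long = signal(closes[: i + 1]) == 1
        if want_long and entry_bar is None:
            entry_bar = i  # open a position at this bar's close
        elif not want_long and entry_bar is not None:
            ledger.append(_close_trade(closes, entry_bar, i, round_trip_cost_bps))
            entry_bar = None
    if entry_bar is not None:  # mark any open position to the final bar
        ledger.append(_close_trade(closes, entry_bar, len(closes) - 1, round_trip_cost_bps))
    return ledger
```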

One detail matters more than any other: the data is split 70/30 into an in-sample (IS) window and an out-of-sample (OOS) window.

  • In-sample (IS) — the first 70%. The strategy's signals are tuned against this. IS numbers are optimistic and do not predict live behaviour.
  • Out-of-sample (OOS) — the last 30%, unseen during signal generation. OOS is the ground truth the promotion gates read, and the block you should read first.
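The chronological 70/30 split can be sketched as follows. The helper name split_is_oos is hypothetical. Placing the cut point with integer floor arithmetic is an assumption, since the page does not say how Forven rounds the boundary. The key property is that nothing is shuffled, so every out-of-sample bar comes after every in-sample bar.

```python
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def split_is_oos(bars: Sequence[T], is_pct: int = 70) -> Tuple[List[T], List[T]]:
    """Chronological split: the first is_pct% of bars are in-sample, the rest out-of-sample.

    Integer arithmetic avoids float rounding at the boundary, where
    int(n * 0.7) can come out one bar short. Nothing is shuffled.
    """
    if not 0 < is_pct < 100:
        raise ValueError("is_pct must be between 1 and 99")
    cut = len(bars) * is_pct // 100
    return list(bars[:cut]), list(bars[cut:])
```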

A run returns three blocks: in_sample, out_of_sample, and robustness. Start with out_of_sample.

The Backtest Studio

The Studio lives at /backtest/new. It supports both no-code and code workflows, with three ways to define a strategy:

  • Prebuilt — pick a bundled strategy from the built-in catalog (rsi_momentum, bollinger, keltner, macd, ema_cross, and 70+ more).
  • Visual rule builder — assemble entry/exit rules without writing code (the rule_engine path).
  • Custom code — upload a Python strategy that implements the BaseStrategy interface. See writing custom strategies for the contract and the safety guard that screens uploaded modules.

Whichever source you choose, you then set the symbol, timeframe, and date range, edit parameters, optionally preview the signals, and submit.

Steps

  1. Open /backtest/new.
  2. Choose a strategy source: select a prebuilt strategy, build rules visually, or upload custom Python code.
  3. Enter the symbol (for example BTC or BTC/USDT), the timeframe (1h, 4h, 1d), and a date range or bar count.
  4. Edit parameters in the parameter editor if you want to deviate from the strategy's defaults.
  5. Click Preview signals to dry-run the entry/exit logic without executing trades — a quick sanity check that the strategy fires at all.
  6. Click Submit to queue the backtest.
  7. When it completes, open the result on /backtest/{id}, or find the strategy in the lab and read it from the backtests tab of /lab/strategy/[id].

What you'll see

The result page renders a candlestick chart with entry and exit markers, background shading color-coded by market regime (TREND_UP, TREND_DOWN, RANGE_BOUND, HIGH_VOL), a trades table, and a metrics panel showing the in_sample and out_of_sample blocks side by side. A strategy that looks excellent in-sample but falls apart out-of-sample is overfit; that contrast is the reason both blocks are shown.

If your strategy's type is not one of the supported chart indicator types, the chart still draws candles and trades but omits indicator panels and logs a warning. Supported chart indicator types include rsi_momentum, bollinger, keltner, macd, ema_cross, stochastic, vwap, and supertrend.

Costs: fees, slippage, leverage, funding

A backtest's metrics are reported after costs; there is no separate gross/net toggle to forget. The default cost settings are:

Cost      Config key                Default
Fee       backtest_fee_bps          4.5 bps
Slippage  backtest_slippage_bps     2.0 bps
Leverage  (per-backtest)            3x
Funding   backtest_include_funding  on

Fees and slippage are charged as a round-trip cost (entry plus exit). Funding costs use HyperLiquid hourly rates merged into the backtest frame. On a fresh install, the first backtest auto-backfills missing funding history from the exchange, so funding-aware results self-heal over time. Two flags record what actually happened — funding_applied (any trade carried funding) and funding_complete (every trade had complete funding data) — and the gates may reject a run where funding_complete is false rather than trust a funding-blind window.
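The following is a sketch of the arithmetic under stated assumptions, not Forven's cost model. It assumes the fee and slippage rates are charged on notional at both entry and exit, so the defaults cost 2 × (4.5 + 2.0) = 13 bps of notional per round trip. It also assumes that leverage multiplies both the price move and the costs when returns are measured on margin, which turns those 13 bps into 39 bps at 3x. Finally, it assumes that longs pay positive hourly funding rates and shorts receive them. The names CostSettings, net_trade_return, and run_funding_flags are hypothetical. Only the flag names funding_applied and funding_complete come from this page.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Sequence, Tuple

BPS = 10_000


@dataclass(frozen=True)
class CostSettings:
    fee_bps: float = 4.5          # backtest_fee_bps default
    slippage_bps: float = 2.0     # backtest_slippage_bps default
    leverage: float = 3.0         # per-backtest default
    include_funding: bool = True  # backtest_include_funding default


def net_trade_return(
    entry: float,
    exit_price: float,
    side: int,                                  # +1 long, -1 short
    hourly_funding: Sequence[Optional[float]],  # one rate per hour held; None = missing data
    costs: CostSettings = CostSettings(),
) -> Tuple[float, bool, bool]:
    """Return (net return on margin, carried_funding, funding_complete) for one trade.

    Assumes fee and slippage are charged on notional at entry AND exit, and
    that a positive funding rate is paid by longs and received by shorts.
    """
    price_move = side * (exit_price / entry - 1.0)
    round_trip = 2 * (costs.fee_bps + costs.slippage_bps) / BPS
    known = [r for r in hourly_funding if r is not None]
    funding = side * sum(known) if costs.include_funding else 0.0
    carried = costs.include_funding and bool(known)
    complete = len(known) == len(hourly_funding)
    return costs.leverage * (price_move - round_trip - funding), carried, complete


def run_funding_flags(per_trade: Sequence[Tuple[bool, bool]]) -> Dict[str, bool]:
    """Aggregate per-trade (carried_funding, funding_complete) pairs into run-level flags."""
    return {
        "funding_applied": any(carried for carried, _ in per_trade),
        "funding_complete": all(complete for _, complete in per_trade),
    }
```

If the listed rates turn out to be per round trip rather than per side, drop the factor of 2 in round_trip.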

Default window and timeframe

When you don't specify otherwise, the engine falls back to config defaults:

Key                     Meaning                    Default
backtest_duration_days  Days of history to test    30
backtest_timeframe      Candle timeframe           1h

These are starting points, not recommendations. A 30-day, 1h window is fine for a first look, but several metrics are statistically weak on short runs — see the reliability flags below.
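The sketch below translates the defaults into bar counts. It assumes a market that trades around the clock, as crypto perpetuals do, and whole-day windows. The helper names and the HOURS_PER_BAR table are hypothetical. The 70/30 split and the roughly 210-bar regime warmup come from this page.

```python
from typing import Dict, Union

# Hours per bar for the timeframes named on this page; assumes 24/7 trading.
HOURS_PER_BAR = {"1h": 1, "4h": 4, "1d": 24}
REGIME_WARMUP_BARS = 210  # "roughly 210 bars", per Regime gating


def window_bars(days: int = 30, timeframe: str = "1h") -> int:
    """Bars in a whole-day window; defaults mirror backtest_duration_days and backtest_timeframe."""
    return days * 24 // HOURS_PER_BAR[timeframe]


def describe_window(days: int = 30, timeframe: str = "1h") -> Dict[str, Union[int, bool]]:
    n = window_bars(days, timeframe)
    in_sample = n * 70 // 100
    return {
        "bars": n,
        "in_sample": in_sample,
        "out_of_sample": n - in_sample,
        "clears_regime_warmup": n >= REGIME_WARMUP_BARS,
    }
```

The default window yields 720 bars, split into 504 in-sample and 216 out-of-sample bars. The same 30 days at 4h gives only 180 bars, which falls short of the regime warmup described under Regime gating.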

Execution controls (risk knobs)

Stop-loss, take-profit, trailing stops, position sizing, and similar risk knobs are not read from a strategy's params. They must be passed separately through execution_controls. This is a deliberate safety boundary (internally "B-4"): a stop_loss_pct value sitting in a strategy's params is silently ignored by the backtest engine, so a risk field cannot accidentally change backtest behaviour while meaning something different in paper or live.

The controls the engine honors when passed explicitly include stop_loss_pct, take_profit_pct, trailing_stop_pct, time_stop_bars, fixed_size, risk_per_trade, atr_stop_multiplier, daily_loss_cap, and max_concurrent_positions. If you set a stop in params and see no effect in the result, this is why.
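If your own scripts keep risk settings alongside strategy parameters, a small pre-submit check can catch this trap before it silently costs you a run. The sketch below is hypothetical: split_risk_fields is not a Forven function. It moves any of the control names listed above out of params and into an execution_controls dict, and an explicitly passed control always wins over a stray param. This page does not cover how execution_controls is attached to a request, so the sketch stops at building the two dicts.

```python
from typing import Any, Dict, List, Optional, Tuple

# Control names this page lists as honored only via execution_controls.
EXECUTION_CONTROL_KEYS = frozenset({
    "stop_loss_pct", "take_profit_pct", "trailing_stop_pct", "time_stop_bars",
    "fixed_size", "risk_per_trade", "atr_stop_multiplier", "daily_loss_cap",
    "max_concurrent_positions",
})


def split_risk_fields(
    params: Dict[str, Any],
    execution_controls: Optional[Dict[str, Any]] = None,
) -> Tuple[Dict[str, Any], Dict[str, Any], List[str]]:
    """Return (strategy_params, execution_controls, moved_keys).

    Risk knobs found in params are moved to execution_controls, where the
    engine reads them. An explicitly passed control is never overwritten.
    """
    strategy_params: Dict[str, Any] = {}
    controls = dict(execution_controls or {})
    moved: List[str] = []
    for key, value in params.items():
        if key in EXECUTION_CONTROL_KEYS:
            controls.setdefault(key, value)  # keep the explicit value if one exists
            moved.append(key)
        else:
            strategy_params[key] = value
    return strategy_params, controls, sorted(moved)
```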

Regime gating

Strategies can declare a compatible_regimes list. When they do, the backtest pre-computes the market regime for every bar and blocks entries outside the compatible set, forcing an exit if the regime shifts mid-trade. This keeps a trend strategy from being scored on choppy, range-bound stretches it was never meant to trade.
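The gating rule can be expressed as a small state machine over entry and exit events. This is a hypothetical sketch: gate_by_regime is an invented name, and it assumes per-bar boolean entry/exit signals plus a precomputed regime label for each bar. An entry is taken only in a compatible regime. An open trade closes either on its own exit signal or on the first bar whose regime falls outside the compatible set.

```python
from typing import AbstractSet, List, Optional, Sequence, Tuple


def gate_by_regime(
    entries: Sequence[bool],
    exits: Sequence[bool],
    regimes: Sequence[str],
    compatible: AbstractSet[str],
) -> List[Tuple[int, int]]:
    """Return (entry_bar, exit_bar) pairs after regime gating."""
    trades: List[Tuple[int, int]] = []
    entry_bar: Optional[int] = None
    for i, regime in enumerate(regimes):
        allowed = regime in compatible
        if entry_bar is not None:
            if exits[i] or not allowed:  # normal exit, or forced exit on a regime shift
                trades.append((entry_bar, i))
                entry_bar = None
        elif entries[i] and allowed:     # entries outside the compatible set are blocked
            entry_bar = i
    if entry_bar is not None:            # close anything still open on the last bar
        trades.append((entry_bar, len(regimes) - 1))
    return trades
```

After a forced exit, the sketch waits for a fresh entry signal instead of re-entering as soon as the regime turns compatible again. That is a design choice of the sketch, not documented engine behaviour.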

A practical caveat: regime classification needs roughly 210 bars of warmup. On windows shorter than that, bars default to RANGE_BOUND, which can distort the by_regime breakdown — another reason to give a backtest enough history.

Reading the result honestly

The metrics panel exposes 30+ figures. Three habits keep you out of trouble:

  • Read OOS, not IS. In-sample numbers are tuning artifacts. The promotion gates only read out-of-sample; so should you.
  • Honor the reliability flags. sharpe_is_reliable is false with fewer than 20 trades; annualized_return_reliable is false on windows shorter than 3 months. A spectacular Sharpe on a dozen trades, or a four-digit annualized return on a three-week window, is noise. Forven suppresses these in displays rather than showing them with false confidence.
  • An infinite profit factor is a real output, not a display bug. A strategy with wins and zero losses reports profit_factor = inf plus profit_factor_is_infinite. It almost always means too few trades, not a flawless edge.
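The two reliability thresholds and the infinite-profit-factor rule are simple enough to state as code. In this sketch the function names are hypothetical and the output keys reuse the metric names above. "3 months" is approximated as 90 days. A ledger with neither wins nor losses returns NaN, which is an assumption, not documented behaviour.

```python
import math
from datetime import timedelta
from typing import Dict, Sequence, Union

MIN_TRADES_FOR_SHARPE = 20
MIN_SPAN_FOR_ANNUALIZED = timedelta(days=90)  # "3 months", approximated as 90 days


def reliability_flags(n_trades: int, span: timedelta) -> Dict[str, bool]:
    return {
        "sharpe_is_reliable": n_trades >= MIN_TRADES_FOR_SHARPE,
        "annualized_return_reliable": span >= MIN_SPAN_FOR_ANNUALIZED,
    }


def profit_factor(trade_pnls: Sequence[float]) -> Dict[str, Union[float, bool]]:
    gross_win = sum(p for p in trade_pnls if p > 0)
    gross_loss = -sum(p for p in trade_pnls if p < 0)
    if gross_loss == 0:
        # Wins with no losses is reported as infinity and flagged; no wins and
        # no losses is treated as undefined (NaN) in this sketch.
        value = math.inf if gross_win > 0 else math.nan
        return {"profit_factor": value, "profit_factor_is_infinite": gross_win > 0}
    return {"profit_factor": gross_win / gross_loss, "profit_factor_is_infinite": False}
```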

The full metric-by-metric reference, including the by_side and by_regime breakdowns, lives on the metrics page.

What a backtest does not prove

A single 70/30 backtest is one test against one slice of history. It does not establish robustness. That is the job of the gauntlet — walk-forward analysis, parameter jitter, cost stress, and regime splits — run from the strategy lab. A backtest tells you a strategy is worth testing further; the lab tells you whether it survives.

Backtesting also corresponds to a real lifecycle stage. In Forven's pipeline a strategy moves researching → backtesting → quick_screen → gauntlet → paper → live. A clean backtest that clears the quick_screen overfitting guardrails advances toward the gauntlet; a strategy whose numbers don't hold up is killed early, by design.
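The stage order can be written as a tiny transition function. This is a hypothetical sketch: the Stage enum and advance() are invented. KILLED stands in for whatever terminal state Forven records for strategies that fail a gate, which this page does not name.

```python
from enum import Enum


class Stage(Enum):
    RESEARCHING = "researching"
    BACKTESTING = "backtesting"
    QUICK_SCREEN = "quick_screen"
    GAUNTLET = "gauntlet"
    PAPER = "paper"
    LIVE = "live"
    KILLED = "killed"  # hypothetical terminal state for strategies that fail a gate


PIPELINE = [Stage.RESEARCHING, Stage.BACKTESTING, Stage.QUICK_SCREEN,
            Stage.GAUNTLET, Stage.PAPER, Stage.LIVE]


def advance(stage: Stage, passed_gate: bool) -> Stage:
    """Move one stage forward on a pass; any failure ends the strategy early."""
    if stage in (Stage.LIVE, Stage.KILLED):
        return stage  # terminal: nothing further to advance to
    if not passed_gate:
        return Stage.KILLED
    return PIPELINE[PIPELINE.index(stage) + 1]
```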

Running a backtest via the API

You can submit runs without the UI. The local API listens on 127.0.0.1:8003.

# Submit a backtest
curl.exe -X POST http://127.0.0.1:8003/api/backtesting/run `
  -H "x-api-key: $env:FORVEN_API_KEY" `
  -H "content-type: application/json" `
  -d '{ "strategy_type": "rsi_momentum", "asset": "BTC", "timeframe": "1h" }'

# Retrieve the full ledger + metrics by result id
curl.exe "http://127.0.0.1:8003/api/backtesting/results/<result_id>" `
  -H "x-api-key: $env:FORVEN_API_KEY"

The response carries the in_sample, out_of_sample, and robustness blocks. Related endpoints: POST /api/strategies/optimize (grid search plus walk-forward), POST /api/strategies/walkforward (robustness verdict), GET /api/strategies/list (every registered strategy), and POST /api/backtest/chart (the chart context behind the result page).
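You can make the same submission with Python's standard library, which also sidesteps PowerShell's quoting rules for JSON passed to native commands. The sketch assumes the endpoint, port, and x-api-key header from the curl example above. It also assumes that the run call returns the result body when it completes and that the three blocks are top-level keys of that JSON. The helper names and the 330 s client timeout are hypothetical choices, not part of Forven's API.

```python
import json
import os
import urllib.request

BASE_URL = "http://127.0.0.1:8003"


def build_run_request(payload: dict, api_key: str) -> urllib.request.Request:
    """POST /api/backtesting/run with the same headers as the curl example."""
    return urllib.request.Request(
        f"{BASE_URL}/api/backtesting/run",
        data=json.dumps(payload).encode("utf-8"),
        headers={"x-api-key": api_key, "content-type": "application/json"},
        method="POST",
    )


def submit_backtest(payload: dict) -> dict:
    req = build_run_request(payload, os.environ["FORVEN_API_KEY"])
    # A little above the 300 s process-isolation cap, assuming the call blocks until done.
    with urllib.request.urlopen(req, timeout=330) as resp:
        return json.load(resp)


def oos_block(result: dict) -> dict:
    """Read out_of_sample first; assumes it is a top-level key of the response."""
    return result["out_of_sample"]
```

With FORVEN_API_KEY set, call submit_backtest({"strategy_type": "rsi_momentum", "asset": "BTC", "timeframe": "1h"}) and inspect oos_block(result) before anything else.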

Process isolation

By default each backtest runs in a spawned subprocess with a timeout (base 60s plus 8s per 1,000 bars, capped at 300s), so a misbehaving strategy can't freeze the system. This is governed by FORVEN_BACKTEST_PROCESS_ISOLATION (on by default). Very large windows can still time out despite adequate hardware — narrow the range or coarsen the timeframe if a run never returns.
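Checking the timeout rule before launching a large run takes one line. This sketch assumes the per-bar term scales continuously (8 ms per bar) rather than in whole steps of 1,000 bars. The name isolation_timeout_s is hypothetical.

```python
def isolation_timeout_s(n_bars: int, base: float = 60.0, per_1000: float = 8.0, cap: float = 300.0) -> float:
    """Timeout in seconds: base plus per_1000 seconds for every 1,000 bars, never above cap."""
    return min(cap, base + per_1000 * n_bars / 1000)
```

A year of 1h bars (8,760 bars, assuming round-the-clock trading) gets about 130 s. The 300 s cap is reached at 30,000 bars, roughly 3.4 years of 1h data.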

Caveats

  • Trust OOS. In-sample metrics are optimistic by construction.
  • Mind the minimums. Short windows trigger reliability flags and default regime classification; give a backtest enough history.
  • Costs are baked in. Fees, slippage, and funding are already in the numbers — don't double-count them.
  • Risk knobs live in execution_controls, not params. A stop in params is ignored by the engine.
  • One backtest is not robustness. Promote nothing on a single run; that is what the gauntlet is for.
  • Past is not prologue. Every figure describes historical behaviour on historical data. It is illustrative, not predictive, and nothing here is financial advice.