Evolution & recalibration

How Forven evolves survivors — the autonomous promotion pipeline, the diversity guard against echo chambers, and non-destructive regime-aware parameter recalibration.

Discovery invents ideas. Evolution decides which of them earn more of your attention, and keeps the survivors honest over time. This page is for operators running the autonomous loop: it covers how a strategy climbs the pipeline, how Forven stops the research pool from collapsing into one repeated idea, and how it re-tunes deployed strategies when the market changes underneath them.

Three mechanisms do the work:

The strategy evolution pipeline — the autonomous lifecycle that walks a candidate from backtest through paper to live, and retires it when it decays.
The diversity guard — saturation detection that prevents the ideation loop from minting the same strategy family over and over.
Regime-aware recalibration — a background re-tuner that adjusts parameters when the market regime shifts, but only when a backtest proves the change is better.

Where crucible discovery is about generating and refining theses, evolution is about what happens to the strategy candidates those theses spawn. The two run side by side under the scheduler.

Forven is a research tool. Nothing here is a prediction or a promise of returns, and none of it is financial advice. Every number below is illustrative of how the gates are wired, not a target you should expect to hit. Out-of-sample survival is the only signal worth trusting.

The strategy evolution pipeline

Evolution drives the same lifecycle described in the pipeline, but autonomously. Each testing cycle, run_testing_step() advances eligible strategies one stage at a time:

quick_screen — the first filter. The brain evaluates quick_screen_gate_passes(). Terminal failures (duplicate detection, overfit rejection, too few trades with an undefined Sharpe) auto-archive the strategy with no retry. Non-terminal holds, typically data-quality issues, are re-queued.
gauntlet — _advance_gauntlet_readiness() runs the five readiness steps (timeframe sweep, optimization, apply best params, confirmation backtest, validation suite). See the gauntlet for what each robustness test measures.
paper — on passing the gauntlet gate, the strategy is promoted to paper trading: simulated execution against the live feed, no real capital.
live — after a minimum soak in paper, _advance_paper_live_readiness() checks the strict paper→live gate. Survivors graduate to live_graduated and trade real capital on HyperLiquid.
retirement — if a deployed strategy decays, evolution archives it and closes the learning loop.
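Two rules can be sketched as a tiny state machine: a strategy advances at most one stage per cycle, and terminal failures are treated differently from non-terminal holds. This is a minimal illustration, not Forven's run_testing_step(). Stage, GateResult, NEXT_STAGE, and advance_one_stage are hypothetical names. The gate callable stands in for the real stage gates. Applying the terminal/hold split at every stage is an assumption, because the text only states it for quick_screen. Retirement from live is left out.

```python
from enum import Enum
from typing import Callable


class Stage(Enum):
    QUICK_SCREEN = "quick_screen"
    GAUNTLET = "gauntlet"
    PAPER = "paper"
    LIVE_GRADUATED = "live_graduated"
    ARCHIVED = "archived"


class GateResult(Enum):
    PASS = "pass"
    HOLD = "hold"                    # non-terminal (e.g. data quality): retry next cycle
    TERMINAL_FAIL = "terminal_fail"  # duplicate, overfit, too few trades: archive, no retry


# Forward path a candidate walks, one hop per testing cycle (hypothetical table).
NEXT_STAGE = {
    Stage.QUICK_SCREEN: Stage.GAUNTLET,
    Stage.GAUNTLET: Stage.PAPER,
    Stage.PAPER: Stage.LIVE_GRADUATED,
}


def advance_one_stage(stage: Stage, gate: Callable[[Stage], GateResult]) -> Stage:
    """Hypothetical sketch: move a strategy at most one stage per cycle."""
    if stage not in NEXT_STAGE:
        return stage                 # archived or already live: nothing to advance
    result = gate(stage)
    if result is GateResult.PASS:
        return NEXT_STAGE[stage]
    if result is GateResult.TERMINAL_FAIL:
        return Stage.ARCHIVED        # never re-queued, so no "zombie" churn
    return stage                     # HOLD: stays put and is re-evaluated next cycle
```

Keeping HOLD and TERMINAL_FAIL as distinct outcomes is the point of the sketch. Collapsing them into a single "fail" is the kind of mistake that either archives fixable strategies or re-queues hopeless ones forever.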

The public site calls a paper-stage strategy a candidate. Internally it is the paper stage — we use the real stage names throughout the docs.

Promotion gates

A candidate only advances when it clears the gate for its stage. The thresholds are policy values evaluated by policy.evaluate_promotion(); the representative gauntlet→paper gate looks for:

fitness score ≥ 60 (the 5-factor fitness composite),
profit factor > 1.5,
max drawdown < 15%,
walk-forward degradation < 30% (out-of-sample performance must hold up).
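As a worked example, the representative gauntlet→paper thresholds above reduce to four independent checks. This is a hypothetical sketch, not policy.evaluate_promotion(). BacktestSummary and gauntlet_to_paper_failures are made-up names. Drawdown and degradation are assumed to be fractions rather than percentages. The thresholds are the illustrative values from the list, not configuration.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class BacktestSummary:
    fitness: float         # 5-factor fitness composite (0-100 scale assumed)
    profit_factor: float
    max_drawdown: float    # fraction, e.g. 0.12 means 12%
    wf_degradation: float  # fraction of performance lost out of sample


def gauntlet_to_paper_failures(s: BacktestSummary) -> List[str]:
    """Return the names of failed checks; an empty list means the gate passes."""
    checks = {
        "fitness >= 60": s.fitness >= 60,
        "profit_factor > 1.5": s.profit_factor > 1.5,
        "max_drawdown < 15%": s.max_drawdown < 0.15,
        "wf_degradation < 30%": s.wf_degradation < 0.30,
    }
    return [name for name, ok in checks.items() if not ok]
```

The function returns the failed checks rather than a bare boolean, so a rejection explains itself. That is handy when you read gate outcomes in the strategy event log. Note the boundaries: a fitness of exactly 60 passes, but a profit factor of exactly 1.5 does not.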

These are illustrative of the gate shape, not a performance forecast. The full gate logic — including the lean paper gate versus the strict paper→live gate and operator overrides — lives on promotion gates, and the underlying numbers are defined in the configuration reference.

Pipeline throughput

How many candidates evolution processes per cycle is tunable, because a deep backlog can otherwise starve the scheduler:

Setting	Default	What it controls
`pipeline_assignments_per_cycle`	`10`	Max work items (backtests, candidate tasks) assigned per testing cycle (range 1–100).
`pipeline_drain_mode`	`true`	When true, runs all pending backtests under a time budget instead of a fixed batch.
`pipeline_drain_max_seconds`	`600`	Time budget for a drain cycle, in seconds (roughly 45s per strategy, run in parallel).
`adaptive_pipeline_throughput_enabled`	`false`	Scales assignments to clear the backlog within a target horizon.
`pipeline_target_clear_hours`	`6`	The horizon adaptive mode aims to clear the backlog within (range 1–168).

In drain mode, a cycle processes every queued backtest within the time budget and defers whatever does not finish to the next cycle. Legacy mode (pipeline_drain_mode=false) returns after a single fixed batch. Adaptive mode sizes each cycle so the backlog clears within pipeline_target_clear_hours.
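The two sizing behaviors can be sketched as follows. Both functions are hypothetical:

- adaptive_assignments assumes a fixed testing-cycle interval (cycle_minutes, which is not a documented setting). It clamps the result to the 1–100 range of pipeline_assignments_per_cycle.
- drain runs items one at a time for clarity, whereas the table notes that the real drain runs backtests in parallel.

```python
import math
import time
from typing import Callable, Iterable, List, Tuple


def adaptive_assignments(backlog: int, target_clear_hours: float,
                         cycle_minutes: float, lo: int = 1, hi: int = 100) -> int:
    """Assignments per cycle needed to clear `backlog` within the horizon.

    cycle_minutes (how often a testing cycle runs) is an assumed input.
    """
    cycles_in_horizon = max(1, math.floor(target_clear_hours * 60 / cycle_minutes))
    needed = math.ceil(backlog / cycles_in_horizon)
    return max(lo, min(hi, needed))  # clamp to the documented 1-100 range


def drain(queue: Iterable[str], run: Callable[[str], None], budget_s: float = 600.0,
          clock: Callable[[], float] = time.monotonic) -> Tuple[List[str], List[str]]:
    """Run queued items until the time budget is spent; return (done, deferred)."""
    deadline = clock() + budget_s
    done: List[str] = []
    deferred: List[str] = []
    for item in queue:
        if clock() >= deadline:
            deferred.append(item)    # not started this cycle; picked up next cycle
            continue
        run(item)
        done.append(item)
    return done, deferred
```

For example, a backlog of 240 with a 6-hour target and 15-minute cycles yields 24 cycles in the horizon, so each cycle takes 10 assignments.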

Retirement and the learning loop

When a strategy reaches a terminal state — archived, retired, or live_graduated — record_outcome() looks up the quant skills cited in its task chain and writes a skill_outcome_events row. The skill's confidence is nudged: roughly +3% on a positive outcome, −5% on a negative one. So a strategy that decays out of live does not just disappear — it slightly lowers Forven's confidence in the patterns that produced it. That closure is idempotent on (skill_name, strategy_id, triggered_by), so a re-run never double-counts.
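The confidence nudge and its idempotency key can be sketched as a small ledger. SkillLedger and apply_outcome are illustrative names, not Forven's record_outcome(). All of the following are hypothetical assumptions:

- The nudge is relative (multiply by 1.03 or 0.95) rather than in percentage points.
- An unseen skill starts at 0.5.
- Confidence is clamped to [0, 1].
- An in-memory set stands in for the uniqueness constraint on skill_outcome_events.

```python
from dataclasses import dataclass, field
from typing import Dict, Set, Tuple


@dataclass
class SkillLedger:
    """Hypothetical stand-in for skill confidences plus skill_outcome_events."""
    confidence: Dict[str, float] = field(default_factory=dict)
    seen: Set[Tuple[str, str, str]] = field(default_factory=set)

    def apply_outcome(self, skill_name: str, strategy_id: str,
                      triggered_by: str, positive: bool) -> bool:
        """Nudge one skill's confidence; return False if already applied."""
        key = (skill_name, strategy_id, triggered_by)
        if key in self.seen:
            return False                                 # idempotent: never double-counts
        self.seen.add(key)
        current = self.confidence.get(skill_name, 0.5)   # assumed starting confidence
        nudged = current * (1.03 if positive else 0.95)  # assumed relative +3% / -5%
        self.confidence[skill_name] = min(1.0, max(0.0, nudged))
        return True
```

The asymmetry (a smaller reward than penalty) means a skill needs repeated positive outcomes to recover from a single decay.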

Terminal quick-screen failures are deliberately not re-queued. Missing that distinction is what produces "zombie" strategies that churn the pipeline forever. If you see a strategy keep reappearing, check its event log for a non-terminal hold (a data-quality gate) rather than a terminal one.

The diversity guard

Left alone, an ideation loop will happily mint the same winning family — usually RSI momentum — until your portfolio is a monoculture. The diversity guard prevents that echo chamber.

It inspects the most recent 80 strategies and measures how much of that window each family occupies. Two thresholds apply:

Soft threshold — strategy_diversity_threshold (default 0.35). When a family exceeds 35% of the recent window, the guard injects a cool-down note into the ideation prompt.
Hard threshold — hard_saturation_threshold (default 0.55). At 55% the family is flagged with higher severity.
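The saturation check amounts to counting family shares over a sliding window. This is a minimal sketch that assumes the guard sees an ordered list of family labels. saturation_report is a hypothetical name. The comparisons (strictly above the soft threshold, at or above the hard one) follow the wording here rather than confirmed code.

```python
from collections import Counter
from typing import Dict, Sequence


def saturation_report(families: Sequence[str], window: int = 80,
                      soft: float = 0.35, hard: float = 0.55) -> Dict[str, str]:
    """Map each over-represented family to "soft" or "hard" severity.

    `families` is ordered oldest to newest; only the last `window` entries count.
    """
    recent = list(families)[-window:]
    if not recent:
        return {}
    report: Dict[str, str] = {}
    for family, count in Counter(recent).items():
        share = count / len(recent)
        if share >= hard:
            report[family] = "hard"   # flagged with higher severity
        elif share > soft:
            report[family] = "soft"   # would trigger a cool-down note in ideation
    return report
```

Because only the last 80 entries count, a family that dominated long ago but has since cooled off drops out of the report on its own.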

When the soft threshold trips, the ideation prompt receives a STRATEGY DIVERSITY GUARD section telling the agent to steer away — for example:

RSI is cooled down. Prefer non-RSI families: funding, breakout,
volume, cross-asset, volatility, VWAP.

This does not delete or block existing strategies; it only biases what the research daemon proposes next, steering ideation toward under-represented families. You can tune both thresholds in settings:

{
  "strategy_diversity_threshold": 0.35,
  "hard_saturation_threshold": 0.55
}

A related guardrail, archetype fingerprinting, runs at candidate-creation time. It classifies each strategy by strategy_type, regime_class, indicator_family, and risk_profile, then rejects near-duplicates — an exact fingerprint match is always treated as a duplicate, and same-type strategies are flagged at roughly 85%+ fingerprint similarity with parameter similarity above 80%. The diversity guard governs the mix of families; fingerprinting blocks redundant copies within a family.
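The duplicate rule can be sketched as below. The text does not say how similarity is computed, so both measures are assumptions:

- Fingerprint similarity is Jaccard overlap on field tokens, with indicator_family treated as a set.
- Parameter similarity is a mean per-key closeness, where a key present in only one strategy scores zero.

Fingerprint, is_near_duplicate, and both helpers are hypothetical names.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet


@dataclass(frozen=True)
class Fingerprint:
    strategy_type: str
    regime_class: str
    indicator_family: FrozenSet[str]   # assumed to allow more than one family
    risk_profile: str

    def tokens(self) -> FrozenSet[str]:
        base = {f"type:{self.strategy_type}", f"regime:{self.regime_class}",
                f"risk:{self.risk_profile}"}
        return frozenset(base | {f"ind:{i}" for i in self.indicator_family})


def fingerprint_similarity(a: Fingerprint, b: Fingerprint) -> float:
    """Assumed measure: Jaccard overlap of the two token sets."""
    ta, tb = a.tokens(), b.tokens()
    return len(ta & tb) / len(ta | tb)


def param_similarity(pa: Dict[str, float], pb: Dict[str, float]) -> float:
    """Assumed measure: mean per-key closeness; one-sided keys score 0."""
    keys = set(pa) | set(pb)
    if not keys:
        return 1.0
    total = 0.0
    for k in keys:
        if k in pa and k in pb:
            scale = max(abs(pa[k]), abs(pb[k]))
            total += 1.0 if scale == 0 else max(0.0, 1.0 - abs(pa[k] - pb[k]) / scale)
    return total / len(keys)


def is_near_duplicate(a: Fingerprint, pa: Dict[str, float],
                      b: Fingerprint, pb: Dict[str, float]) -> bool:
    if a == b:
        return True                    # exact fingerprint match: always a duplicate
    return (a.strategy_type == b.strategy_type
            and fingerprint_similarity(a, b) >= 0.85
            and param_similarity(pa, pb) > 0.80)
```

With Jaccard on four single-valued fields, one differing field already drops similarity to 0.6. The 0.85 bar therefore only bites when indicator sets are large, and a different measure would shift where that bar falls.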

Regime-aware recalibration

A strategy that was tuned in a trending market can quietly degrade when the market goes range-bound. Recalibration is the background job that notices the shift and re-tunes — carefully.

check_and_recalibrate() runs on the maintenance scheduler for each tracked asset (BTC, ETH, SOL, and others). For each one it:

Runs detect_regime(asset), which reads current conditions (volatility, trend, funding basis) and returns the current regime plus a confidence score (0–1). The regime classes are the standard four (TREND_UP, TREND_DOWN, RANGE_BOUND, HIGH_VOL), documented under market regimes.
Compares the current regime to the stored last_regime:{asset} value. If unchanged, it skips the asset — recalibration only acts on a genuine shift.
On a shift, for each affected strategy it checks is_strategy_allowed(strategy_type, regime, confidence) and skips any family disallowed in the new regime.
Computes a regime-aware parameter overlay via get_adjusted_params(...).
Runs two backtests — baseline params versus adjusted params, both over the same window — and only persists the adjusted params if their Sharpe is higher.
Logs the result to the activity feed, e.g. S012-ETH: sharpe 0.45 → 0.62 (illustrative), and updates last_regime:{asset}.
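The per-asset flow above can be sketched with every external call injected. That keeps three rules visible: skip unchanged regimes, honor the regime allow-list, and persist only when the adjusted params beat the baseline. This is a hypothetical illustration, not check_and_recalibrate():

- Strategy, RecalDeps, and recalibrate_assets are made-up names.
- The callables stand in for detect_regime, is_strategy_allowed, get_adjusted_params, and the backtester.
- The try/except reflects the documented best-effort behavior.
- The param_write_blocked lock is left to the persist callback.

```python
import logging
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

log = logging.getLogger("recalibration")


@dataclass
class Strategy:
    strategy_id: str
    strategy_type: str
    asset: str
    params: Dict[str, float]


@dataclass
class RecalDeps:
    detect_regime: Callable[[str], Tuple[str, float]]            # asset -> (regime, confidence)
    is_allowed: Callable[[str, str, float], bool]                 # (type, regime, confidence)
    adjusted_params: Callable[[Strategy, str], Dict[str, float]]  # regime-aware overlay
    sharpe: Callable[[Strategy, Dict[str, float]], float]         # backtest on a fixed window
    persist: Callable[[Strategy, Dict[str, float]], None]         # lock checks live here


def recalibrate_assets(assets: List[str], strategies: List[Strategy],
                       last_regime: Dict[str, str], deps: RecalDeps) -> List[str]:
    """Hypothetical sketch of one maintenance pass; returns activity-feed lines."""
    feed: List[str] = []
    for asset in assets:
        try:
            regime, confidence = deps.detect_regime(asset)
            if last_regime.get(asset) == regime:
                continue                                  # no genuine shift: skip asset
            targets = [x for x in strategies if x.asset == asset]
            for s in targets:
                if not deps.is_allowed(s.strategy_type, regime, confidence):
                    continue                              # family disallowed in new regime
                candidate = deps.adjusted_params(s, regime)
                base = deps.sharpe(s, s.params)           # same window for both runs
                adj = deps.sharpe(s, candidate)
                if adj > base:
                    deps.persist(s, candidate)
                    feed.append(f"{s.strategy_id}-{asset}: sharpe {base:.2f} -> {adj:.2f}")
                else:
                    feed.append(f"{s.strategy_id}-{asset}: no change")
            last_regime[asset] = regime
        except Exception as exc:                          # best-effort: warn and move on
            log.warning("recalibration skipped for %s: %s", asset, exc)
    return feed
```

last_regime is only updated after an asset's pass completes. An asset whose regime read fails is therefore retried on the next run, instead of being silently marked as handled.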

The two properties that make this safe to leave running:

Non-destructive. A parameter change is never persisted unless a fresh backtest proves it beats the baseline. There is no blind "the regime changed, so guess new numbers" path.
It respects your locks. Recalibration is a background writer, not an operator. If a strategy is in paper or live with param_write_blocked set, the recalibrator computes and logs the overlay but refuses to persist it. Operator-owned parameters stay operator-owned.
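The two safety properties combine into a small decision function. This is a sketch under assumptions: decide_persist and RecalDecision are hypothetical names, and stage strings are assumed to be "paper", "live", or "live_graduated". As stated above, the lock only applies in those stages.

```python
from enum import Enum


class RecalDecision(Enum):
    PERSIST = "persist"
    LOG_ONLY_LOCKED = "log_only_locked"  # overlay computed and logged, not written
    NO_CHANGE = "no_change"


# Assumed stage labels for "paper or live".
LOCKABLE_STAGES = {"paper", "live", "live_graduated"}


def decide_persist(stage: str, param_write_blocked: bool,
                   baseline_sharpe: float, adjusted_sharpe: float) -> RecalDecision:
    """Hypothetical sketch: what a background writer may do with a new overlay."""
    if adjusted_sharpe <= baseline_sharpe:
        return RecalDecision.NO_CHANGE          # non-destructive: must beat baseline
    if stage in LOCKABLE_STAGES and param_write_blocked:
        return RecalDecision.LOG_ONLY_LOCKED    # operator-owned params stay put
    return RecalDecision.PERSIST
```

A tie counts as no improvement, so the baseline is kept unless the adjusted params are strictly better.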

The recalibrator never blocks live trading. Transient failures (a DB hiccup, a flaky regime read) are logged as warnings and skipped — it is best-effort by design, so a recalibration error can never stall an open position. If you expected a re-tune that did not happen, check the activity feed for a skipped or no-change entry for that asset.

What you'll see

Evolution runs in the background, so its effects surface across several existing pages rather than one dedicated screen:

Strategy detail panel — stage transitions, gauntlet-readiness step status (pending / running / passed / failed), and updated_at refreshes when recalibration persists new params.
Strategy event log — quick-screen results, promotion gate outcomes, and retirement reasons (unfit, exceeded_drawdown).
Activity feed — testing-cycle summaries (planned/assigned counts), diversity-guard cool-downs in ideation task descriptions, and recalibration results per asset.
Quant Skills panel — the confidence movements that retirement and outcome closure produce, with version history and diffs.

Caveats (beta)

Drain mode loops until a strategy is ready or an error occurs; a no-progress guard breaks the loop and returns a no_progress status if a step runs but readiness does not advance. If a strategy is stuck, that status is your clue.
The crucible planner and the hypothesis promotion loop both dispatch candidate-development work to the same agent pool. Deduplication is keyed on any open candidate-family action per crucible; when it slips, you get duplicate develop tasks. See troubleshooting if you spot churn.
Regime detection and recalibration are best-effort. A skipped recalibration is normal, not an error.
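The drain-mode loop and its no-progress guard from the first caveat can be sketched as follows. This assumes readiness is an integer count of completed readiness steps (five, matching the gauntlet's readiness steps). drain_readiness is a hypothetical name, and its status strings other than no_progress are also hypothetical.

```python
from typing import Callable


def drain_readiness(readiness_level: Callable[[], int],
                    run_next_step: Callable[[], None],
                    total_steps: int = 5) -> str:
    """Hypothetical sketch: run readiness steps until ready, erroring, or stuck."""
    while True:
        before = readiness_level()
        if before >= total_steps:
            return "ready"
        try:
            run_next_step()
        except Exception:
            return "error"
        if readiness_level() <= before:
            return "no_progress"     # a step ran but readiness did not advance
```

The loop only continues when readiness strictly increases, and it returns once readiness reaches total_steps, so it always terminates.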

This is research infrastructure for stress-testing ideas, not a money machine. Results from any stage — backtest, paper, or live — describe the past and are not predictive. Nothing here is financial advice.

The pipeline — the full strategy lifecycle this engine drives.
The gauntlet — the robustness battery a candidate must survive.
Crucible discovery — where the candidates evolution promotes come from.
Market regimes — the regime classes that trigger recalibration.
