Quant skills (the learning loop)

How Forven turns backtest results into versioned, confidence-scored quant skills, closes the loop on real outcomes, and feeds the lessons back into ideation.

Most retail tools forget every backtest the moment it finishes. Forven does the opposite: it reads each result, extracts what it can defend, and stores it as a quant skill — a versioned, confidence-scored trading insight. Those skills then steer the next round of strategy generation, so the lab gets a little less naive every cycle.

This page is for anyone who wants to understand the learning loop: how a skill is born from a backtest, how its confidence moves, how real strategy outcomes close the loop, and how the store feeds back into ideation. It covers the mechanics behind the Memory Bank — the page where you browse and tidy this knowledge.

Forven is a research tool. Quant skills describe patterns observed in historical and simulated data. None of it predicts the future, confidence scores are not forecasts, and nothing here is financial advice.

The loop in one picture

The learning loop runs in the background, end to end:

  1. A backtest completes.
  2. Forven decides whether the result is worth analyzing (enough trades to mean something).
  3. An extraction step reads the metrics and emits either a new candidate hypothesis or an update to an existing skill.
  4. A candidate hypothesis that accumulates enough corroborating backtests promotes to a full quant skill.
  5. The skill is read by ideation, steering which strategies get proposed next.
  6. When a strategy that cited a skill reaches a terminal state, its outcome adjusts that skill's confidence — up or down.
  7. Periodic consolidation archives the dead weight and prunes stale candidates.

Each part is described below.

What a quant skill is

A quant skill is a validated, versioned trading insight, stored as a SKILL.md file under your FORVEN_HOME (the quant-skills/ directory inside ~/.forven). Each skill carries:

  • description — what the pattern is, in plain language.
  • skill_type — one of regime, failure, indicator, combo, params.
  • confidence — a score from 0 to 1 (see confidence below).
  • evidence — the array of backtests that support it.
  • what_works / what_doesnt_work — the actionable do/don't lists ideation reads.
  • version history — every change, with a diff and a change summary.

Skills are the durable layer of Forven's knowledge. They are deliberately conservative: a pattern does not become a skill on the strength of a single lucky backtest.
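As a rough mental model, the fields listed above can be pictured as a small record. The sketch below is illustrative only: the class name `QuantSkill`, the `name` and `version` fields, and the validation rules are assumptions made for this example, not Forven's actual schema or the real layout of SKILL.md.

```python
from dataclasses import dataclass, field
from typing import Literal, get_args

# The five skill types named in the text.
SkillType = Literal["regime", "failure", "indicator", "combo", "params"]


@dataclass
class QuantSkill:
    """Hypothetical in-memory view of one skill (not Forven's real model)."""

    name: str
    description: str
    skill_type: str
    confidence: float
    evidence: list[str] = field(default_factory=list)  # supporting backtest ids
    what_works: list[str] = field(default_factory=list)
    what_doesnt_work: list[str] = field(default_factory=list)
    version: int = 1  # assumed to start at 1 and grow with each change

    def __post_init__(self) -> None:
        # Confidence is described as a score from 0 to 1.
        if not 0.0 <= self.confidence <= 1.0:
            raise ValueError("confidence must be between 0 and 1")
        if self.skill_type not in get_args(SkillType):
            raise ValueError(f"unknown skill_type: {self.skill_type!r}")
```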

Candidate hypotheses: the provisional layer

Before a skill exists, there is a candidate hypothesis — an unconfirmed observation extracted from a backtest. It holds a pattern, an observation, and the list of backtest_ids that support it. A candidate hypothesis is domain knowledge on probation: a hint, not a conclusion.

The word hypothesis is overloaded in Forven. The candidate hypotheses here are quant-skill observations on their way to becoming skills. They are not the research-record hypotheses (crucibles) in the Hypotheses Manager, even though both use the word. Context disambiguates; this page is about the learning store.

How a skill is born

Extraction runs after a backtest, off the critical path so it never blocks the result:

  1. Worth analyzing? When a backtest completes, Forven checks whether the result is worth reading — it requires at least 10 trades before extracting anything. A two-trade fluke is noise, not a lesson.
  2. Extract the insight. A structured extraction step reads the backtest metrics and decides on one of three actions:
    • update_skill — the pattern matches an existing skill. Forven appends the new backtest to the evidence, recomputes confidence, bumps the version, and rewrites SKILL.md with a new history row.
    • new_hypothesis — the pattern is novel. Forven stores a new candidate hypothesis with its pattern, observation, and backtest_id. Subsequent similar backtests increment its count.
    • skip — an unremarkable result, ignored.
  3. Run in the background. Extraction is submitted to a background thread, so backtest completion is never delayed by it.
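The three steps above can be sketched as a gate, a decision, and a background submit. This is a simplified stand-in, not Forven's code: the real extraction step reads backtest metrics, whereas here a hypothetical `remarkable` flag and a plain set lookup stand in for that judgment. The names `MIN_TRADES`, `Action`, `choose_action`, and `on_backtest_complete` are invented for illustration; only the 10-trade minimum and the three action names come from the text.

```python
from concurrent.futures import Future, ThreadPoolExecutor
from enum import Enum

MIN_TRADES = 10  # the text requires at least 10 trades before extracting


class Action(Enum):
    UPDATE_SKILL = "update_skill"
    NEW_HYPOTHESIS = "new_hypothesis"
    SKIP = "skip"


def choose_action(trade_count: int, pattern: str,
                  known_skills: set[str], remarkable: bool) -> Action:
    """Pick one of the three extraction actions (illustrative logic only)."""
    if trade_count < MIN_TRADES or not remarkable:
        return Action.SKIP
    if pattern in known_skills:
        return Action.UPDATE_SKILL
    return Action.NEW_HYPOTHESIS


# A single worker thread keeps extraction off the backtest's critical path.
_executor = ThreadPoolExecutor(max_workers=1)


def on_backtest_complete(trade_count: int, pattern: str,
                         known_skills: set[str], remarkable: bool) -> Future:
    """Return immediately; the decision runs in the background."""
    return _executor.submit(choose_action, trade_count, pattern,
                            known_skills, remarkable)
```

The caller gets a `Future` back right away, which mirrors the point that backtest completion is not delayed by extraction.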

Promotion: candidate → skill

A candidate hypothesis promotes to a full quant skill once it accumulates three supporting backtests (PROMOTION_THRESHOLD = 3). At that point Forven:

  • deletes the candidate hypothesis file,
  • writes a new SKILL.md and evidence.json,
  • records a quant_skills_history row.

Promotion is automatic when the count reaches the threshold. An operator can also call force_promote_hypothesis() to promote a candidate immediately, regardless of count — useful when you already trust an observation and want it influencing ideation now.

A candidate hypothesis is provisional by design. Three backtests is a floor, not a guarantee of truth — it is the minimum evidence before an observation is allowed to steer future work.
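The promotion rule reduces to a count check with an operator override. In this minimal sketch, `PROMOTION_THRESHOLD = 3` comes from the text, while the `CandidateHypothesis` class and the `should_promote` helper are hypothetical names; the file deletion, SKILL.md write, and history row that accompany a real promotion are left out.

```python
from dataclasses import dataclass, field

PROMOTION_THRESHOLD = 3  # supporting backtests needed, per the text


@dataclass
class CandidateHypothesis:
    """Hypothetical shape: a pattern, an observation, and supporting backtests."""

    pattern: str
    observation: str
    backtest_ids: list[str] = field(default_factory=list)

    @property
    def count(self) -> int:
        return len(self.backtest_ids)


def should_promote(candidate: CandidateHypothesis, force: bool = False) -> bool:
    """True when the count reaches the threshold, or when an operator forces it."""
    return force or candidate.count >= PROMOTION_THRESHOLD
```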

Confidence: how trust moves

A skill's confidence is a single score from 0 to 1 for how reliable it is. It is not set once and frozen; it moves as evidence arrives and as real strategies succeed or fail.

Two forces drive it:

  • Evidence recency. Confidence is weighted toward recent evidence. A supporting result's weight decays from 1.0 down toward 0.3 over roughly 90 days, so a skill that earned its standing two years ago does not coast on stale wins. Consistency matters too: evidence where the Sharpe held above 0.5 counts as positive corroboration.
  • Real outcomes. When a strategy that cited a skill reaches a terminal state, the result nudges the skill's confidence: roughly +3% on a success, −5% on a failure, and 0 on a neutral outcome. Failures cost more than successes pay — the asymmetry is deliberate, because a pattern that breaks in practice should lose standing faster than it gained it.

Higher confidence weights a skill more heavily in ideation prompts. Lower confidence eventually sends it to consolidation.
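To make the two forces concrete, here is a small sketch under stated assumptions. The text gives only the endpoints of the recency decay (1.0 toward 0.3 over roughly 90 days), so the linear shape and the hard floor below are guesses. Likewise, "+3%" and "−5%" are read here as absolute changes of 0.03 and 0.05 on the 0 to 1 scale, which may not match how Forven applies them. All names are hypothetical.

```python
DECAY_WINDOW_DAYS = 90.0  # "roughly 90 days" in the text
MIN_WEIGHT = 0.3          # weight decays "toward 0.3"

# Assumed absolute deltas; the text says roughly +3% / -5% / 0.
OUTCOME_DELTA = {"success": 0.03, "failure": -0.05, "neutral": 0.0}


def recency_weight(age_days: float) -> float:
    """Linear decay from 1.0 at age 0 to MIN_WEIGHT at the window edge (assumed shape)."""
    fraction = min(max(age_days, 0.0) / DECAY_WINDOW_DAYS, 1.0)
    return 1.0 - (1.0 - MIN_WEIGHT) * fraction


def apply_outcome(confidence: float, outcome: str) -> float:
    """Nudge confidence by the outcome delta, clamped to the 0..1 range."""
    return min(1.0, max(0.0, confidence + OUTCOME_DELTA[outcome]))
```

Under these assumptions, a skill at 0.50 that sees one success and then one failure ends slightly below where it started, which is the asymmetry the text describes.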

Closing the loop: skill outcomes

The most important part of the loop is the part most tools skip: checking whether a remembered insight actually helped.

When a strategy reaches a terminal state — archived, retired, or live_graduated — Forven runs outcome closure:

  1. It looks up the cited skills in that strategy's task chain (which skills shaped its ideation).
  2. It writes a skill_outcome_events row recording the outcome and the confidence delta.
  3. It rewrites the skill's SKILL.md with a new version and a history row.

Outcome closure is idempotent on the (skill_name, strategy_id, triggered_by) tuple: re-running it for the same outcome is a no-op (INSERT OR IGNORE), so a skill's confidence is adjusted exactly once per strategy per trigger. There is no double-counting.
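The idempotency guarantee can be reproduced with a uniqueness constraint and `INSERT OR IGNORE` in SQLite. The sketch below uses an in-memory database; the table name and the key tuple come from the text, but the exact column set and the `record_outcome` helper are assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE skill_outcome_events (
        skill_name       TEXT NOT NULL,
        strategy_id      TEXT NOT NULL,
        triggered_by     TEXT NOT NULL,
        outcome          TEXT NOT NULL,
        confidence_delta REAL NOT NULL,
        UNIQUE (skill_name, strategy_id, triggered_by)
    )
    """
)


def record_outcome(db: sqlite3.Connection, skill_name: str, strategy_id: str,
                   triggered_by: str, outcome: str, delta: float) -> bool:
    """Insert the event once; return False if the same key was already recorded."""
    cur = db.execute(
        "INSERT OR IGNORE INTO skill_outcome_events "
        "(skill_name, strategy_id, triggered_by, outcome, confidence_delta) "
        "VALUES (?, ?, ?, ?, ?)",
        (skill_name, strategy_id, triggered_by, outcome, delta),
    )
    db.commit()
    # rowcount is 0 when the unique constraint caused the insert to be ignored.
    return cur.rowcount == 1
```

A caller would adjust the skill's confidence only when `record_outcome` returns `True`, so a retry of the same closure changes nothing.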

This is the discipline that keeps the store honest. A pattern that looked brilliant in one backtest but kept producing strategies that died in the gauntlet or faded in paper will quietly lose confidence, no matter how good it once looked.

Confidence movement reflects a skill's track record inside Forven's own tests; it is not a forecast. A high-confidence skill is one that has held up so far, not a promise that it will keep working.

Feeding back into ideation

Skills are not a passive archive. They are read every time Forven invents a new strategy.

When the brain builds a strategy-ideation prompt, it calls get_ideation_context(regime) to pull the top five skills by confidence — filtered to the current market regime where one applies, or general otherwise — and folds their what_works / what_doesnt_work lists into the prompt. The agent then ideates with that context, biased toward patterns that have survived and away from ones that have failed.
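The selection step can be approximated in a few lines. This is not the real `get_ideation_context`: the function name `sketch_ideation_context`, the dict-based skill shape, and the optional `regime` tag (with `None` meaning a general skill) are assumptions, and the real function may fall back to general skills differently. Only the top-five-by-confidence rule and the two list names come from the text.

```python
from typing import Optional

TOP_N = 5  # "top five skills by confidence", per the text


def sketch_ideation_context(skills: list[dict], regime: Optional[str]) -> dict:
    """Pick the highest-confidence skills for a regime and merge their do/don't lists."""
    # Assumption: each skill has an optional "regime" tag; None marks a general skill.
    pool = [s for s in skills if s.get("regime") == regime]
    top = sorted(pool, key=lambda s: s["confidence"], reverse=True)[:TOP_N]
    return {
        "skills": [s["name"] for s in top],
        "what_works": [tip for s in top for tip in s.get("what_works", [])],
        "what_doesnt_work": [tip for s in top
                             for tip in s.get("what_doesnt_work", [])],
    }
```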

This is what makes the loop a loop: every backtest can teach a skill, and every skill can shape the next backtest's parent strategy.

Diversity, not just confidence

Confidence alone could push the lab into an echo chamber — endlessly re-proposing whatever family is currently winning. A separate strategy diversity guard counters that, watching recent strategies for family saturation and nudging ideation toward under-explored families. It is a sibling mechanism to the learning loop rather than part of it; see crucible discovery for how discovery and ideation are orchestrated.

Consolidation and pruning

Left alone, any learning store sprawls into near-duplicates and dead entries. A periodic consolidation job keeps it tight:

  • Archive low-value skills. A skill that has stayed below 0.3 confidence across 20 or more samples is archived — it had its chance and did not earn its keep.
  • Prune stale candidates. Candidate hypotheses that never reached the promotion threshold are pruned after about 90 days (CONFIDENCE_DECAY_DAYS).

You trigger and watch consolidation from the Memory Bank. Treat it as permanent — merges and prunes are not reversible.
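The two consolidation rules above can be expressed as simple predicates. This sketch simplifies in one important way: the text says a skill must have "stayed" below 0.3 across 20 or more samples, while the check here looks only at the current confidence and sample count. The constant and function names are hypothetical; the thresholds are the ones quoted in the text.

```python
from datetime import datetime, timedelta

ARCHIVE_BELOW_CONFIDENCE = 0.3
ARCHIVE_MIN_SAMPLES = 20
PRUNE_AFTER_DAYS = 90
PROMOTION_THRESHOLD = 3


def should_archive(confidence: float, samples: int) -> bool:
    """Low confidence despite plenty of samples (current values only, a simplification)."""
    return confidence < ARCHIVE_BELOW_CONFIDENCE and samples >= ARCHIVE_MIN_SAMPLES


def should_prune(count: int, created_at: datetime, now: datetime) -> bool:
    """Candidate never reached the promotion threshold and is older than the window."""
    too_old = now - created_at > timedelta(days=PRUNE_AFTER_DAYS)
    return count < PROMOTION_THRESHOLD and too_old
```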

The store on disk and over HTTP

Skills live as files under FORVEN_HOME, and the same data is exposed through the local API for the UI. The store is plain enough to inspect directly.

# Where the learning store lives (defaults to ~/.forven)
$env:FORVEN_HOME
# Skills are SKILL.md files under quant-skills/, candidate hypotheses under _hypotheses/,
# and archived skills under _archived/

The local HTTP API surfaces the store in tiers, so you can list cheaply and drill in only when needed:

Method & path                      What it returns
GET /quant-skills                  Summary list: {name, type, confidence, samples, version} per skill (no body).
GET /quant-skills/{name}           Full skill detail. Add ?section=what_works|what_doesnt_work|evidence|metadata|history for one slice.
GET /quant-skills/{name}/history   Version history: {version, parent_version, body_diff, change_summary, evidence_task_id, created_at}, newest first.
GET /hypotheses                    Pending candidate hypotheses: {id, pattern, observation, count, created_at}.
POST /hypotheses/{id}/promote      Promote a candidate to a skill if its count is at least 3.
GET /skill-outcomes                Outcome events: {skill_name, outcome, confidence_delta, before, after, triggered_by}. Filter with ?skill_name=, ?strategy_id=, ?limit=, ?offset=.

These are the local endpoints the Memory Bank UI reads; you do not normally call them by hand, but they make the store auditable.
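If you do want to audit the store by hand, the paths and query parameters in the table are enough to build request URLs. The helpers below are hypothetical and only construct URLs. The base address of the local API is not stated on this page, so `base_url` is a parameter you must supply, and the response format is not assumed.

```python
from typing import Optional
from urllib.parse import quote, urlencode

# Section names accepted by GET /quant-skills/{name}, as listed in the table.
VALID_SECTIONS = {"what_works", "what_doesnt_work", "evidence", "metadata", "history"}


def skill_url(base_url: str, name: str, section: Optional[str] = None) -> str:
    """Build the detail URL for one skill, optionally narrowed to one section."""
    url = f"{base_url.rstrip('/')}/quant-skills/{quote(name, safe='')}"
    if section is not None:
        if section not in VALID_SECTIONS:
            raise ValueError(f"unknown section: {section!r}")
        url += "?" + urlencode({"section": section})
    return url


def outcomes_url(base_url: str, skill_name: Optional[str] = None,
                 strategy_id: Optional[str] = None,
                 limit: Optional[int] = None,
                 offset: Optional[int] = None) -> str:
    """Build a GET /skill-outcomes URL with only the filters that were given."""
    filters = {"skill_name": skill_name, "strategy_id": strategy_id,
               "limit": limit, "offset": offset}
    params = {key: value for key, value in filters.items() if value is not None}
    url = f"{base_url.rstrip('/')}/skill-outcomes"
    return f"{url}?{urlencode(params)}" if params else url
```

The resulting string can be passed to any HTTP client, for example `urllib.request.urlopen`, once you know the address your local Forven instance listens on.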

Reading a skill

In the UI, open the Memory Bank and select a skill to inspect the loop's output.

Steps

  1. Open the Memory Bank at /memory.
  2. Browse or search the skill list. Each row shows name, type, confidence, and samples.
  3. Open a skill to read its detail: the what_works / what_doesnt_work lists, the evidence array, and the version history with per-version diffs.
  4. Open the skill's outcomes view to see the skill_outcome_events — each row's outcome, the confidence delta, and the before/after values.
  5. To promote a candidate observation early, find it under the candidate / pending hypotheses view and promote it (or call POST /hypotheses/{id}/promote).

What you'll see

A list of skills sortable by confidence, and per skill a detail panel showing its do/don't lists, its evidence, a version history with diff tracking, and a confidence movement timeline tied to real strategy outcomes. A candidate hypothesis shows its backtest count climbing toward the promotion threshold of three.

Caveats

  • Skills can be wrong. A Signal-tier observation is a hint, not a conclusion. Trust rises only as out-of-sample evidence and real outcomes accumulate.
  • Confidence describes the past. It reflects how a pattern has performed in Forven's backtests and simulations; it does not predict performance.
  • Promotion is a floor, not proof. Three backtests is the minimum to promote a candidate, not a certificate of truth.
  • Consolidation is permanent. Archiving and pruning are not reversible — review before you run them.
  • This is beta software. The thresholds and deltas above are the current defaults and can change.

Forven is a research tool. The quant-skills loop describes a research process, not predicted performance. Past survival of a strategy is not predictive of future results, and nothing here is financial advice.