Hypothesis-driven research

How Forven organizes research around market hypotheses that spawn strategies, judges them by hit-rate and diversity, and graduates only the proven.

Forven does not chase strategies one at a time. It organizes research around hypotheses — concrete market theses that each spawn a family of child strategies. A hypothesis is the unit of belief; the strategies under it are the experiments that confirm or kill that belief.

This page is for anyone using the Hypotheses Manager to track research, and for operators who want to understand how a verdict is reached and what graduation actually does.

What a hypothesis is

A hypothesis (called a crucible inside the discovery engine) is a market thesis with structure: a market_thesis, a stated mechanism, a set of target_assets and target_timeframes, and a claimed edge. It is not a strategy. It is the question a batch of strategies is built to answer.

Each hypothesis carries an origin:

agent — invented by the research daemon (an LLM-proposed thesis).
harvested — distilled from an external source (a YouTube talk, a forum post, a blog).
operator — seeded by you, manually or by URL ingest.

Child strategies link back to their hypothesis by hypothesis_id. As each child moves through the pipeline — researching → backtesting → quick_screen → gauntlet → paper → live — the hypothesis accumulates evidence about whether its thesis holds.
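As a rough illustration of the record shape described above, here is a minimal sketch. The field names market_thesis, mechanism, target_assets, target_timeframes, edge, origin, and hypothesis_id come from this page; the class names, the strategy_id, asset, timeframe, and stage fields, and the children_of helper are assumptions made for the example, not Forven's actual schema.

```python
from dataclasses import dataclass, field
from enum import Enum


class Origin(str, Enum):
    AGENT = "agent"          # invented by the research daemon
    HARVESTED = "harvested"  # distilled from an external source
    OPERATOR = "operator"    # seeded manually or by URL ingest


@dataclass
class Hypothesis:
    hypothesis_id: str
    market_thesis: str
    mechanism: str
    edge: str
    origin: Origin
    target_assets: list[str] = field(default_factory=list)
    target_timeframes: list[str] = field(default_factory=list)


@dataclass
class ChildStrategy:
    strategy_id: str       # hypothetical field name
    hypothesis_id: str     # link back to the parent hypothesis
    asset: str
    timeframe: str
    stage: str = "researching"


def children_of(hypothesis, strategies):
    """Return the strategies that link back to this hypothesis."""
    return [s for s in strategies if s.hypothesis_id == hypothesis.hypothesis_id]
```

The one-way link (child holds hypothesis_id) means evidence for a thesis can be gathered by filtering strategies, without the hypothesis record having to track its children itself.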

Research hypotheses and quant-skill "hypotheses" are two different things. The research-record hypotheses described here live in the Hypotheses Manager. The quant-skills learning loop uses the same word for an unconfirmed observation that is promoted to a skill after three backtests. This page covers only the research records.

The hypothesis pool

Active hypotheses form a capped pool. When a new hypothesis is created and the pool is already full, Forven automatically evicts the weakest active hypothesis (the one with the fewest live children and the stalest activity), moving it to archived with reason pool_pressure_eviction. The point is to keep research focused rather than letting an unbounded backlog of half-explored ideas pile up.
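The eviction rule can be sketched as follows. This is an illustration under assumptions, not Forven's implementation: the PoolEntry class, its field names, the add_with_eviction helper, and the tie-break order (fewest live children first, then oldest activity) are all hypothetical. Only the archived state and the pool_pressure_eviction reason come from this page.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional


@dataclass
class PoolEntry:
    hypothesis_id: str
    live_children: int            # assumed meaning: children not yet dead
    last_activity: datetime
    manager_state: str = "active"
    archive_reason: Optional[str] = None


def add_with_eviction(pool, new_entry, cap):
    """Add new_entry to pool; if the active pool is full, archive the weakest.

    Returns the evicted entry, or None if nothing was evicted.
    """
    active = [h for h in pool if h.manager_state == "active"]
    evicted = None
    if len(active) >= cap:
        # Weakest = fewest live children, ties broken by stalest activity.
        evicted = min(active, key=lambda h: (h.live_children, h.last_activity))
        evicted.manager_state = "archived"
        evicted.archive_reason = "pool_pressure_eviction"
    pool.append(new_entry)
    return evicted
```

Note that the evicted record is archived rather than deleted, which matches the page's description of eviction as a state change with a recorded reason.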

A scheduler job, the hypothesis-promotion loop, picks the top-K most promising hypotheses each cycle — scored by positive children, recency, and diversity — and dispatches a develop_candidate research task for each. That task creates new child strategies under the hypothesis. This is how an active thesis gets the experiments it needs to be judged.

How a verdict is reached

A hypothesis is judged by a math floor first and an LLM auditor second. The floor is deterministic and binding; the auditor can only make the verdict stricter, never looser.

The signals

Every eligible hypothesis is scored on three signals over a rolling window of its most recent children:

hit_rate — the fraction of recent children that reached paper / live_graduated (or hold a paper_eligible / deploy_eligible verdict).
diversity_cells — the number of distinct (asset, timeframe) pairs its children cover. A thesis that only works on one asset at one timeframe is suspect.
dead_children — how many children have been archived or rejected.
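To make the three signals concrete, here is a minimal sketch of how they might be computed. The stage and verdict names are taken from this page; the Child class, the compute_signals function, the window parameter, and the assumptions that children arrive ordered oldest to newest and that all three signals are measured over the same window are illustrative only.

```python
from dataclasses import dataclass
from typing import Optional

PASSING_STAGES = {"paper", "live_graduated"}
PASSING_VERDICTS = {"paper_eligible", "deploy_eligible"}
DEAD_STAGES = {"archived", "rejected"}


@dataclass
class Child:
    asset: str
    timeframe: str
    stage: str
    verdict: Optional[str] = None


def compute_signals(children, window):
    """Score the most recent `window` children (window must be >= 1)."""
    recent = children[-window:]  # assumes oldest-to-newest ordering
    hits = sum(
        1 for c in recent
        if c.stage in PASSING_STAGES or c.verdict in PASSING_VERDICTS
    )
    return {
        "hit_rate": hits / len(recent) if recent else 0.0,
        "diversity_cells": len({(c.asset, c.timeframe) for c in recent}),
        "dead_children": sum(1 for c in recent if c.stage in DEAD_STAGES),
        "window_full": len(recent) >= window,
    }
```

For example, four children split across two (asset, timeframe) cells, with one in paper, one holding a paper_eligible verdict, and two dead, would yield a hit_rate of 0.5, two diversity cells, and two dead children.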

The math floor

proven        hit_rate >= threshold  AND  diversity_cells >= min_cells
disproven     all children dead  OR  (hit_rate < threshold * 0.25  AND  window full)
researching   otherwise (still gathering evidence)

The threshold * 0.25 term means that, unless every child is already dead, a hypothesis is declared disproven only once its hit_rate falls below a quarter of the proven bar and the rolling window is full, never on a thin sample.
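The floor rules above translate directly into a small function. This is a sketch that follows the three rules in the order listed; the function name, the parameter names, and the example values threshold=0.4 and min_cells=2 are assumptions, since this page does not state the configured values.

```python
def math_floor(hit_rate, diversity_cells, dead_children, total_children,
               window_full, threshold, min_cells):
    """Return 'proven', 'disproven', or 'researching' from the raw signals."""
    if hit_rate >= threshold and diversity_cells >= min_cells:
        return "proven"
    all_dead = total_children > 0 and dead_children == total_children
    if all_dead or (hit_rate < threshold * 0.25 and window_full):
        return "disproven"
    return "researching"


# Illustrative values only: with threshold=0.4 the disproven cutoff is 0.1.
verdict = math_floor(hit_rate=0.05, diversity_cells=1, dead_children=3,
                     total_children=10, window_full=False,
                     threshold=0.4, min_cells=2)
print(verdict)  # researching: the hit rate is low but the window is not full
```

The same signals with window_full=True would return disproven, which shows how the window requirement protects a thesis from being killed on a thin sample.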

The LLM auditor

When the floor is computed, Forven calls an LLM auditor with the signals, the child metrics, and the prior verdict memo. The auditor returns a verdict, a rationale, and a claim_verdict (did the children actually confirm the claimed edge, or did they pass for some unrelated reason?).

The math floor binds the auditor:

It cannot upgrade a disproven floor to proven.
It can downgrade a proven floor — for example, if the winning children are all correlated and the apparent diversity is an illusion.

If the LLM is unavailable (provider outage, rate limit), the verdict falls back to the math floor deterministically. The memo logs llm_unavailable=true and the pipeline keeps moving — no manual unblock required.
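One way to picture how the floor binds the auditor, including the fallback when the LLM is unreachable, is the sketch below. The final_verdict function and the strictness ordering are assumptions for illustration. In particular, treating a move from researching to disproven as "stricter" is an interpretation of this page, not a documented rule. The llm_unavailable key mirrors the memo field named above.

```python
# Assumed ordering: a lower rank is a stricter (less favourable) verdict.
STRICTNESS = {"disproven": 0, "researching": 1, "proven": 2}


def final_verdict(floor, auditor_verdict):
    """Combine the math floor with the auditor's verdict.

    auditor_verdict is None when the LLM could not be reached.
    """
    if auditor_verdict is None:
        # Deterministic fallback: the floor stands on its own.
        return {"verdict": floor, "llm_unavailable": True}
    if STRICTNESS[auditor_verdict] > STRICTNESS[floor]:
        # The auditor tried to loosen the verdict; the floor binds.
        return {"verdict": floor, "llm_unavailable": False}
    return {"verdict": auditor_verdict, "llm_unavailable": False}
```

Under this sketch, an auditor that answers proven against a disproven floor is overruled, while an auditor that answers researching against a proven floor is honoured.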

The verdict loop runs on roughly five-minute ticks and only re-evaluates hypotheses whose evidence has actually changed (a child changed stage, the memo went stale past seven days, or a child updated after the last memo).
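The re-evaluation check can be sketched as a simple predicate. The seven-day staleness limit comes from this page; the function name, its parameters, and the choice to always evaluate a hypothesis that has no memo yet are assumptions made for the example.

```python
from datetime import datetime, timedelta

MEMO_MAX_AGE = timedelta(days=7)


def needs_reevaluation(now, memo_at, stage_changed, child_updated_ats):
    """Decide whether a hypothesis should be re-judged on this tick.

    memo_at: when the last verdict memo was written, or None if never.
    stage_changed: True if any child changed stage since the last memo.
    child_updated_ats: last-update timestamps of the children.
    """
    if memo_at is None:
        return True  # assumption: a hypothesis without a memo is always evaluated
    if stage_changed:
        return True
    if now - memo_at > MEMO_MAX_AGE:
        return True
    return any(ts > memo_at for ts in child_updated_ats)
```

Skipping hypotheses whose evidence is unchanged keeps each tick cheap, since an unchanged hypothesis would produce the same floor and would only spend an LLM call to restate the same memo.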

Graduation and canonical strategies

When a hypothesis is proven, graduate_hypothesis() runs:

manager_state is set to graduated, status to proven, and protection_status to protected.
The best child in each (asset, timeframe) cell — highest Sharpe in its latest backtest — is flagged canonical.
A next_revisit_at is scheduled so the thesis is re-opened for fresh variants later.

The canonical strategy is the frozen edge of a proven thesis: the single best expression of the idea per asset/timeframe cell. Graduation is idempotent — re-graduating a hypothesis refreshes its revisit time but does not re-flag canonicals.
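Picking one canonical per cell amounts to a grouped maximum. The sketch below assumes each child is a plain dict with hypothetical keys strategy_id, asset, timeframe, and sharpe (the Sharpe ratio of its latest backtest), and that the list has already been filtered to eligible children. It is not the body of graduate_hypothesis().

```python
def pick_canonicals(children):
    """Return {(asset, timeframe): strategy_id} for the best child per cell."""
    best = {}
    for child in children:
        cell = (child["asset"], child["timeframe"])
        # Strictly greater: on a tie the first child seen keeps the slot
        # (an assumption; the tie-break rule is not documented here).
        if cell not in best or child["sharpe"] > best[cell]["sharpe"]:
            best[cell] = child
    return {cell: child["strategy_id"] for cell, child in best.items()}
```

Because each cell yields exactly one winner, the number of canonicals for a graduated hypothesis equals the number of cells its eligible children cover.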

Canonical protection

Canonical children are protected from loss. Automated actors cannot archive or reject a canonical=1 strategy. Only two things can override that protection:

An operator with an explicit force=true override.
The decay kill-switch (actor=decay_tracker), which is allowed to retire a degraded live strategy even if it is canonical — because trapping a decaying strategy in protection would be worse than letting it go.
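The protection rule reads naturally as a guard that runs before any archive or reject. In the sketch below, the actor name decay_tracker and the force override come from this page; the may_retire function, the operator actor name, the stage argument, and the choice to limit the decay exception to live strategies are assumptions.

```python
def may_retire(canonical, actor, stage, force=False):
    """Return True if `actor` may archive or reject the strategy."""
    if not canonical:
        return True  # assumption: no other protection applies
    if actor == "operator" and force:
        return True  # explicit operator override
    if actor == "decay_tracker" and stage == "live":
        return True  # decay kill-switch may retire a degraded live strategy
    return False
```

With this guard, an automated actor such as a cleanup job would be refused on a canonical strategy, and an operator without force=true would be refused as well.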

Canonical auto-deploy (opt-in)

If canonical_auto_deploy_enabled is on, graduation queues paper-promotion workflows for newly canonical children that are sitting in the gauntlet stage. This never bypasses gates — it queues the work; the gauntlet and capital gates still apply in full. The setting is off by default and requires an explicit operator opt-in.

When a hypothesis is disproven, archive_hypothesis() sets it to archived / disproven with reason disproven_verdict, freeing a pool slot.

Using the Hypotheses Manager

The Hypotheses Manager lives at the /hypotheses route. It is your research inventory.

Steps

Open /hypotheses. The view is tabbed: active, archived, trash, and graduated, each with a count.
To add a thesis yourself, click Create for a manual entry, or use URL ingest to seed a hypothesis from a source link. New manual hypotheses enter as proposed.
To let Forven invent ideas, use the Discover panel to run crucible discovery — the autonomous harvester that proposes new hypotheses from external sources.
Use search, the lane / status / quality filters, and sort to find a hypothesis. Open one to see its detail record.
On the detail view (/hypotheses/[id]), read the verdict memo (the auditor's rationale), the quality metrics (hit-rate, diversity, child metrics), the research timeline, and the data gaps list (assumptions that still need confirmation).
Act on it: Force revisit to re-open a graduated thesis for new variants, retrigger research to dispatch more child strategies, or bulk archive / restore / trash from the manager.

What you'll see

The active tab shows hypotheses still gathering evidence, each with its current status and a count of children moving through the pipeline. The graduated tab holds proven theses with their canonical strategies. A disproven thesis drops into archived with its verdict memo intact, so the reasoning behind every kill is auditable later. Data gaps flagged on the detail page tell you which fields were inferred from a source rather than stated — useful context before you trust a harvested idea.

How this connects to discovery

Hypotheses are the front end of Forven's autonomous research loop. The crucible discovery engine and the planner invent and refine theses; the hypothesis-promotion loop funds the promising ones with child strategies; the gauntlet tests each child; and the verdict loop closes the books with proven, disproven, or still-researching. Lessons from every backtest also feed the quant-skills store, which steers the next generation of ideas away from saturated, played-out families.

Caveats

The verdict math is deliberately conservative: it would rather sit at researching than declare a thin sample proven. A hypothesis with very few children will not graduate, by design.
Graduation flags canonicals but does not put capital at risk. Children still pass the paper and paper→live gates before anything trades.
This is beta software. Verdicts are LLM-assisted but math-bounded; if the auditor is unreachable, the deterministic floor still applies.

Forven is a research tool. The verdicts, scores, and any illustrative numbers here describe a research process, not predicted performance. Past survival of the gauntlet is not predictive of future results, and nothing here is financial advice.

The pipeline — the full lifecycle a child strategy travels.
The gauntlet — the robustness battery each child must survive.
Crucible discovery — how new hypotheses are invented and refined.
Quant skills — the learning loop that feeds back into ideation.
