Crucible discovery

How Forven harvests and invents new strategy ideas as crucibles, turns source material into testable specs, and moves them toward the gauntlet.

A crucible is a research hypothesis: an untested thesis about a trading approach — its indicators, regime, timeframe, target assets, and claimed edge. Crucible discovery is how Forven fills its research pool with these theses, either by harvesting ideas from external sources or by inventing them, then refining each into something concrete enough to test.

This page is for operators who run the research loop. It covers discovery settings, how source material becomes a testable spec, the crucible lifecycle (proposed → testing → viable → expanded), the planner that drives the work, and how you review the results in the Hypotheses Manager.

Discovery is upstream of everything else. A crucible that survives refinement spawns child strategies, which then run the gauntlet and — if they survive — graduate through the pipeline. Discovery is one half of the autonomous research daemon; this page is about where ideas come from, not how the daemon schedules them.

Forven is a research tool. Crucible discovery generates hypotheses to test, not trade recommendations. Nothing a discovered crucible claims is validated until it survives the gauntlet, and survival in backtests is not predictive of future results. Nothing here is financial advice.

Where ideas come from

Every crucible carries an origin that records how it entered the pool:

  • agent — invented by an LLM during ideation.
  • harvested — extracted from an external source (YouTube, Reddit, a forum, a blog, a podcast, or GitHub).
  • operator — seeded by hand.

Harvesting is the part most people mean by "discovery." A discovery task dispatches to the strategy-developer agent with a benchmarking research contract. The agent uses discover_* tools (discover_youtube, discover_reddit, discover_forum, discover_blog, discover_podcast, discover_github) to find candidate ideas and inspect_* tools to read the underlying artifact — a transcript, a thread, a post. For each thesis it finds, it calls create_hypothesis with origin_type=harvested, recording the target assets, target timeframes, mechanism, and claimed edge.

These tools are available only to the discovery agent, and only when it holds the benchmarking research contract. You do not call them directly.

Discovery settings

Discovery is off by default. You enable and shape it through the autonomous_discovery config group (see the configuration reference for where config lives and how precedence works):

Key | Default | Meaning
autonomous_discovery.enabled | false | Master switch. When false, discovery only runs if you force it.
autonomous_discovery.mode | operator_approves | operator_approves parks discovered crucibles as proposed for your review; autonomous lets them flow through the pipeline on their own.
autonomous_discovery.max_open_discovery_tasks | 1 | Cap on concurrent discovery tasks. Prevents duplicate harvesting.
disproven_dedup_lookback_days | 30 | When building the "already-known" digest for dedup, include crucibles disproven within the last N days.

When discovery runs, run_crucible_discovery() first checks whether a discovery task is already open; if one is, it defers rather than stacking work. The dispatched task includes a dedup digest — the titles of active and recently-disproven crucibles — so the agent is told what already exists.

The dedup digest is best-effort. Under load an LLM may ignore it, and running more than one discovery task concurrently can produce duplicate harvests. Keep max_open_discovery_tasks at 1 unless you have a reason not to.
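
The gating described in this section can be sketched roughly as follows. This is a minimal illustration, not Forven's code: DiscoveryConfig, should_dispatch_discovery, and the reason strings are hypothetical names, and comparing the open-task count against max_open_discovery_tasks is an assumption about how the cap is applied. Only the config keys, their defaults, and the force override come from this page.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DiscoveryConfig:
    """Hypothetical mirror of the documented settings and their defaults."""
    enabled: bool = False
    mode: str = "operator_approves"
    max_open_discovery_tasks: int = 1
    disproven_dedup_lookback_days: int = 30


def should_dispatch_discovery(cfg, open_discovery_tasks, force=False):
    """Return (dispatch, reason) for one discovery attempt."""
    # force bypasses only the enabled flag, not the open-task cap (assumption).
    if not cfg.enabled and not force:
        return False, "disabled"
    # Defer instead of stacking work when the cap is already reached.
    if open_discovery_tasks >= cfg.max_open_discovery_tasks:
        return False, "deferred"
    return True, "dispatch"


print(should_dispatch_discovery(DiscoveryConfig(), 0, force=True))
```

With the defaults, a forced run dispatches only when no discovery task is open; an unforced run does nothing until enabled is set to true.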

From source material to a testable spec

A podcast clip or forum post rarely states a complete strategy. Forven reconstructs the missing pieces. When the discovery agent extracts a thesis, an LLM fills in the fields a strategy needs — indicators, entry, exit, timeframe, instruments, parameters, and regime — and tags each field as one of:

  • stated — the source actually said it.
  • inferred — Forven reconstructed it.

Each field also carries a confidence between 0 and 1. The output is a structured JSON spec plus an assumptions list, embedded in the crucible's artifact_text.

Inferred fields with low confidence (below 0.5) are recorded as data gaps against the crucible, so the test loop knows exactly what was guessed rather than observed. You can see pending gaps in the hypothesis detail panel as pending_data_gaps. The testing loop is expected to validate inferred assumptions before anyone trusts the thesis.
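
As a rough illustration of the data-gap rule above, the sketch below tags each spec field with its provenance and confidence and collects the low-confidence inferred ones. SpecField, the function name, and the example values are hypothetical; the 0.5 threshold and the stated/inferred tags come from this page. The real spec is a JSON structure inside artifact_text whose exact shape is not shown here.

```python
from dataclasses import dataclass

GAP_CONFIDENCE_THRESHOLD = 0.5  # from the text: inferred fields below 0.5 become data gaps


@dataclass(frozen=True)
class SpecField:
    """One reconstructed field of a strategy spec (hypothetical shape)."""
    name: str
    value: str
    provenance: str    # "stated" or "inferred"
    confidence: float  # between 0 and 1


def collect_data_gaps(fields):
    """Names of inferred fields whose confidence falls below the threshold."""
    return [
        f.name
        for f in fields
        if f.provenance == "inferred" and f.confidence < GAP_CONFIDENCE_THRESHOLD
    ]


# Made-up example values, for illustration only.
example_spec = [
    SpecField("indicators", "RSI(14)", "stated", 0.9),
    SpecField("timeframe", "1h", "inferred", 0.7),
    SpecField("exit", "fixed stop", "inferred", 0.3),
    SpecField("regime", "ranging", "inferred", 0.45),
]
print(collect_data_gaps(example_spec))  # prints ['exit', 'regime']
```

Stated fields never become gaps in this sketch, however low their confidence, because the rule applies to inferred fields only.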

This matters for honest research: a harvested idea that looks complete may be three-quarters inference. The tagging keeps that visible instead of laundering a guess into a fact.

The crucible lifecycle

Operators see four stages, plus a failure state:

proposed → testing → viable → expanded
                  ↘ failed

The stages you see are derived by derive_crucible_status() from the underlying hypothesis status (proposed / researching / proven / disproven):

  • proposed — a fresh thesis awaiting refinement (or your approval, in operator_approves mode).
  • testing — refined and researching; child candidates are being developed and backtested.
  • viable — proven and protected: the thesis produced a promoted descendant, so it is locked against casual removal.
  • expanded — a viable crucible that has 3 or more strategy candidates or any promoted descendant (paper, live, or deployed). Forven keeps growing the family.
  • failed — refinement or development hit its retry cap and the crucible was archived.
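
The mapping above might look something like the sketch below. It is not the real derive_crucible_status(): the function name, its parameters, the archived flag, and the choice to map disproven to failed are all assumptions made for illustration. The stage names, the threshold of 3 candidates, and the promoted stages (paper, live, deployed) come from the list above.

```python
PROMOTED_STAGES = {"paper", "live", "deployed"}  # promoted descendant stages named above
EXPANDED_MIN_CANDIDATES = 3                      # "3 or more strategy candidates"


def derive_status_sketch(hypothesis_status, archived=False,
                         candidate_count=0, descendant_stages=()):
    """Map an underlying hypothesis status to an operator-facing stage."""
    # Assumption: archived or disproven crucibles surface as "failed".
    if archived or hypothesis_status == "disproven":
        return "failed"
    if hypothesis_status == "proposed":
        return "proposed"
    if hypothesis_status == "researching":
        return "testing"
    if hypothesis_status == "proven":
        has_promoted = any(s in PROMOTED_STAGES for s in descendant_stages)
        if candidate_count >= EXPANDED_MIN_CANDIDATES or has_promoted:
            return "expanded"
        return "viable"
    raise ValueError(f"unknown hypothesis status: {hypothesis_status!r}")


print(derive_status_sketch("proven", candidate_count=3))
```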

Protection status

A viable crucible also carries a protection status that governs whether it can be dethroned (replaced by a better thesis):

  • unprotected — can be archived freely.
  • protected — an operator-approved, proven thesis, locked from dethrone.
  • contested — a dethrone approval is pending your decision.

Dethroning a protected crucible is gated by an approval. Requesting one flips the crucible to contested. Approving the request clears protection and moves the crucible; rejecting it restores protection. The relevant routes are:

POST /crucible/{crucible_id}/viable           # mark viable (proven + protected)
POST /crucible/{crucible_id}/dethrone/request # open a dethrone approval (→ contested)
POST /crucible/{crucible_id}/dethrone/decide  # resolve it ({ "approved": true|false })
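
The protection transitions behind those routes can be pictured as a small state machine. The sketch below is illustrative only: the function names and the error handling are hypothetical, and it models just the protection_status value, not what happens to the crucible after an approved dethrone.

```python
def request_dethrone(protection_status):
    """Opening a dethrone approval moves a protected crucible to contested."""
    if protection_status != "protected":
        # Assumption: only protected crucibles need an approval to be replaced.
        raise ValueError("only a protected crucible needs a dethrone approval")
    return "contested"


def decide_dethrone(protection_status, approved):
    """Resolve a pending approval: approve clears protection, reject restores it."""
    if protection_status != "contested":
        raise ValueError("no dethrone approval is pending")
    return "unprotected" if approved else "protected"


status = request_dethrone("protected")
print(status, decide_dethrone(status, approved=False))
```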

The crucible planner

The planner is the engine that turns a pool of crucibles into actual work. Each cycle it calls plan_next_actions(limit=3) to emit up to three work items, choosing from:

  • refine_crucible — detail and validate a proposed thesis (proposed → researching).
  • develop_candidate — create test strategies for a researching crucible.
  • run_backtest — backtest untested candidates.
  • expand_viable_crucible — grow the family of a proven + protected crucible.
  • propose_crucible — replenish the pool when it runs dry.

Refine work and develop/expand work dispatch to the strategy-developer agent; backtests go to the simulation-agent. Each cycle logs planned_count / assigned_count and the chosen actions to the activity feed.

The planner enforces a few disciplines so it doesn't run away:

  • Per-crucible spawn limits cap how many candidates one crucible can create.
  • A 3-strike retry cap per action kind: a refine or develop that fails three times auto-archives the crucible rather than retrying forever.
  • Dedup with the hypothesis promotion loop. Both the planner and the promotion loop can dispatch develop_candidate, so a shared check (candidate_action_open()) treats any open candidate-family action — develop_candidate or expand_viable_crucible — as "already in flight" for that crucible.
  • A refine budget (refine_in_flight_budget, default 2) reserves slots for refine_crucible so the proposed → researching funnel isn't starved by candidate development.
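
Two of these disciplines are simple enough to sketch. The code below is an assumption-laden illustration rather than the planner's implementation: the dict shape of an open action, the failure counter, and both function names are hypothetical. The set of candidate-family actions and the three-failure cap for refine and develop come from the list above.

```python
CANDIDATE_FAMILY_ACTIONS = {"develop_candidate", "expand_viable_crucible"}
RETRY_CAPPED_ACTIONS = {"refine_crucible", "develop_candidate"}
MAX_FAILURES_PER_ACTION = 3  # the 3-strike retry cap


def candidate_action_open_sketch(open_actions, crucible_id):
    """True if any candidate-family action is already in flight for the crucible."""
    return any(
        action["crucible_id"] == crucible_id
        and action["kind"] in CANDIDATE_FAMILY_ACTIONS
        for action in open_actions
    )


def should_auto_archive(failure_counts, kind):
    """True once a capped action kind has failed three times for a crucible."""
    return (
        kind in RETRY_CAPPED_ACTIONS
        and failure_counts.get(kind, 0) >= MAX_FAILURES_PER_ACTION
    )


open_actions = [
    {"crucible_id": "c1", "kind": "expand_viable_crucible"},
    {"crucible_id": "c2", "kind": "run_backtest"},
]
print(candidate_action_open_sketch(open_actions, "c1"))
```

Checking both action kinds in one place is the point: either loop asking about crucible c1 gets the same answer, whichever loop opened the existing task.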

A refinement only "counts" if it made a durable change. _refine_task_has_durable_update() requires the task to have actually called update_hypothesis_fields or attach_hypothesis_artifact and for those calls to have succeeded. A task that reports completed without changing anything does not advance the crucible to researching — this avoids status-flag-only no-ops.
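
A durable-update check of this kind could be sketched as below. The record shape (a list of dicts with tool and succeeded keys) and the function name are hypothetical, and the sketch assumes that one successful durable call is enough; the two tool names come from the paragraph above.

```python
DURABLE_TOOLS = {"update_hypothesis_fields", "attach_hypothesis_artifact"}


def refine_has_durable_update_sketch(tool_calls):
    """True if the task made at least one successful durable tool call."""
    return any(
        call["tool"] in DURABLE_TOOLS and call["succeeded"]
        for call in tool_calls
    )


# A task that only read data, or whose update failed, does not count.
no_op_task = [
    {"tool": "inspect_blog", "succeeded": True},
    {"tool": "update_hypothesis_fields", "succeeded": False},
]
print(refine_has_durable_update_sketch(no_op_task))  # prints False
```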

When the pool runs dry — every researching crucible either has strategies in flight or has exhausted its spawn limits — the planner emits propose_crucible to mint a fresh thesis and keep the pipeline fed.

The planner/promotion-loop dedup is load-bearing. A code audit found that without candidate_action_open() covering both candidate-family actions, the two loops can stack hundreds of duplicate develop tasks per week onto the same crucible.

Steps: enable discovery and review the results

  1. Open the configuration reference to confirm where your config lives, then set autonomous_discovery.enabled=true.

  2. Choose a disposition: keep autonomous_discovery.mode=operator_approves to review every harvested crucible yourself, or set it to autonomous to let them flow.

  3. Leave autonomous_discovery.max_open_discovery_tasks=1 (the safe default) and optionally tune disproven_dedup_lookback_days.

  4. Let the scheduler run discovery, or trigger it on demand. To bypass the enabled flag manually, force a run:

    # Force a discovery run regardless of the enabled flag (local API on 127.0.0.1:8003)
    Invoke-RestMethod -Method Post -Uri "http://127.0.0.1:8003/crucible-discovery/run?force=true"

  5. Open the Hypotheses Manager (hypothesis-driven research) and look for new crucibles with origin=harvested in proposed status. Read each one's crucible_status, protection_status, verdict_memo, and pending_data_gaps.

  6. In operator_approves mode, approve a crucible to advance it to researching, or reject/archive it. The planner takes it from there.

  7. Optionally trigger a planner cycle by hand to push work along:

    Invoke-RestMethod -Method Post -Uri "http://127.0.0.1:8003/planner/cycle?limit=3"
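
If you script these calls outside PowerShell, the same two requests can be built with Python's standard library. This is a sketch under the same assumptions as the commands above (local API on 127.0.0.1:8003, POST with query parameters); build_post is a hypothetical helper, and the sketch only constructs the requests. Pass one to urllib.request.urlopen to send it.

```python
from urllib.parse import urlencode
from urllib.request import Request

BASE_URL = "http://127.0.0.1:8003"  # local API address used in the steps above


def build_post(path, **params):
    """Build (but do not send) a POST request with query parameters."""
    query = urlencode(params)
    url = f"{BASE_URL}{path}" + (f"?{query}" if query else "")
    return Request(url, data=b"", method="POST")


force_discovery = build_post("/crucible-discovery/run", force="true")
planner_cycle = build_post("/planner/cycle", limit=3)
print(force_discovery.full_url)
print(planner_cycle.full_url)
```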

What you'll see

In the Hypotheses Manager, discovered crucibles appear as proposed entries tagged origin=harvested. Each crucible detail panel shows the derived crucible_status, the protection_status, the verdict_memo, and any pending_data_gaps from inferred fields. Auto-promotions and planner actions (planned_count / assigned_count) show up in the activity feed. A planner-cycle call returns a small JSON summary:

{ "planned": 3, "assigned": 2, "assigned_task_ids": ["..."], "actions": ["refine_crucible", "develop_candidate"] }
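
A script that triggers cycles may want to check how much planned work was actually dispatched. The sketch below parses the summary shown above; summarize_cycle is a hypothetical helper, and reading planned minus assigned as "planned but not dispatched" is an assumption about what the two counts mean.

```python
import json


def summarize_cycle(raw):
    """Reduce a planner-cycle response to the figures an operator checks first."""
    summary = json.loads(raw)
    return {
        # Assumption: both fields are counts of work items in this cycle.
        "unassigned": summary["planned"] - summary["assigned"],
        "action_kinds": sorted(set(summary["actions"])),
    }


raw = ('{ "planned": 3, "assigned": 2, "assigned_task_ids": ["..."], '
       '"actions": ["refine_crucible", "develop_candidate"] }')
print(summarize_cycle(raw))
```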

What discovery does not do

  • It does not place trades. A harvested idea has zero authority until its child strategies survive the gauntlet and the promotion gates.
  • It does not validate its own inferences. Low-confidence inferred fields are flagged as data gaps for the test loop to confirm.
  • It is not the live scanner. Discovery invents and refines theses; the scanner executes a fixed set of already-certified strategies. Different subsystem, different page.