Deep-dive strategy chat
An interactive, tool-using chat scoped to one strategy — read and edit its code, set defaults, and re-run backtests, all under a per-thread cost cap.
Deep-dive is a focused chat for one strategy. You open a thread, talk through a problem — a weak gauntlet step, a parameter you suspect, a signal that fires too often — and the model works alongside you: it reads the strategy's code, edits it, sets default parameters, and re-runs a backtest, then streams the result back. It is the close-range complement to the broad, hands-off work of the agents layer.
Deep-dive lives on the strategy detail page, in the Deepdive tab (a chat sidebar). One active thread exists per strategy at a time. Like the rest of the agent layer, it never places trades — its tools touch code, parameters, and backtests only.
This is a Forged-tier feature and runs on your own LLM key. Spend is bounded by a per-thread cost cap, covered below.
What it is for
Use a deep-dive thread when a single strategy needs hands-on attention and you want the model's help without leaving the strategy in front of you:
- Debug a gauntlet failure. Ask why a strategy diverged out-of-sample, then have the model inspect the code and propose a fix.
- Tune parameters deliberately. Edit a default, re-run the backtest, and read the new metrics in the same conversation.
- Trace a signal. Read the signal-generation logic and reason about why it triggers where it does.
- Iterate with a record. Every message, tool call, parameter change, and cost is persisted, so you can trace how the strategy got to where it is.
Deep-dive is for refinement, not promotion. It does not advance a strategy through the pipeline — that still happens through the gauntlet and the operator gates. Think of it as the workbench where you sharpen a single tool before it earns its place in the battery.
The tools the model can use
Inside a deep-dive thread the model is given a small, strategy-scoped toolset. These are the only actions it can take:
| Tool | What it does |
|---|---|
deepdive_read_code | Reads the current source of the strategy under discussion |
deepdive_edit_code | Edits that strategy's code |
deepdive_set_default_params | Sets the strategy's default parameters |
deepdive_run_backtest | Runs a backtest and returns the metrics |
The toolset is deliberately narrow. There is no tool to place a trade, transition a stage, or touch another strategy. Changes the model makes to code and defaults are persisted against the strategy, so a deep-dive edit is a real edit — review it the way you would your own.
How a turn works
Each message you send drives one turn. The server appends your message, builds a system prompt scoped to the strategy (its code, recent results, and context), and invokes the model with the four tools above. The model can call tools and read their results, looping up to MAX_TOOL_ROUNDS = 8 tool rounds within a single turn before it must produce a final answer.
The response streams back over Server-Sent Events (SSE, not a WebSocket). You will see events in roughly this order:
user_persisted → your message is saved
assistant_token → the model's reply, streamed token by token
tool_call → the model invokes a tool (e.g. deepdive_run_backtest)
tool_result → the tool's output flows back into the conversation
done → the turn is completetool_call and tool_result pairs may repeat as the model reads code, edits it, and re-runs the backtest within the same turn. Everything persists to the thread, so closing and reopening the page does not lose the history.
Steps — start a deep-dive thread
- Open the strategy you want to work on and go to its Deepdive tab on the strategy detail page.
- The tab opens the active thread for that strategy, or creates one if none exists (one active thread per strategy).
- Type your first message — describe the problem plainly. For example: "The walk-forward step fails out-of-sample. Read the entry logic and tell me what might be overfit."
- Send it. The reply streams in; watch for
tool_callevents as the model reads the code. - Ask the model to make a change — for example, to widen a stop or adjust a default — and then to re-run the backtest.
- Read the returned metrics, trusting the out-of-sample figures over in-sample.
- When you are done, archive the thread to lock it read-only. The full transcript, edits, and cost remain for the record.
What you'll see: the chat sidebar streams the model's reply live, tool calls and their results appear inline, edited code and changed defaults are saved against the strategy, and a running cost figure is shown for the thread.
Cost caps
Deep-dive enforces its own ceiling, separate from the daily agent cap. Each turn's cost is estimated from your provider and model's token rates and added to the thread's running total.
- Per-thread cap —
deepdive.cost_cap_usd, default$5. When a thread's accumulated cost reaches the cap, the next turn is rejected with a cost-cap error (thesendendpoint returns an error status rather than streaming a reply). - This is distinct from the daily agent-task cap (
agent_daily_cost_cap_usd, default0= disabled), which governs background agent work, not deep-dive threads.
You can read or raise the per-thread cap from the deep-dive cost display or via the API. Caps are conservative by design — they exist to stop a runaway loop, not to ration normal use. The full spend model is in cost controls, and provider routing is in models & providers.
During beta, cost figures are estimates derived from token counts, not an exact billing ledger. Treat the cap as a hard stop and the displayed number as a guide.
The API (for developers)
Deep-dive is exposed over the local HTTP API. These endpoints back the UI:
POST /api/deepdive/threads create a thread { strategy_id }
POST /api/deepdive/threads/{thread_id}/send send a message { user_text } (SSE stream)
GET /api/deepdive/threads/{thread_id}/messages fetch the thread transcript
POST /api/deepdive/threads/{thread_id}/archive lock the thread read-only
GET /api/deepdive/cost-cap read the per-thread cap
PUT /api/deepdive/cost-cap update the cap { cap_usd }Notes for callers:
sendreturns a streaming SSE response, not JSON and not a WebSocket. Consume the event stream rather than awaiting a single body.- Thread creation is idempotent per strategy: the server returns the existing active thread for a
strategy_idif one is open (create_or_get_active_thread). - If a thread is over its cost cap,
sendreturns an error status instead of a stream — handle the cost-cap rejection before retrying. - Archived threads are read-only;
sendagainst them will not start a new turn.
The deep-dive routes do not require the operator key, but the API as a whole still sits behind FORVEN_API_KEY when authentication is configured — the server binds to 127.0.0.1 and stays on your machine. See the API reference for the full auth model.
Caveats (beta)
- One active thread per strategy. To start fresh, archive the current thread first.
- Code and parameter edits made in a thread are real and persisted — there is no separate sandbox to discard them. Review changes before relying on them.
- A turn caps at 8 tool rounds; a deeply iterative request may need several turns rather than one.
- Cost displays are estimates; the hard cap is the reliable control.
Forven is a research tool. Anything a deep-dive thread produces — code, parameters, or backtest numbers — is illustrative and not predictive, and nothing here is financial advice. No edit made in a deep-dive thread reaches real capital without surviving the gauntlet and the operator approval gate.
Related
- Agents — the broader AI co-researcher layer deep-dive complements
- Cost controls — per-thread and daily spend caps in detail
- The strategy lab — where strategies are built, optimized, and gauntleted
- Backtesting a strategy — the backtest the
deepdive_run_backtesttool drives
AI Drop Zone
Sessions (ADZ-####) that scope batch strategy uploads and backtest runs into one queryable namespace, so you can later ask "what did I test?"
The research daemon
The autonomous background loop that invents hypotheses, drives them through the gauntlet, retires losers, and surfaces a shortlist for review.