Models & providers (BYOK)

Configure bring-your-own-key LLM routing in Forven — providers, primary and fallback chains, per-provider auth, and auxiliary task routing.

Forven's AI layer runs on your own model keys. There is no Forven-hosted inference and no token reselling: every LLM call made by the agents — strategy authoring, post-mortems, the Deepdive chat — is routed through a provider priority list that you control, using credentials stored in your local keystore.

This page is for operators on the Forged tier (beta unlocks everything). It covers which providers are supported, how the primary-plus-fallback chain works, where keys live, and how to split auxiliary tasks onto cheaper models.

Forven is a research tool. Model output and any cost figures shown here are illustrative, not predictive, and nothing here is financial advice. No model can move real capital — agents only emit signals that pass the gauntlet and an operator approval gate.

The routing model

Every inference request — wherever it originates — passes through a single routing layer (call_ai). The layer does three things in order:

  1. Normalize the requested provider and model to canonical form. A bare model name like gpt-5.2 resolves to (openai, gpt-5.2).
  2. Build a fallback chain for that provider — an ordered list of (provider, model) pairs — and filter it down to providers you actually have credentials for.
  3. Try each pair in order, backing off on transient errors and rate limits, until one succeeds or the chain is exhausted.

Because routing is centralized, you configure it once and it applies to every agent and feature. Per-agent overrides (set on the Agents hub) sit on top of the default policy.

Supported providers

Forven recognizes these providers. You only need keys for the ones you intend to use; the rest are skipped at routing time.

Provider keyNotes
zaiDefault primary in the shipped policy (e.g. zai:glm-5.1)
openaie.g. openai:gpt-5.2
minimax
lmstudioLocal inference endpoint — no cloud key needed
openrouterAggregator; useful for reaching many models on one key
anthropic
deepseek

A model is always addressed as provider:model_id (for example openai:gpt-5.2 or zai:glm-5.1). Each provider has a default model so you can point at a provider alone and let routing pick the model.

Primary and fallback chains

A fallback chain is the ordered list of (provider, model) pairs the router tries for a given request. The shipped default priority is:

zai → openai → minimax → lmstudio

The first entry is your primary; the rest are fallbacks tried in sequence. At call time the router:

  • filters the chain to credentialed providers only — a provider with no auth profile is logged at debug level (expected noise) and skipped;
  • appends any configured-but-unlisted providers to the tail, so a working key is never wasted;
  • moves down the chain on each failure rather than giving up after one attempt.

This means a transient outage at your primary provider does not stall research — the next provider in the chain takes the call.

Bring your own key (auth profiles)

Each provider has its own auth profile holding the credentials for that provider. You supply them; Forven stores them in the local keystore on your machine. Keys never touch Forven's servers and never leave your device.

Set keys from Settings → Models (see Settings for the full hub) or via environment variables / the encrypted settings store. A provider with no profile is simply not part of the active chain — there is no error, the router just routes around it.

Steps — configure providers and routing

  1. Open the desktop app and go to Settings → Models (the Model Routing section).
  2. Add an auth profile for each provider you want to use — paste the API key for openai, zai, anthropic, and so on. For lmstudio, point at your local endpoint instead of a cloud key.
  3. Set the primary provider and, optionally, reorder the fallback chain.
  4. Choose a default model per provider (for example openai:gpt-5.2), or accept the provider default.
  5. Save. Changes apply to every subsequent LLM call.
  6. (Optional) Override the model for an individual agent on the Agents hub — those overrides layer on top of this policy.

What you'll see: the active routing policy in Settings → Models, and a read-only mirror under Diagnostics → Model Config showing the resolved primary, fallback order, and which providers are currently credentialed.

Auxiliary task routing

Not every call deserves your best model. Forven lets auxiliary tasks route differently from primary reasoning, so you can spend deliberately:

  • Compression (summarizing long context) can route to a cheap, fast model.
  • Recall and other background lookups can use the same economical tier.
  • Post-mortems — the failure analysis the brain queues when a strategy is archived — can route to a stronger model where the extra reasoning is worth it.

The result is a split policy: an inexpensive model for high-volume housekeeping, a capable model for the analysis that actually shapes decisions. Configure these alongside the primary policy in Settings → Models.

How errors and rate limits trigger fallback

The router distinguishes transient failures from terminal ones.

  • Transient — HTTP 429 (rate limit), timeouts, 503. The router backs off (honoring a Retry-After header when present, otherwise exponential backoff) and retries, advancing to the next chain entry if needed.
  • No credentials — logged at debug and skipped immediately; the next provider is tried.
  • Terminal — a non-transient error, or every entry in the chain exhausted. The router raises and the caller handles the final error (commonly "no credentials configured").

On success the call returns the generated text plus usage counts (input_tokens, output_tokens). Those counts feed cost estimation — see below.

Cost is tracked per call

Every successful call estimates its USD cost from the provider/model token rates and persists it on the originating record (an agent task's cost_usd, or a Deepdive message). That per-call accounting is what the spend caps act on:

  • Daily capagent_daily_cost_cap_usd, opt-in, default 0 (disabled). When set above zero, the day's summed cost_usd gates new agent task launches.
  • Deepdive per-thread capdeepdive.cost_cap_usd, default $5 per thread.

Routing and cost are two halves of the same system; the full ceiling mechanics live in cost controls.

Caveats (beta)

  • The shipped default chain (zai → openai → minimax → lmstudio) assumes you hold keys for those providers. With no credentials anywhere, calls fail closed rather than silently doing nothing — add at least one auth profile before running agents.
  • Per-call cost is estimated from hard-coded token rates, not billed back from the provider. Treat the figures as a guide and rely on the hard caps as the real control.
  • lmstudio requires a reachable local endpoint; if it is down it is treated like any other unavailable provider and skipped.

  • Agents — who makes the LLM calls and how per-agent model overrides work
  • Cost controls — daily and per-thread spend caps
  • Settings — the Models section where routing is configured