Health monitoring
How Forven's health monitor tracks component states, enforces data-stream SLAs, and routes amber and red alerts so you catch trouble early.
The health monitor is Forven's background watchdog. It runs as an async task on a roughly 30-second heartbeat and aggregates the state of the moving parts (the scheduler, the brain workers, the bot, the data collector, and the lab) into a single colour-coded picture: green, amber, or red. It also watches each data stream against its own staleness SLA, so a stalled OHLCV feed or a silent funding stream surfaces before it quietly corrupts a backtest or a live decision.
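As a rough illustration of that design, here is a minimal asyncio sketch of a watchdog task that runs one check per tick. The names watchdog, check, and main are hypothetical and are not Forven's internals; the 30-second default only mirrors the documented heartbeat. The point it shows is that a failing check is reported and the loop keeps ticking instead of dying.

```python
import asyncio
from typing import Awaitable, Callable


# Hypothetical watchdog loop: one health check per heartbeat tick.
async def watchdog(
    check: Callable[[], Awaitable[None]],
    interval_s: float = 30.0,  # mirrors the ~30-second heartbeat
) -> None:
    while True:
        try:
            await check()
        except Exception as exc:  # a failing check must not kill the watchdog
            print(f"health check failed: {exc!r}")
        await asyncio.sleep(interval_s)


async def main() -> int:
    ticks = 0

    async def check() -> None:
        nonlocal ticks
        ticks += 1

    # Short interval so the demo finishes quickly.
    task = asyncio.create_task(watchdog(check, interval_s=0.01))
    await asyncio.sleep(0.05)
    task.cancel()  # CancelledError is not an Exception subclass, so it stops the loop
    try:
        await task
    except asyncio.CancelledError:
        pass
    return ticks


if __name__ == "__main__":
    print(asyncio.run(main()))
```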
This page is for operators who want to read those states correctly and know when to act. You'll find the health surface inside the /ops dashboard, alongside the system controls and the scheduler.
What the health monitor watches
The monitor has two jobs: component liveness and data freshness.
Component states. It aggregates signals from the long-running subsystems:
- the scheduler (are jobs ticking, or are locks stale?)
- the brain workers (the orchestrator's processing loop)
- the bot (live trading / scanner loop)
- the data collector (ingestion pipeline)
- the lab
Each component resolves to one of three states:
| State | Meaning |
|---|---|
green | Healthy — running and within its expected cadence. |
amber | Overdue or degraded — a stream is past its SLA, or a component is lagging. |
red | Critical — a stream is badly stale (past roughly twice its SLA) or a component is down. |
Data-stream SLAs. Each market-data stream has its own freshness window. If the newest row for a stream is older than its SLA, the monitor raises an amber alert for that stream; if it crosses roughly twice the SLA, it escalates to red.
| Stream | Illustrative SLA |
|---|---|
ohlcv | 60 minutes |
oi (open interest) | 3 hours |
funding | 12 hours |
The SLA values above are illustrative defaults drawn from the current build. The exact windows are per-stream and may change between releases — treat them as orientation, not a contract.
The SLAs are per-stream and independent. There is no single rolled-up "database is healthy" verdict — each stream can go amber or red on its own. That is deliberate: a stale funding feed should not be masked by a perfectly fresh ohlcv feed.
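The escalation rule can be sketched in a few lines. This is a minimal illustration, not Forven's implementation: it assumes the illustrative SLAs from the table, treats "older than the SLA" as amber and "older than twice the SLA" as red (the page says roughly twice, so the exact boundary is an assumption), and evaluates each stream independently with no rolled-up verdict.

```python
from datetime import datetime, timedelta, timezone

# Illustrative SLAs from the table on this page; real windows may differ.
SLAS = {
    "ohlcv": timedelta(minutes=60),
    "oi": timedelta(hours=3),
    "funding": timedelta(hours=12),
}


def stream_state(age: timedelta, sla: timedelta) -> str:
    """Green within the SLA, amber past it, red past roughly twice it."""
    if age > 2 * sla:
        return "red"
    if age > sla:
        return "amber"
    return "green"


def evaluate_streams(newest_row: dict[str, datetime], now: datetime) -> dict[str, str]:
    # Each stream is judged on its own newest row; there is no rollup.
    return {name: stream_state(now - ts, SLAS[name]) for name, ts in newest_row.items()}


now = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)
states = evaluate_streams({
    "ohlcv": now - timedelta(minutes=10),
    "oi": now - timedelta(hours=4),
    "funding": now - timedelta(hours=30),
}, now)
print(states)  # {'ohlcv': 'green', 'oi': 'amber', 'funding': 'red'}
```

A fresh ohlcv feed leaves the funding result untouched, which is the point of keeping the SLAs independent.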
Reading the states
When you open /ops and look at the health section, read it top-down:
- Component row first. If a component is amber or red, that is usually the root cause — a paused or hung scheduler will starve everything downstream.
- Then the streams. A red ohlcv stream with green components usually means a data-source problem (rate limit, credentials, upstream outage), not an app fault. See Data sources for where each feed comes from.
- Cross-check the scheduler. Stale data is often a stalled collector job. The health monitor and the scheduler tell complementary halves of the same story.
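The top-down reading order can be written as a small triage helper. It is a hypothetical sketch: the triage function and its plain dict-of-states inputs are illustrative assumptions, and the messages only restate the rules of thumb on this page.

```python
BAD = ("amber", "red")


def triage(components: dict[str, str], streams: dict[str, str]) -> str:
    # 1. Components first: an unhealthy component is usually the root cause.
    unhealthy = sorted(n for n, s in components.items() if s in BAD)
    if unhealthy:
        return "check components first: " + ", ".join(unhealthy)
    # 2. Red streams with green components usually point at the data source.
    red = sorted(n for n, s in streams.items() if s == "red")
    if red:
        return "likely data-source problem: " + ", ".join(red)
    # 3. Amber streams: cross-check the collector job in the scheduler.
    amber = sorted(n for n, s in streams.items() if s == "amber")
    if amber:
        return "cross-check collector jobs: " + ", ".join(amber)
    return "all green"


print(triage({"scheduler": "green", "bot": "green"}, {"ohlcv": "red"}))
```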
Health is about whether the machine is running honestly, not whether a strategy is working. A perfectly green system can still be running strategies that should be killed. Forven is a research tool: a healthy dashboard says nothing about future results.
How alerts route
The health monitor does not just colour cells — it emits alerts. When a component or stream crosses a threshold, the monitor calls the same emit_notification() path everything else uses, so health events flow through the standard notifications routing policy.
- Health events at warning severity or above are routed out (for example, to Discord) according to your notification preferences.
- Routing obeys the usual dedupe-by-key and cooldown rules, so a stream that flaps amber/green does not spam you.
- Critical health alerts are severity-aware: they bypass deduping against lower-severity rows with the same dedupe key, so a genuinely critical condition is never silently suppressed by an earlier benign one.
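To make the interaction between cooldown and severity concrete, here is a hypothetical routing gate. It is not the actual emit_notification() logic: the severity ranks, the in-memory last_sent map, the dedupe key format, and the cooldown value are all assumptions. It routes only warning-or-above events, suppresses repeats of the same key inside the cooldown, and lets a critical event through when the earlier row for that key was lower severity.

```python
from dataclasses import dataclass

# Assumed severity ladder; only "warning" and "critical" appear on this page.
SEVERITY_RANK = {"info": 0, "warning": 1, "critical": 2}


@dataclass
class LastSent:
    severity: str
    at: float  # epoch seconds when this key was last routed


def should_route(key: str, severity: str, now: float,
                 last_sent: dict[str, LastSent], cooldown_s: float) -> bool:
    rank = SEVERITY_RANK[severity]
    if rank < SEVERITY_RANK["warning"]:
        return False  # below warning: not routed out
    prev = last_sent.get(key)
    if prev is not None and now - prev.at < cooldown_s:
        # Inside the cooldown, only a critical escalation over a
        # lower-severity row for the same key gets through.
        escalation = severity == "critical" and SEVERITY_RANK[prev.severity] < rank
        if not escalation:
            return False  # deduped
    last_sent[key] = LastSent(severity, now)
    return True


sent: dict[str, LastSent] = {}
print(should_route("stream:funding", "warning", 0, sent, 600))     # True
print(should_route("stream:funding", "warning", 60, sent, 600))    # False, deduped
print(should_route("stream:funding", "critical", 120, sent, 600))  # True, escalation
```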
If a stream recovers, its state returns to green on the next heartbeat and subsequent alerts stop.
Turning a health alert into a fix
A health notification can be escalated into work by handing it to a repair agent straight from the notification. Replace NOTIFICATION_ID with the notification's id:
```powershell
# Create a repair task from a notification (operator action)
curl.exe -X POST http://127.0.0.1:8003/api/notifications/NOTIFICATION_ID/repair `
  -H "Content-Type: application/json" `
  -d '{\"agent_id\": \"full-stack-engineer\"}'
```

That creates an agent task (type=notification_repair) carrying the event payload; the agent investigates and writes its findings back to the task output. You can also acknowledge an alert (POST /api/notifications/NOTIFICATION_ID/acknowledge) or re-route it through the current policy (POST /api/notifications/NOTIFICATION_ID/resend).
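If you prefer scripting these calls, the same requests can be built with Python's standard library. This sketch reuses the local address and JSON body from the curl example; the helper names repair_request and action_request are hypothetical, and sending no body to the acknowledge and resend endpoints is an assumption.

```python
import json
import urllib.request

BASE_URL = "http://127.0.0.1:8003"  # local API address from the curl example


def repair_request(notification_id: str,
                   agent_id: str = "full-stack-engineer") -> urllib.request.Request:
    """Build the POST that hands a notification to a repair agent."""
    body = json.dumps({"agent_id": agent_id}).encode("utf-8")
    return urllib.request.Request(
        f"{BASE_URL}/api/notifications/{notification_id}/repair",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def action_request(notification_id: str, action: str) -> urllib.request.Request:
    """Build an acknowledge or resend POST (assumed to take no body)."""
    if action not in ("acknowledge", "resend"):
        raise ValueError(f"unsupported action: {action}")
    return urllib.request.Request(
        f"{BASE_URL}/api/notifications/{notification_id}/{action}",
        method="POST",
    )


# Sending requires the app to be running locally:
# with urllib.request.urlopen(repair_request("NOTIFICATION_ID")) as resp:
#     print(resp.status, resp.read().decode())
```

Building the Request separately from sending it keeps the helpers easy to inspect without a running server.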
Steps: check system health
Day to day, you read health from the dashboard. To do it deliberately:
- Open the desktop app and go to /ops.
- Find the health section. Note the colour of each component (scheduler, brain workers, bot, data collector, lab).
- If any component is amber or red, open the scheduler section on the same page and look for jobs with a recent last_error or a stale lock (see Troubleshooting).
- Check the data-stream rows. A stream past its SLA shows amber; one well past it shows red.
- For a stale data stream, confirm the relevant collector job is enabled and running, and that its data source credentials/connectivity are intact.
- If you need an audit trail, check the notification center: every health alert that crossed the routing threshold is logged there with event_type, severity, source, and summary.
What you'll see: the /ops health section renders each component and stream as a green / amber / red indicator that refreshes on the monitor's heartbeat (about every 30 seconds). Amber and red items also appear in the in-app notification center, and — if routing is enabled — in Discord.
The daemon heartbeat and history
The health monitor leans on heartbeats written by the running loops. Those heartbeats are short-lived by design: the heartbeat_activity log is pruned to roughly 2 days by the maintenance job. That keeps the table small, but it means heartbeat history is not a long-term audit source — if you are reconstructing an incident from last week, the raw heartbeats will already be gone.
The same is true of the broader audit trail: notifications are retained about 60 days, and the activity_log about 90 days, before pruning. Keep your own backups if you need a longer record. See Database & maintenance for the full retention picture; maintenance windows are settings-driven via the forven:pipeline:settings key.
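Before relying on the database to reconstruct an incident, it can help to check which tables are likely to still hold rows from that day. This is a back-of-the-envelope sketch using the approximate windows on this page; the dictionary and the likely_retained function are illustrative, and because pruning runs on the maintenance job's schedule, the exact cut-off is an assumption.

```python
from datetime import date, timedelta

# Approximate retention windows from this page (days); pruning timing varies.
RETENTION_DAYS = {
    "heartbeat_activity": 2,
    "notifications": 60,
    "activity_log": 90,
}


def likely_retained(incident: date, today: date) -> dict[str, bool]:
    age_days = (today - incident).days
    # Treat rows as probably pruned once their age reaches the window.
    return {table: age_days < window for table, window in RETENTION_DAYS.items()}


today = date(2024, 3, 10)
print(likely_retained(today - timedelta(days=7), today))
# {'heartbeat_activity': False, 'notifications': True, 'activity_log': True}
```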
If you may need to investigate an incident later, capture a backup promptly, before its heartbeat rows age out. By the time you are troubleshooting something more than a couple of days old, the rows that would explain the gap have likely already been pruned.
Caveats
A few honest rough edges to keep in mind:
- No global rollup. Stream SLAs are evaluated per stream. The monitor can fire amber on funding while ohlcv is green; there is no single overall data-health number to glance at.
- Startup catch-up can look alarming. After an app restart, the scheduler collapses any job that is more than a minute stale into a single immediate run, rather than replaying the whole missed queue. You may briefly see a burst of activity; that is the catch-up, not a fault.
- Stale locks are not always recoverable on sight. If a job appears hung, the scheduler will not force-recover its lock while a background task or worker thread is still alive — the lock is held until that thread exits. A persistently red scheduler often means a slow external call (an LLM or exchange request) is still running. See Troubleshooting.
- Health says nothing about correctness of trades. It tracks liveness and freshness only.
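The startup catch-up and stale-lock caveats describe scheduling rules that are easy to misread, so here is a hypothetical sketch of both. The Job shape, the function names, and the stale_after_s threshold are assumptions for illustration; only the one-minute catch-up threshold and the rule that a lock is not recovered while its owning thread is alive come from this page.

```python
import threading
from dataclasses import dataclass
from typing import Optional

CATCH_UP_THRESHOLD_S = 60.0  # "more than a minute stale"


@dataclass
class Job:
    interval_s: float
    next_due: float  # epoch seconds


def catch_up_on_start(job: Job, now: float) -> bool:
    """Return True if the job gets one immediate run instead of a replay."""
    if now - job.next_due > CATCH_UP_THRESHOLD_S:
        # Collapse the whole missed backlog into a single run and
        # schedule the next tick from now.
        job.next_due = now + job.interval_s
        return True
    return False


def can_force_recover(lock_age_s: float, stale_after_s: float,
                      owner: Optional[threading.Thread]) -> bool:
    # Never take over a lock while its owning thread is still alive,
    # e.g. while a slow LLM or exchange call is still in flight.
    if owner is not None and owner.is_alive():
        return False
    return lock_age_s > stale_after_s
```

For example, a five-minute job that was last due an hour before the restart gets one run, not a replay of every missed tick, and a lock held by a thread blocked on a slow external call stays held however old it looks.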
Forven is a research tool. A green health dashboard means the system is running and its data is fresh — nothing more. It is not a measure of strategy quality, results are not predictive of future performance, and nothing here is financial advice.
Related
- Operations & system controls — where the health surface lives.
- Scheduler & jobs — the loops health watches.
- Notifications — how health alerts are routed.
- Troubleshooting & recovery — stale locks, zombie threads, and recovery.