Data sources
Reference for every market-data source Forven ingests — Binance/CCXT, Binance Vision, Polygon, Yahoo, CSV — plus symbol formats, enrichment streams, and the market calendar.
This is the source-by-source reference for Forven's market-data layer. The data manager page covers the /data UI and the day-to-day backfill workflow; this page documents what each source provides, how symbols are formatted across them, the enrichment streams that ride alongside OHLCV, and how the market calendar shapes session awareness.
It is written for developers and operators who need the exact source names, symbol conventions, config keys, and API endpoints. Everything below describes what the data layer actually does; every source documented here is one Forven ingests.
The sources at a glance
Forven ingests OHLCV from five source families. A symbol's asset class is detected from its format, and the right source is used automatically when you do not pick one.
| Source | Identifier | Good for | Requires a key |
|---|---|---|---|
| Binance (spot + futures) | binance | Crypto OHLCV, the default for live-traded symbols | No |
| CCXT adapter | ccxt | Additional exchanges beyond Binance | No (exchange-dependent) |
| Binance Vision | binance-vision | Bulk historical crypto archives (years of bars) | No |
| Polygon.io | polygon | Multi-asset: stocks, forex, indices, crypto | Yes (POLYGON_API_KEY) |
| Yahoo Finance | yahoo | Macro series (VIX, DXY, bonds, sector ETFs) | No |
| CSV upload | csv | Your own bars or data not covered above | No |
The live /api/data/sources endpoint reports the same list with an availability flag, a required_key flag, and the asset_types each source can serve. Use it to confirm what is reachable in your install before you start a fetch.
# List available sources and which ones need a key
curl http://127.0.0.1:8003/api/data/sources
Binance and CCXT
Binance is the default crypto source and covers both spot and futures markets. The CCXT adapter sits behind it to reach additional exchanges where you need them. Neither requires a key for public OHLCV.
Because Forven routes live orders to HyperLiquid, it is common to hold both Binance and HyperLiquid candles for the same symbol. The quality check can compare the two; see the divergence note under Verifying data quality below.
Binance Vision
Binance Vision is the bulk historical downloader. It pulls monthly and daily archives directly from data.binance.vision, which is the fastest way to seed years of history without thousands of paginated API calls.
It is built to survive interruptions: it probes for the true start date of a symbol, tracks which dates it has already covered, and resumes a partial backfill rather than restarting. Use it for the first big backfill of a symbol; use Binance for keeping the tail current.
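The resume behaviour described above can be pictured as a set difference over dates. The following is a minimal sketch, not Forven's implementation: the function name `missing_days` and the idea of tracking coverage as a set of `date` objects are assumptions made for illustration.

```python
from datetime import date, timedelta


def missing_days(start: date, end: date, covered: set[date]) -> list[date]:
    """Return the days in [start, end] that are not yet covered, oldest first.

    Hypothetical helper: a resumed backfill only needs to fetch these days.
    """
    days = []
    current = start
    while current <= end:
        if current not in covered:
            days.append(current)
        current += timedelta(days=1)
    return days


# Example: a backfill that was interrupted after its first two days.
covered = {date(2024, 1, 1), date(2024, 1, 2)}
todo = missing_days(date(2024, 1, 1), date(2024, 1, 5), covered)
```

Planning the remaining work from recorded coverage, rather than from a single "last position" cursor, is what lets a partial run resume even when the covered dates are not contiguous.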
Polygon.io
Polygon is the multi-asset source — stocks (AAPL), forex (EUR-USD), indices, and crypto. It is the only source here that requires a key.
Set the key in Settings → API Keys or via the POLYGON_API_KEY environment variable. Without it, Polygon will not appear as usable in the /data ingestion picker.
# Provide the Polygon key via environment variable
$env:POLYGON_API_KEY = "your-polygon-key"
The Polygon client rate-limits itself conservatively at 4 calls per minute by default, which sits just under the free tier's ~5/min ceiling. These figures are illustrative of the free tier, not a guarantee. If you have a paid plan with a higher quota and you are still seeing throttling, raise the limit.
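To make the 4-calls-per-minute behaviour concrete, here is a minimal sliding-window limiter. It is a sketch of the general technique, not the Polygon client's actual code: the class name `SlidingWindowLimiter` and its parameters are hypothetical, and the clock and sleep functions are injectable only so the logic can be exercised without waiting.

```python
import time
from collections import deque


class SlidingWindowLimiter:
    """Allow at most max_calls within any rolling window of `period` seconds."""

    def __init__(self, max_calls=4, period=60.0,
                 clock=time.monotonic, sleep=time.sleep):
        self.max_calls = max_calls
        self.period = period
        self._clock = clock
        self._sleep = sleep
        self._calls = deque()  # timestamps of recent calls, oldest first

    def wait_time(self) -> float:
        """Seconds until another call is allowed (0.0 if allowed now)."""
        now = self._clock()
        # Forget calls that have left the rolling window.
        while self._calls and now - self._calls[0] >= self.period:
            self._calls.popleft()
        if len(self._calls) < self.max_calls:
            return 0.0
        return self.period - (now - self._calls[0])

    def acquire(self) -> None:
        """Block until a call is allowed, then record it."""
        while True:
            delay = self.wait_time()
            if delay <= 0:
                break
            self._sleep(delay)
        self._calls.append(self._clock())
```

Calling `acquire()` before each request keeps a client under the ceiling; raising `max_calls` is the equivalent of raising the limit for a paid plan.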
Yahoo Finance
Yahoo supplies the macro series Forven uses for context and enrichment — VIX, DXY, bond yields, and sector ETFs. You will rarely fetch from it directly; it feeds the macro enrichment stream described below.
CSV upload
CSV import lets you bring your own bars. One caveat worth knowing up front: if you import a CSV for a symbol/timeframe you also fetch from Binance, the two can collide. Forven does not lose the earlier data — both are combined on save — but the last source to write stamps the dataset's source metadata. Decide which source is canonical for a symbol and stick to it.
Symbol formats
The same instrument is spelled differently by each source. Forven normalizes between four formats at the import and export boundaries, so you generally type the canonical form and let the layer translate.
| Context | Format | Example |
|---|---|---|
| Filesystem / canonical | BASE-QUOTE | BTC-USDT |
| CCXT | BASE/QUOTE (or :SETTLE) | BTC/USDT, BTC/USDT:USDT |
| Polygon | prefixed ticker | X:BTCUSD |
| Binance Vision | concatenated | BTCUSDT |
Asset class is detected from the symbol's shape, so you do not declare it:
- Crypto: BTC-USDT, BTC/USDT:USDT
- Stocks: AAPL
- Forex: EUR-USD
- Indices: index tickers
When you type a symbol in the /data ingestion picker, use the canonical BASE-QUOTE form (or a plain ticker for equities). The layer maps it to whatever the chosen source expects.
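The translation between formats in the table above is mostly string reshaping. The sketch below shows one way it could look; the function names are hypothetical, and the Polygon helper covers only crypto tickers with a plain `X:` prefix. Whether Forven maps a stablecoin quote such as USDT to USD for Polygon is not stated on this page, so the sketch does not attempt it.

```python
def split_canonical(symbol: str) -> tuple[str, str]:
    """Split a canonical BASE-QUOTE symbol into (base, quote)."""
    base, sep, quote = symbol.partition("-")
    if not sep or not base or not quote:
        raise ValueError(f"not a BASE-QUOTE symbol: {symbol!r}")
    return base.upper(), quote.upper()


def to_ccxt(symbol: str, settle=None) -> str:
    """BTC-USDT -> BTC/USDT, or BTC/USDT:USDT when a settle currency is given."""
    base, quote = split_canonical(symbol)
    pair = f"{base}/{quote}"
    return f"{pair}:{settle.upper()}" if settle else pair


def to_binance_vision(symbol: str) -> str:
    """BTC-USDT -> BTCUSDT (concatenated form)."""
    base, quote = split_canonical(symbol)
    return base + quote


def to_polygon_crypto(symbol: str) -> str:
    """BTC-USD -> X:BTCUSD (crypto only; assumption: no quote remapping)."""
    base, quote = split_canonical(symbol)
    return f"X:{base}{quote}"
```

Keeping a single canonical form and converting only at the boundary means a typo in one source's spelling cannot create a second dataset for the same instrument.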
Enrichment streams
OHLCV is the spine, but Forven collects nine background streams in total and can merge the derivative and macro ones onto bars for a backtest. The collectors run proactively, ranked by staleness so cold symbols do not starve.
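Ranking by staleness can be sketched as a sort on the age of the last successful collection. This is an illustration under assumptions: the function name `rank_by_staleness`, the shape of the `last_collected` mapping, and the rule that never-collected items go first are all hypothetical.

```python
from datetime import datetime, timezone


def rank_by_staleness(last_collected: dict, now: datetime) -> list:
    """Order keys so the longest-unrefreshed come first.

    `last_collected` maps a (symbol, stream) key to its last collection
    time, or None if it has never been collected (treated as most stale).
    """
    def age_seconds(key):
        ts = last_collected[key]
        return float("inf") if ts is None else (now - ts).total_seconds()

    return sorted(last_collected, key=age_seconds, reverse=True)


# Example with made-up timestamps.
now = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)
queue = rank_by_staleness(
    {
        ("BTC-USDT", "funding"): datetime(2024, 1, 1, 11, 55, tzinfo=timezone.utc),
        ("ETH-USDT", "funding"): datetime(2024, 1, 1, 9, 0, tzinfo=timezone.utc),
        ("SOL-USDT", "funding"): None,
    },
    now,
)
```

Sorting by age rather than iterating in a fixed order is what prevents a rarely touched symbol from being starved by busy ones.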
| Stream | What it measures |
|---|---|
| OHLCV | Spot and futures candles (the base series) |
| Funding rates | Perp funding paid/received |
| Open interest | Outstanding contract notional |
| Long/short ratio | Account or position skew |
| Taker volume | Aggressive buy/sell flow |
| Liquidations | Forced-close volume |
| Fear/greed index | Sentiment proxy |
| Macro indicators | VIX, DXY, bonds, sector ETFs |
| BTC dominance | BTC share of crypto market cap |
Enrichment happens on demand during a backtest. The load phase merges available streams onto the OHLCV frame using a merge-asof join — each bar is matched to the nearest prior value of each stream, so no future information leaks backward.
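A backward as-of join can be sketched in a few lines with a binary search. This is a standard-library illustration of the technique, not Forven's loader: the function name is hypothetical, timestamps are plain numbers, and treating a stream value stamped exactly at the bar time as visible is an assumption.

```python
from bisect import bisect_right


def merge_asof_backward(bar_times, stream_times, stream_values):
    """For each bar time, pick the latest stream value at or before it.

    `stream_times` must be sorted ascending. Bars that precede the first
    stream observation get None rather than a value from the future.
    """
    merged = []
    for t in bar_times:
        i = bisect_right(stream_times, t)  # count of observations <= t
        merged.append(stream_values[i - 1] if i > 0 else None)
    return merged


# Example: funding observations at t=5, 20, 25 joined onto bars at t=0..30.
funding = merge_asof_backward(
    bar_times=[0, 10, 20, 30],
    stream_times=[5, 20, 25],
    stream_values=[0.01, 0.02, 0.03],
)
```

Because the search only ever looks left of the bar's timestamp, a value observed after the bar can never be attached to it.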
Two things to know about how this avoids lookahead bias:
- Bucket-aggregate streams are shifted to bucket close. Taker buy/sell ratio and liquidations are sampled at a bucket's start but summarize the forward window, which is only known at close. Forven shifts their timestamps to the bucket close before merging, so an in-progress bucket can never be merged onto a finer bar.
- Missing streams stay absent, not zeroed. If a stream is unavailable for a symbol, its columns are simply not present (rather than silently filled with zeros that a strategy might trade on). Where a default is sensible within an available stream, funding fills as 0 and ratios as 1.
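The bucket-close shift in the first point can be illustrated with a small example. This is a sketch with hypothetical helper names and integer second timestamps, showing why a value stamped at bucket start would leak onto finer bars and how shifting it to bucket close prevents that.

```python
from bisect import bisect_right


def shift_to_bucket_close(samples, bucket_seconds):
    """Re-stamp (bucket_start, value) samples at the bucket's close time."""
    return [(start + bucket_seconds, value) for start, value in samples]


def value_as_of(samples, t):
    """Latest sample value stamped at or before t, else None."""
    times = [ts for ts, _ in samples]
    i = bisect_right(times, t)
    return samples[i - 1][1] if i > 0 else None


# Hourly liquidation totals stamped at bucket start (made-up numbers).
hourly = [(0, 120.0), (3600, 80.0)]
closed = shift_to_bucket_close(hourly, 3600)

# A 15-minute bar at t=900 sits inside the first hourly bucket.
leaky = value_as_of(hourly, 900)  # sees the total for an hour still in progress
safe = value_as_of(closed, 900)   # sees nothing until the bucket has closed
```

After the shift, the first hourly total becomes visible only from t=3600 onward, which is the earliest moment it could actually have been known.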
Point-in-time reconstruction (as_of) is supported for OHLCV only, via the revision log. Enrichment streams (funding, OI, and so on) do not support as_of — a backtest using point-in-time OHLCV must source those bars accordingly rather than through on-demand enrichment.
The market calendar
Crypto trades around the clock, but equities and forex do not. Forven carries a market calendar so session-aware strategies and data checks know when a market is actually open: NYSE hours and holidays for equities, session windows for forex, and always-on for crypto. This keeps a stock backtest from treating an overnight gap as a missing bar, and lets the data layer reason about expected coverage per asset class.
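A session check of this kind might look like the sketch below. It is illustrative only: the function name and asset-class labels are hypothetical, the input is assumed to be a naive New York wall-clock time, the NYSE regular session is taken as 09:30 to 16:00, the forex week as Sunday 17:00 to Friday 17:00 New York time (a common convention, not confirmed by this page), and the holiday set is supplied by the caller.

```python
from datetime import date, datetime, time

NYSE_OPEN, NYSE_CLOSE = time(9, 30), time(16, 0)
FOREX_WEEK_BOUNDARY = time(17, 0)  # assumption: Sunday open / Friday close


def is_open(asset_class: str, ny_time: datetime,
            holidays: frozenset[date] = frozenset()) -> bool:
    """Return True if the market is open at the given New York wall-clock time."""
    if asset_class == "crypto":
        return True  # always on
    weekday = ny_time.weekday()  # Monday=0 ... Sunday=6
    if asset_class == "stock":
        if weekday >= 5 or ny_time.date() in holidays:
            return False
        return NYSE_OPEN <= ny_time.time() < NYSE_CLOSE
    if asset_class == "forex":
        if weekday == 5:  # Saturday
            return False
        if weekday == 6:  # Sunday: opens in the evening
            return ny_time.time() >= FOREX_WEEK_BOUNDARY
        if weekday == 4:  # Friday: closes in the evening
            return ny_time.time() < FOREX_WEEK_BOUNDARY
        return True
    raise ValueError(f"unknown asset class: {asset_class!r}")
```

A gap detector that consults a check like this can skip closed periods, so a weekend in an equity series is not reported as missing bars.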
How data lands on disk
Every persisted bar is a closed bar. Forming candles are dropped at the write boundary, and each write is atomic: Forven writes to a temporary file, fsyncs it, then atomically renames it into place. A crash between write and rename leaves a stray .tmp file, which a background orphan scan cleans up after it ages out. Before any bar enters the lake it passes an OHLC sanity check — high ≥ low, open and close within the bar's range, positive prices, non-negative volume — so corrupt bars never reach a backtest.
You do not manage any of this directly; it is the contract that lets you trust what the data manager shows you.
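For intuition, the two guarantees above (the OHLC sanity check and the write-then-rename pattern) can be sketched with the standard library. This is not Forven's code: the function names are hypothetical, and the real lake stores Parquet rather than raw bytes.

```python
import os
import tempfile


def is_sane_bar(open_, high, low, close, volume) -> bool:
    """Check the invariants listed above for a single bar."""
    return (
        min(open_, high, low, close) > 0  # positive prices
        and high >= low
        and low <= open_ <= high          # open within the bar's range
        and low <= close <= high          # close within the bar's range
        and volume >= 0                   # non-negative volume
    )


def atomic_write(path: str, data: bytes) -> None:
    """Write to a temp file in the same directory, fsync, then rename."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(data)
            f.flush()
            os.fsync(f.fileno())    # make sure bytes are on disk first
        os.replace(tmp_path, path)  # atomic rename over the target
    except BaseException:
        # On an ordinary error, remove the temp file ourselves. A hard crash
        # skips this and leaves the stray .tmp file behind.
        if os.path.exists(tmp_path):
            os.unlink(tmp_path)
        raise
```

The temporary file is created in the target's own directory because a rename is only atomic within a single filesystem. Readers therefore see either the old file or the complete new one, never a half-written state.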
Configuration
Where the data lake lives, and how regime gating behaves, are controlled by a small set of environment variables and settings keys. Full reference for each lives in environment variables and the configuration reference.
Where data is stored
| Variable | Meaning |
|---|---|
| FORVEN_HOME | Base directory for packaged installs (e.g. %LOCALAPPDATA%\Forven). Data is stored under $FORVEN_HOME/data/. |
| FORVEN_DATA_DIR | Explicit override for the data-lake root. If set, all streams (OHLCV, funding, OI, derivatives, macro) live here. |
| FORVEN_DB | SQLite database path. The catalog is queried to discover which symbols are actively traded so the keep-alive sweep knows what to keep warm. |
Keep FORVEN_DATA_DIR and FORVEN_HOME consistent. If they diverge, the OHLCV lake and the enrichment streams can end up under different roots, and a packaged install may read empty enrichment streams. Forven asserts root consistency at startup and raises an alarm on mismatch, but it is easiest to set one or the other deliberately and leave it.
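The precedence in the table can be sketched as a small resolver. This is an illustration, not Forven's startup code: the function names are hypothetical, raising when neither variable is set is a choice made for the sketch, and "consistent" is read here as FORVEN_DATA_DIR pointing at the same place as FORVEN_HOME/data.

```python
import os
from pathlib import Path


def resolve_data_root(env=None) -> Path:
    """FORVEN_DATA_DIR wins if set; otherwise fall back to FORVEN_HOME/data."""
    env = os.environ if env is None else env
    explicit = env.get("FORVEN_DATA_DIR")
    if explicit:
        return Path(explicit)
    home = env.get("FORVEN_HOME")
    if home:
        return Path(home) / "data"
    # Assumption for this sketch: no built-in default is modelled.
    raise RuntimeError("neither FORVEN_DATA_DIR nor FORVEN_HOME is set")


def roots_consistent(env) -> bool:
    """True unless both variables are set and point at different roots."""
    explicit = env.get("FORVEN_DATA_DIR")
    home = env.get("FORVEN_HOME")
    if not explicit or not home:
        return True
    return Path(explicit) == Path(home) / "data"
```

Running a check like `roots_consistent` against your own environment before launch is a quick way to spot a divergent pair of settings.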
Remote data engine (optional)
For a shared-server setup, the data layer can federate to a remote Forven instance instead of reading local Parquet.
| Setting / variable | Meaning |
|---|---|
| remote_engine_enabled | Route data queries to a remote Forven instance. |
| remote_engine_url | Base URL of the remote engine. |
| FORVEN_REMOTE_ENGINE_DATA_ROOT | Remote data-engine root path; overrides the settings value. |
| FORVEN_REMOTE_ENGINE_ALLOWED_ROOT | Security boundary: remote paths must sit under this root. |
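
The allowed-root boundary amounts to a path containment test. The sketch below shows the general technique with a hypothetical function name; it assumes POSIX-style remote paths and normalizes them lexically, so it does not account for symlinks, which a real check would need to resolve on the remote host.

```python
import posixpath
from pathlib import PurePosixPath


def is_under_root(candidate: str, allowed_root: str) -> bool:
    """Return True if `candidate` is the allowed root or lies beneath it."""
    # normpath collapses "." and ".." so traversal tricks are compared fairly.
    root = PurePosixPath(posixpath.normpath(allowed_root))
    path = PurePosixPath(posixpath.normpath(candidate))
    return path == root or root in path.parents
```

Comparing whole path components, rather than using a string prefix test, is what stops a sibling such as /srv/forven2 from passing as being under /srv/forven.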
Regime gating
Regime detection (covered in full on market regimes) is configured here because it decides which strategies a market's current condition will admit.
| Setting | Default | Meaning |
|---|---|---|
regime_min_confidence | 0.3 | Minimum detection confidence [0.0–1.0] to pass the gate. Raise for stricter gating. |
strict_regime_gating | — | If true, block strategies incompatible with the detected regime and reject low-confidence detections. Permissive when false. |
allow_unknown_regime_strategies | — | If true, allow strategies with no entry in the compatibility matrix. Blocked in strict mode if false. |
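
One possible reading of how these three settings interact is sketched below. It is an interpretation of the table, not Forven's gate: the function name, the strategy and regime labels, and the shape of the compatibility matrix are hypothetical, and the two flags are required arguments because the table lists no defaults for them.

```python
def admits(strategy, regime, confidence, matrix, *,
           min_confidence=0.3, strict, allow_unknown):
    """Decide whether a strategy passes the regime gate.

    `matrix` maps a strategy name to the set of regimes it is compatible with.
    """
    if not strict:
        return True  # permissive mode: nothing is blocked
    if confidence < min_confidence:
        return False  # strict mode rejects low-confidence detections
    compatible = matrix.get(strategy)
    if compatible is None:
        return allow_unknown  # no matrix entry: allowed only if opted in
    return regime in compatible


# Hypothetical matrix with made-up strategy and regime names.
matrix = {"trend_follow": {"trending"}}
ok = admits("trend_follow", "trending", 0.8, matrix,
            strict=True, allow_unknown=False)
```

Under this reading, raising regime_min_confidence only has an effect once strict gating is on.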
Steps: ingest data from a new source
You drive ingestion from the /data page. The path is the same regardless of which source you choose.
- Open the /data page and select the Ingestion tab.
- Choose a source: Binance, Polygon.io, Binance Vision, CSV, or Yahoo Finance. (For Polygon, make sure POLYGON_API_KEY is set first.)
- Enter the symbol in canonical form (BTC-USDT, AAPL, EUR-USD) and the timeframe(s) you want.
- Set a date range, or choose the all_available option to backfill the full history (best paired with Binance Vision for crypto).
- Click Fetch and watch progress in the Activity Log.
- When it completes, the dataset appears in the Coverage Matrix. If gaps are flagged, click the symbol cell to auto-backfill them and extend the tail to the present.
What you'll see: a new row in the Coverage Matrix whose color reflects freshness, plus entries in the data activity log for each collector success, gap fill, and orphan scan. The dataset is then ready to backtest against.
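The gaps flagged in the Coverage Matrix are, conceptually, holes in an otherwise evenly spaced series. A minimal sketch of that detection follows; the function name is hypothetical, timestamps are epoch seconds, and it assumes an always-on market such as crypto, where every step is expected to have a bar.

```python
def find_gaps(timestamps, step):
    """Return (first_missing, last_missing) pairs for holes in a sorted series."""
    gaps = []
    for prev, cur in zip(timestamps, timestamps[1:]):
        if cur - prev > step:
            gaps.append((prev + step, cur - step))
    return gaps


# Hourly bars with two bars missing after t=7200.
gaps = find_gaps([0, 3600, 7200, 18000], step=3600)
```

For equities and forex, the same scan would need to skip periods when the market is closed before reporting a gap.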
The same flow is available over the API for scripting:
# Start an async ingestion job (returns a run_id)
curl -X POST http://127.0.0.1:8003/api/data/ingestion/submit `
-H "Content-Type: application/json" `
-d '{\"symbol\":\"BTC-USDT\",\"timeframe\":\"1h\",\"exchange\":\"binance\",\"all_available\":true}'
# Poll the run
curl http://127.0.0.1:8003/api/data/ingestion/runs
Verifying data quality
Before you trust a dataset, run a quality check. It scans for gaps, reconciles close prices across sources, and computes a checksum.
# Run validation for a symbol/timeframe
curl -X POST http://127.0.0.1:8003/api/data/quality `
-H "Content-Type: application/json" `
-d '{\"symbol\":\"BTC-USDT\",\"timeframe\":\"1h\"}'
The result reports overlap_bars, max_divergence_pct, and the missing-bar count. When you hold the same symbol from two sources (say Binance and HyperLiquid), the divergence figure tells you how far they disagree, which is worth reviewing before you rely on either for a live-adjacent test.
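If you script against the API from Python rather than the shell, the same POST can be built with the standard library. This is a sketch: the helper name `build_post` is hypothetical, the payload mirrors the curl example, and the send step is left commented out because it needs a running backend.

```python
import json
import urllib.request

BASE_URL = "http://127.0.0.1:8003"


def build_post(path: str, payload: dict) -> urllib.request.Request:
    """Build a JSON POST request for the local data API."""
    return urllib.request.Request(
        BASE_URL + path,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


quality_request = build_post(
    "/api/data/quality",
    {"symbol": "BTC-USDT", "timeframe": "1h"},
)

# To send it (requires the backend to be running):
# with urllib.request.urlopen(quality_request) as response:
#     result = json.load(response)
```

The same helper works for /api/data/ingestion/submit by changing the path and payload. Building the JSON with `json.dumps` also avoids the quote escaping that the shell form requires.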
API surface
The data layer is served by the /api/data/* router on the local backend (127.0.0.1:8003). The most useful endpoints:
| Method | Path | Purpose |
|---|---|---|
| GET | /api/data/sources | List sources with availability, required_key, and asset_types. |
| GET | /api/data/datasets | List local datasets with symbols, timeframes, row counts, date ranges, checksums. |
| POST | /api/data/ingestion/submit | Start an async fetch job; returns a run_id. |
| GET | /api/data/ingestion/runs | List ingestion jobs with status, bars_fetched, bars_new. |
| GET | /api/data/{symbol}/{timeframe} | Dataset detail: source, range, checksum, gaps, quality metrics. |
| GET | /api/data/{symbol}/{timeframe}/ohlcv | Read the last N bars as JSON (limit, default 100). |
| POST | /api/data/quality | Run gap/divergence/checksum validation. |
| GET | /api/data/health | Per-stream stats, latest collection times, error counts. |
| GET | /api/data/activity | Activity log: backfills, collector results, gap fills, orphan scans. |
| GET | /api/data/export/{symbol}/{timeframe} | Download a dataset as CSV, Parquet, or JSON. |
| GET | /api/data/engine/status | Engine status: enabled flag, lake root, remote config, backfill queue. |
| POST | /api/data/engine/catchup | Execute backfill for stale pairs (max_tasks). |
See the API reference for the full router catalog and authentication.
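As a small example of addressing the templated paths above, the sketch below builds the URL for the OHLCV read endpoint. The function name is hypothetical, and passing limit as a query-string parameter is an assumption; the table only states that the endpoint accepts a limit with a default of 100.

```python
from urllib.parse import quote, urlencode


def ohlcv_url(symbol: str, timeframe: str, limit: int = 100,
              base: str = "http://127.0.0.1:8003") -> str:
    """Build the URL for reading the last `limit` bars of a dataset."""
    # Percent-encode path segments so unusual symbols cannot alter the path.
    path = f"/api/data/{quote(symbol, safe='')}/{quote(timeframe, safe='')}/ohlcv"
    return f"{base}{path}?{urlencode({'limit': limit})}"


url = ohlcv_url("BTC-USDT", "1h")
```

Using the canonical BASE-QUOTE form here keeps the symbol free of slashes, which would otherwise need encoding inside a path segment.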
Caveats
- Free-tier rate limits. Polygon's free tier is roughly 5 calls/minute and Forven defaults to 4. Large multi-asset backfills will be slow without a paid key. The numbers here are illustrative of the providers' published limits, not a Forven guarantee.
- Source metadata is last-write-wins. Mixing CSV imports and live fetches for the same symbol/timeframe means the most recent write stamps the source label. Pick one canonical source per symbol.
- Point-in-time is OHLCV-only. Enrichment streams cannot be reconstructed as-of a past date.
Forven is a research tool. Clean data improves the honesty of a backtest, but no dataset makes a result predictive of future performance, and nothing here is financial advice.
Related
- Managing market data: the /data UI and the backfill workflow.
- Market regimes: how detected regimes gate strategies.
- Configuration reference: full config precedence and key groups.
- Environment variables: every data and storage variable in detail.