Managing market data

Use the /data manager to fetch, inspect, and backfill OHLCV datasets from Binance, Polygon, Yahoo, and CSV before you backtest.

The Data Manager is where you build the market data your strategies are tested on. Before you can backtest anything, Forven needs clean OHLCV candles on disk; the /data page is how you fetch them, inspect their coverage and quality, and fill the gaps. Get this right first and the rest of the pipeline has something honest to work with.

This page is for any user populating their backtest universe. It covers the real /data UI — its tabs, the sources you can pull from, the coverage and health views, and the one-click backfill workflow. For the source-by-source technical detail (symbol formats, enrichment streams, API keys), see data sources.

What the Data Manager is for

Everything downstream — a backtest, the gauntlet, paper trading — reads from a local data lake of closed OHLCV bars. Forven stores them as Parquet files under your data root, and the /data page is the only place you manage that lake from the UI.

It does four things:

  • Ingest new symbols and timeframes from a chosen source.
  • Inspect what you have: coverage, row counts, date ranges, and quality.
  • Backfill gaps and extend datasets to the present.
  • Keep data warm in the background for the symbols you actually trade.

Forven only ever persists closed bars. Forming (in-progress) candles are dropped at write time, so a backtest can never accidentally see a bar that had not finished — a common and silent source of lookahead bias.
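The closed-bar rule can be pictured with a short sketch. This is not Forven's writer; `drop_forming_bars` and the `open_time` field are hypothetical names chosen for illustration, and the rule assumed here is that a bar is closed once its open time plus the timeframe is at or before the current time.

```python
from datetime import datetime, timedelta, timezone

def drop_forming_bars(bars, timeframe, now):
    """Keep only bars whose full interval has elapsed at `now`.

    Hypothetical helper: `bars` is a list of dicts with a UTC
    "open_time" datetime; `timeframe` is a timedelta.
    """
    return [b for b in bars if b["open_time"] + timeframe <= now]

now = datetime(2024, 1, 1, 12, 30, tzinfo=timezone.utc)
hour = timedelta(hours=1)
bars = [
    {"open_time": datetime(2024, 1, 1, 10, tzinfo=timezone.utc), "close": 100.0},
    {"open_time": datetime(2024, 1, 1, 11, tzinfo=timezone.utc), "close": 101.0},
    # Opened at 12:00 and does not close until 13:00, so it is still forming.
    {"open_time": datetime(2024, 1, 1, 12, tzinfo=timezone.utc), "close": 102.0},
]
closed = drop_forming_bars(bars, hour, now)
print([b["close"] for b in closed])  # [100.0, 101.0]
```

Filtering at write time rather than at read time means no reader has to remember to apply the rule.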

The /data layout

The page is tab-based. The tabs you will use:

| Tab | What it shows |
| --- | --- |
| Overview | KPIs: dataset count, total rows, latest download, and which markets you cover. |
| Datasets | The dataset list plus an inspector side-panel: coverage matrix, source health, and a quality leaderboard. |
| Maintenance | The Data Engine: backfill planning and execution, source health, live stream status, and the task backlog. |

A series drill-down modal lets you look at an individual symbol/timeframe pair, and a data activity log records every backfill, collector success or failure, gap fill, and orphan scan.

Data sources

Forven ingests from five source families. Pick the one that fits the asset and how much history you need.

| Source | Good for | Notes |
| --- | --- | --- |
| CCXT / Binance | Crypto spot and futures, recent history | Direct exchange fetch; broad symbol coverage via the CCXT adapter. |
| Binance Vision | Bulk crypto history | Stream-efficient downloader for the monthly and daily archives at data.binance.vision; probes the start date and resumes if interrupted. |
| Polygon.io | Stocks, forex, indices, crypto | Multi-asset; requires POLYGON_API_KEY. Free tier is rate-limited (the client defaults to a conservative ~4 calls/min). |
| Yahoo Finance | Macro series | Used for macro indicators rather than primary trade symbols. |
| CSV upload | Your own data | Import a file you already have. If the symbol/timeframe already exists, the import merges into it and the last write wins. |

Symbol format depends on the source — for example BTC-USDT in Forven's filesystem format, BTC/USDT in CCXT, X:BTCUSD in Polygon, and BTCUSDT in Binance Vision. Forven normalizes these at the import and export boundaries, so you enter the symbol the picker expects and it maps the rest. The full normalization rules live in data sources.
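To make the mapping concrete, here is a minimal sketch that converts the hyphenated filesystem form into the other three shapes listed above. The function names are hypothetical and the rules are inferred only from the examples in this paragraph; Forven's actual normalization (documented in data sources) may handle more cases, such as quote-currency aliases.

```python
def to_ccxt(symbol: str) -> str:
    """BTC-USDT -> BTC/USDT (CCXT uses BASE/QUOTE)."""
    base, quote = symbol.split("-")
    return f"{base}/{quote}"

def to_binance_vision(symbol: str) -> str:
    """BTC-USDT -> BTCUSDT (Binance Vision concatenates base and quote)."""
    return symbol.replace("-", "")

def to_polygon_crypto(symbol: str) -> str:
    """BTC-USD -> X:BTCUSD (Polygon prefixes crypto tickers with X:)."""
    return "X:" + symbol.replace("-", "")

print(to_ccxt("BTC-USDT"))            # BTC/USDT
print(to_binance_vision("BTC-USDT"))  # BTCUSDT
print(to_polygon_crypto("BTC-USD"))   # X:BTCUSD
```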

Steps: fetch data for a new symbol

This is the workflow you will run before your first backtest.

  1. Open the Data Manager at /data.
  2. Click Download Data (or switch to the Datasets tab and open the inspector).
  3. Choose a source: CCXT/Binance, Binance Vision, Polygon, Yahoo, or CSV upload.
  4. Enter the symbol (BTC-USDT, AAPL, EUR-USD, …) and the timeframe(s) you want (1h, 4h, 1d).
  5. Set a date range, or choose the "all available" option to backfill as much history as the source allows.
  6. Click Fetch to queue the download, then watch progress in the data-fetch status banner or the data activity log.
  7. When it completes, the dataset appears in the Coverage Matrix.
  8. If the matrix flags gaps, click the symbol cell to auto-backfill the gaps and extend the tail to the present.

What you'll see

A status banner tracks the fetch while it runs in the background. Once it finishes, the new dataset shows up in the Datasets list and the Coverage Matrix, with its row count, date range, and source. Ingestion runs through an async job, so the matrix counts can lag the underlying lake by a few seconds after a fetch.

Reading the coverage matrix

The Coverage Matrix is the heart of the inspector. Each cell is a symbol/timeframe pair, and its colour shows freshness — how recently the data was updated — so you can spot stale or partial datasets at a glance. The matrix also surfaces gaps: missing bars inside an otherwise-covered range. An orphan scan flags leftover temporary files from interrupted writes; Forven auto-cleans stale ones, so this is informational.

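Treat a green, gap-free row as the precondition for a trustworthy backtest. A dataset with holes will quietly distort metrics, because returns and indicators computed across a gap treat bars that are far apart in time as if they were adjacent.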
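What "missing bars inside an otherwise-covered range" means can be shown with a small sketch. It assumes a market that trades continuously (as crypto does), so every timeframe step between the first and last bar should have a bar; `find_missing_bars` and `at` are hypothetical helpers, not part of Forven.

```python
from datetime import datetime, timedelta, timezone

def find_missing_bars(open_times, timeframe):
    """Return expected bar open times that are absent between the
    first and last observed bar."""
    if not open_times:
        return []
    present = set(open_times)
    missing = []
    t, end = min(open_times), max(open_times)
    while t <= end:
        if t not in present:
            missing.append(t)
        t += timeframe
    return missing

def at(hour):
    """Shorthand for an hourly UTC timestamp on 2024-01-01."""
    return datetime(2024, 1, 1, hour, tzinfo=timezone.utc)

have = [at(0), at(1), at(2), at(5), at(6)]
missing = find_missing_bars(have, timedelta(hours=1))
print([t.hour for t in missing])  # [3, 4]
```

For markets with sessions (equities, for example), the expected grid would need a trading calendar; this sketch does not attempt that.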

Steps: backfill gaps in an existing dataset

  1. Go to /data and open the Coverage Matrix.
  2. Find a symbol/timeframe pair showing gaps (flagged by colour or a warning).
  3. Click the pair. Forven detects the missing bars, fetches each gap range, and extends the tail to now.
  4. Read the result: it reports how many gaps were found, filled, and remain, plus bars added.
  5. If the result shows the dataset was extended to now, it is current and ready to backtest.
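The steps above can be sketched as a planning function that turns a dataset's bar timestamps into the ranges a backfill would need to fetch: one range per interior gap, plus a tail range up to the last bar that has fully closed. `plan_backfill` is a hypothetical name and this is an illustration of the idea under those assumptions, not the Data Engine's actual planner.

```python
from datetime import datetime, timedelta, timezone

def plan_backfill(open_times, timeframe, now):
    """Return inclusive (first_open, last_open) ranges that need fetching."""
    times = sorted(open_times)
    if not times:
        return []
    ranges = []
    # Interior gaps: consecutive bars more than one timeframe apart.
    for prev, nxt in zip(times, times[1:]):
        if nxt - prev > timeframe:
            ranges.append((prev + timeframe, nxt - timeframe))
    # Tail: bars after the last stored one that have already closed.
    tail_start = times[-1] + timeframe
    closed = (now - tail_start) // timeframe  # whole bars elapsed since tail_start
    if closed >= 1:
        ranges.append((tail_start, tail_start + (closed - 1) * timeframe))
    return ranges

def at(hour):
    """Shorthand for an hourly UTC timestamp on 2024-01-01."""
    return datetime(2024, 1, 1, hour, tzinfo=timezone.utc)

now = datetime(2024, 1, 1, 8, 30, tzinfo=timezone.utc)
plan = plan_backfill([at(0), at(1), at(2), at(5)], timedelta(hours=1), now)
for start, end in plan:
    print(start.hour, "to", end.hour)
# 3 to 4
# 6 to 7
```

The 08:00 bar is excluded from the tail because it has not closed at 08:30, which matches the closed-bars-only rule.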

For larger jobs, the Maintenance tab drives the Data Engine directly: open the Backfill Plan to preview which symbols and gaps it would touch (no execution), then click Catch up now to run the batch. Catch-up runs in the background, and the plan rescans afterward, so counts settle a moment later.

Checking quality before you backtest

Coverage is "do I have bars?" Quality is "can I trust them?" Before a strategy run that matters, run a quality check on the pair.

  1. On /data, select the symbol/timeframe pair.
  2. Open the Quality Check.
  3. Forven scans for gaps, reconciles close prices, and computes a checksum.
  4. Review the result: overlapping bars, maximum divergence percentage, and the missing-bar count.

If you have the same symbol from two sources — say Binance versus another feed — the divergence matrix shows how far their close prices disagree, which is how you catch a bad or misaligned feed before it poisons a backtest.
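As a rough illustration of a divergence figure, the sketch below compares close prices on the bars two sources have in common and reports the overlap count and the largest percentage disagreement. The formula (absolute difference relative to the midpoint of the two closes) is an assumption made for this example; the text does not specify how Forven computes it, and `max_close_divergence` is a hypothetical name.

```python
def max_close_divergence(closes_a, closes_b):
    """Compare two {timestamp: close} mappings.

    Returns (overlapping_bar_count, max_divergence_pct), or (0, None)
    when the sources share no timestamps.
    """
    overlap = closes_a.keys() & closes_b.keys()
    if not overlap:
        return 0, None
    max_pct = max(
        abs(closes_a[t] - closes_b[t]) / ((closes_a[t] + closes_b[t]) / 2) * 100
        for t in overlap
    )
    return len(overlap), max_pct

source_a = {1: 100.0, 2: 101.0, 3: 102.0}
source_b = {2: 101.0, 3: 104.0, 4: 105.0}
count, pct = max_close_divergence(source_a, source_b)
print(count, round(pct, 2))  # 2 1.94
```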

Every bar that enters the lake has already passed sanity gating at write time: high >= low, open and close inside the bar's range, prices positive, volume non-negative. Corrupt bars are dropped before they are ever stored.
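The four checks listed above translate directly into a predicate. The sketch below is illustrative only: `is_sane_bar` and the dict field names are hypothetical, and Forven's own gate may apply further rules.

```python
def is_sane_bar(bar):
    """Return True if the bar passes the basic OHLCV sanity checks."""
    o, h, l, c, v = (bar[k] for k in ("open", "high", "low", "close", "volume"))
    return (
        h >= l                      # high is not below low
        and l <= o <= h             # open inside the bar's range
        and l <= c <= h             # close inside the bar's range
        and min(o, h, l, c) > 0     # all prices positive
        and v >= 0                  # volume non-negative
    )

good = {"open": 100.0, "high": 110.0, "low": 95.0, "close": 105.0, "volume": 12.5}
bad = {"open": 100.0, "high": 99.0, "low": 95.0, "close": 98.0, "volume": 3.0}

bars = [good, bad]
kept = [b for b in bars if is_sane_bar(b)]
print(len(kept))  # 1
```

The second bar is rejected because its open (100.0) sits above its high (99.0).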

Background keep-alive

You do not have to refresh data by hand for symbols already in the pipeline. A background sweep keeps OHLCV, derivatives, and macro data warm for the symbols you actively trade and for recently used backtest datasets.

It works by discovering active symbols from your strategies (those in paper, live_graduated, or deployed stages), finding the timeframes in use, and ranking pairs by staleness rather than cycling round-robin — so a cold pair never starves. The sweep runs on a cadence (around every 15 minutes by default) and respects a per-run cap so it never floods a source. Coverage Matrix colour reflects this freshness.
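Staleness ranking with a per-run cap can be sketched in a few lines. The example assumes staleness simply means "least recently updated" and uses a hypothetical `pick_pairs_to_refresh` helper; the real sweep's scoring and cap values are not described here.

```python
from datetime import datetime, timezone

def pick_pairs_to_refresh(last_updated, cap):
    """Return up to `cap` (symbol, timeframe) pairs, stalest first.

    `last_updated` maps each pair to the time it was last refreshed.
    """
    ranked = sorted(last_updated, key=lambda pair: last_updated[pair])
    return ranked[:cap]

def utc(month, day, hour):
    return datetime(2024, month, day, hour, tzinfo=timezone.utc)

last_updated = {
    ("BTC-USDT", "1h"): utc(1, 3, 12),
    ("ETH-USDT", "1h"): utc(1, 3, 9),
    ("SOL-USDT", "4h"): utc(1, 1, 0),
    ("BTC-USDT", "1d"): utc(1, 2, 0),
}
batch = pick_pairs_to_refresh(last_updated, cap=2)
print(batch)  # [('SOL-USDT', '4h'), ('BTC-USDT', '1d')]
```

Because the oldest pairs are always picked first, a pair skipped in one run moves up the ranking in the next, which is why nothing starves the way it could under a fixed rotation with a cap.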

Enrichment streams

Beyond raw candles, Forven can collect nine background streams: OHLCV (spot and futures), funding rates, open interest, long/short ratios, taker volume, liquidations, the fear/greed index, macro indicators (VIX, DXY, bonds, sector ETFs), and BTC dominance.

When a backtest loads, these are merged onto the OHLCV bars on demand — matched to the nearest prior value so there is no lookahead. Streams that summarize a forward window (taker buy/sell ratio, liquidations) are shifted to the bucket close before merging, again to keep future information out of the past. You do not trigger this; it is transparent in the backtest context. If a stream is missing for a symbol, its columns are simply absent rather than zero-filled. See data sources for what each stream measures.
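The "nearest prior value" match is an as-of join. The sketch below shows the idea with the standard library only; `asof_merge` is a hypothetical helper, and it assumes a stream value stamped exactly at a bar's timestamp counts as available to that bar. Integer timestamps are used to keep the example short.

```python
from bisect import bisect_right

def asof_merge(bar_times, stream_times, stream_values):
    """For each bar time, return the latest stream value whose timestamp
    is at or before it, or None if no such value exists yet.

    `stream_times` must be sorted ascending.
    """
    merged = []
    for t in bar_times:
        i = bisect_right(stream_times, t)  # number of stream points <= t
        merged.append(stream_values[i - 1] if i > 0 else None)
    return merged

bar_times = [0, 10, 20, 30]
funding_times = [5, 15, 25, 35]
funding_values = [0.01, 0.02, 0.03, 0.04]
print(asof_merge(bar_times, funding_times, funding_values))
# [None, 0.01, 0.02, 0.03]
```

The value stamped at 35 is never attached to any of these bars, because it lies in their future. The first bar gets None rather than a filled-in value, in the same spirit as leaving missing streams absent instead of zero-filling them.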

Where your data lives

Datasets are stored as Parquet under your data root. By default that sits inside your Forven home (FORVEN_HOME, e.g. %LOCALAPPDATA%\Forven), under data/. You can relocate the entire lake with FORVEN_DATA_DIR.

# Inspect where Forven thinks your data lives
$env:FORVEN_HOME
$env:FORVEN_DATA_DIR

Caveat: if FORVEN_DATA_DIR and FORVEN_HOME point at different roots, the OHLCV lake and the enrichment streams can end up split between them, and a packaged install may read empty enrichment data. Forven checks for this at startup and raises an alarm in the logs if it sees a mismatch. Keep both consistent unless you have a reason not to.
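If you want to check the two variables yourself, a sketch like the following can help. It rests on an assumption that is not confirmed here: that "consistent" means FORVEN_DATA_DIR is either unset or equal to the data/ folder under FORVEN_HOME. `resolve_data_root` is a hypothetical helper, not Forven's startup check.

```python
import os
from pathlib import Path

def resolve_data_root(env):
    """Return (data_root, mismatch) from an environment mapping.

    Assumed rule: the data root is FORVEN_DATA_DIR when set, otherwise
    FORVEN_HOME/data. A mismatch is flagged when both are set and the
    override differs from that default.
    """
    home = env.get("FORVEN_HOME")
    override = env.get("FORVEN_DATA_DIR")
    default = Path(home) / "data" if home else None
    if override is None:
        return default, False
    mismatch = default is not None and Path(override) != default
    return Path(override), mismatch

# Illustrative paths only.
example = {"FORVEN_HOME": "/opt/forven", "FORVEN_DATA_DIR": "/mnt/lake"}
root, mismatch = resolve_data_root(example)
print(root, mismatch)

# To inspect your real environment, pass os.environ instead:
# print(resolve_data_root(os.environ))
```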

Exporting a dataset

You can pull a dataset back out for external use. From the Dataset Detail view, open Export and choose CSV, Parquet, or JSON; Forven reads the Parquet, normalizes the OHLCV columns, applies the format transform (for example, ISO timestamps for CSV), and hands you a file download.
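The CSV transform mentioned above (ISO timestamps) might look roughly like this. The sketch skips the Parquet read and starts from in-memory bars; the `ts_ms` field (epoch milliseconds, UTC), the column order, and the `bars_to_csv` name are assumptions made for illustration.

```python
import csv
import io
from datetime import datetime, timezone

COLUMNS = ["open", "high", "low", "close", "volume"]

def bars_to_csv(bars):
    """Serialize bars to CSV text with ISO 8601 UTC timestamps."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(["timestamp"] + COLUMNS)
    for bar in bars:
        ts = datetime.fromtimestamp(bar["ts_ms"] / 1000, tz=timezone.utc)
        writer.writerow([ts.isoformat()] + [bar[c] for c in COLUMNS])
    return buf.getvalue()

bars = [
    {"ts_ms": 1704067200000, "open": 100.0, "high": 110.0,
     "low": 95.0, "close": 105.0, "volume": 12.5},
]
text = bars_to_csv(bars)
print(text)
# timestamp,open,high,low,close,volume
# 2024-01-01T00:00:00+00:00,100.0,110.0,95.0,105.0,12.5
```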

Caveats

  • Ingestion and backfill are asynchronous. The UI queues a job and reports progress; matrix counts can lag the lake briefly after a run.
  • Polygon's free tier is rate-limited. The client is deliberately conservative (~4 calls/min). If you hit 429 responses, an API key with higher quota helps.
  • CSV and exchange fetches can collide. The same symbol and timeframe from a CSV import and a live fetch merge on write — the later source stamps the dataset's metadata, but earlier bars are not lost.
  • Forven is a research tool. Backtests built on this data describe the past and are not predictive, and nothing here is financial advice.

Related pages

  • Data sources: the source-by-source reference (symbol formats, API keys, enrichment streams).
  • Backtesting a strategy: the next step once your data is clean.
  • Market regimes: how Forven reads the data into a market condition.
  • Quickstart: the full path from install to a paper strategy.