This guide is educational content explaining how our data pipeline works. It is not investment advice. Always do your own research before making investment decisions.
Introduction
The scoring pipeline is only as good as its data. Every factor score, elimination filter, and scoring gate depends on numbers that arrive from external providers on their own schedules. Here is where every data point comes from, how often it updates, and what delays to expect.
Data Pipeline Overview
The system runs a batched ingest pipeline daily, designed to process thousands of tickers reliably without overwhelming data providers.
- `orchestrate_ingest` kicks off at 21:30 UTC (after US market close)
- Divides the full universe (7,000+ tickers) into batches of ~50
- Runs up to 3 concurrent batches with rate limiting (36 requests/minute)
- After all batches complete, a sweep verifies completeness
- The scoring chain fires automatically: `full_score` → `full_score_v3` → `full_score_v4`
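The batching, throttling, and completeness steps above can be sketched in Python with `asyncio`. This is a minimal illustration under stated assumptions, not the pipeline's actual code. `orchestrate_ingest_sketch`, `fetch_ticker`, and `SlidingWindowLimiter` are hypothetical names. The sketch assumes each ticker costs one provider request and that the 36 requests/minute limit is shared by all concurrent batches. The scoring chain is left out.

```python
import asyncio
import time
from collections import deque

# Hypothetical sketch; names and structure are illustrative, not the real pipeline.
BATCH_SIZE = 50              # ~50 tickers per batch
MAX_CONCURRENT_BATCHES = 3   # up to 3 batches in flight
REQUESTS_PER_MINUTE = 36     # assumed to be one limit shared by every batch


def make_batches(tickers, size=BATCH_SIZE):
    """Split the universe into consecutive batches of at most `size` tickers."""
    return [tickers[i:i + size] for i in range(0, len(tickers), size)]


class SlidingWindowLimiter:
    """Permit at most `limit` requests in any rolling `window`-second span."""

    def __init__(self, limit, window=60.0):
        self.limit = limit
        self.window = window
        self.stamps = deque()       # start times of recent requests
        self.lock = asyncio.Lock()  # waiters queue up one at a time

    async def acquire(self):
        async with self.lock:
            while True:
                now = time.monotonic()
                # Forget requests that have aged out of the window.
                while self.stamps and now - self.stamps[0] >= self.window:
                    self.stamps.popleft()
                if len(self.stamps) < self.limit:
                    self.stamps.append(now)
                    return
                # Sleep until the oldest request leaves the window, then re-check.
                await asyncio.sleep(self.window - (now - self.stamps[0]))


async def fetch_ticker(ticker, limiter):
    """Hypothetical provider call; a real version would query FMP, Polygon, etc."""
    await limiter.acquire()
    return ticker, {"status": "ok"}


async def run_batch(batch, limiter, gate):
    async with gate:  # at most MAX_CONCURRENT_BATCHES batches hold the gate at once
        return [await fetch_ticker(t, limiter) for t in batch]


async def orchestrate_ingest_sketch(tickers, rpm=REQUESTS_PER_MINUTE):
    # Create asyncio primitives inside the running event loop.
    limiter = SlidingWindowLimiter(rpm)
    gate = asyncio.Semaphore(MAX_CONCURRENT_BATCHES)
    batches = make_batches(tickers)
    results = await asyncio.gather(*(run_batch(b, limiter, gate) for b in batches))
    fetched = {ticker for batch in results for ticker, _ in batch}
    # Completeness sweep: list tickers that never came back so they can be retried.
    missing = [t for t in tickers if t not in fetched]
    return fetched, missing
```

Calling `asyncio.run(orchestrate_ingest_sketch(["T0", "T1"]))` returns immediately because the limiter never fills. With a universe of thousands of tickers, the same code would deliberately stretch the run to respect the request budget.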
Total pipeline runtime is approximately 2–4 hours, depending on universe size and provider response times. By the time US markets open the next morning, all scores reflect the previous trading day's data.
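The runtime figure is consistent with the rate limit being the bottleneck. The back-of-the-envelope check below makes three assumptions: one provider request per ticker (the real per-ticker request count is not stated here), no retries, and no provider latency. The function name is illustrative.

```python
import math


def min_runtime_minutes(n_tickers, requests_per_ticker=1, requests_per_minute=36):
    """Lower bound on ingest time when a shared per-minute rate limit is the bottleneck.

    Hypothetical helper. Concurrency does not change this bound: three batches
    in flight still draw on the same requests-per-minute budget.
    """
    total_requests = n_tickers * requests_per_ticker
    return math.ceil(total_requests / requests_per_minute)


minutes = min_runtime_minutes(7000)
print(minutes, round(minutes / 60, 2))  # 195 3.25
```

At about 195 minutes (roughly 3.25 hours) for 7,000 tickers, the estimate falls inside the stated range. A larger universe or slower provider responses would push it toward the upper end.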
Data Sources by Type
Each data type flows through a prioritized chain of providers. If the primary provider is unavailable or returns incomplete results, the system falls back automatically.
| Data Type | Primary Provider | Fallback | Update Frequency | Typical Lag |
|-----------|-----------------|----------|------------------|-------------|
| Fundamental data (financials) | FMP | yfinance → SEC EDGAR | Quarterly (follows earnings) | 1–45 days after earnings |
| Price data (daily OHLCV) | Polygon | yfinance | Daily | Same day (after market close) |
| Insider transactions | SEC EDGAR Form 4 | Finnhub | As filed | 2 business days |
| 13F institutional holdings | SEC EDGAR | None | Quarterly | Up to 45 days (filing deadline) |
| Market data (volume, market cap) | FMP | yfinance | Daily | Same day |
| Macro indicators (Shiller CAPE) | FRED | None | Monthly/Quarterly | Varies by indicator |
The fallback system means that even if a primary provider experiences downtime, scores continue to be calculated using backup sources. The engine always prefers the primary provider and falls back only when that provider's data is unavailable or incomplete.
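A provider chain like the ones in the table can be expressed as an ordered list of fetchers. An outage or an incomplete result moves on to the next provider. The sketch is illustrative only: `fetch_with_fallback`, `ProviderUnavailable`, the stub fetchers, and the required field names are all hypothetical. Whether the real engine merges partial results across providers is not covered here.

```python
from typing import Callable, Optional, Sequence, Tuple


class ProviderUnavailable(Exception):
    """Raised by a (hypothetical) fetcher when its provider is down or times out."""


Fetcher = Callable[[str], Optional[dict]]


def fetch_with_fallback(
    ticker: str,
    chain: Sequence[Tuple[str, Fetcher]],
    required_fields: Sequence[str],
) -> Tuple[Optional[str], Optional[dict]]:
    """Return (provider_name, data) from the first provider with complete data."""
    for name, fetch in chain:  # ordered: primary provider first
        try:
            data = fetch(ticker)
        except ProviderUnavailable:
            continue  # outage: try the next provider
        if data and all(data.get(f) is not None for f in required_fields):
            return name, data
        # Incomplete result: fall through to the next provider.
    return None, None  # every provider failed; the caller can flag the ticker


# Hypothetical stubs standing in for FMP -> yfinance -> SEC EDGAR.
def fmp_down(ticker):
    raise ProviderUnavailable("FMP timeout")


def yfinance_partial(ticker):
    return {"revenue": 1_000.0, "free_cash_flow": None}


def edgar_full(ticker):
    return {"revenue": 1_000.0, "free_cash_flow": 120.0}


chain = [("FMP", fmp_down), ("yfinance", yfinance_partial), ("SEC EDGAR", edgar_full)]
source, data = fetch_with_fallback("ACME", chain, ["revenue", "free_cash_flow"])
print(source)  # SEC EDGAR
```

In this example, the primary provider is down and the first fallback is missing free cash flow, so the third source in the chain supplies the data.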
Update Cadences
Different data types arrive on different schedules. Understanding these cadences helps set expectations for when scores will change.
- Daily (21:30 UTC): Price, volume, market cap, and basic financials refresh via the batched ingest pipeline
- Daily (22:00 UTC): 13F filing ingest checks for newly published filings on SEC EDGAR
- Quarterly: Fundamental data refreshes when companies report earnings, typically 2–6 weeks after quarter end. This is the primary driver of score changes.
- Weekly (Saturday 02:00 UTC): ML model training runs, recalculating cluster assignments and VAE adjustments
- As filed: Insider transaction data arrives as new SEC Form 4 filings appear, typically within 2 business days of the transaction
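The fixed-time jobs in this list can be modelled as UTC schedules. The sketch computes the next run of each one. The job keys and the `(weekday, hour, minute)` encoding are assumptions for illustration. The quarterly and as-filed cadences are omitted because filings drive them, not the clock.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical schedule table: (weekday, hour, minute) in UTC.
# weekday follows datetime.weekday() (Mon=0 ... Sat=5); None means every day.
SCHEDULES = {
    "batched_ingest": (None, 21, 30),
    "13f_ingest": (None, 22, 0),
    "ml_training": (5, 2, 0),
}


def next_run(now, weekday, hour, minute):
    """Return the first scheduled time strictly after `now` (a UTC datetime)."""
    candidate = now.replace(hour=hour, minute=minute, second=0, microsecond=0)
    if weekday is None:
        # Daily job: today's slot if it is still ahead, otherwise tomorrow's.
        return candidate if candidate > now else candidate + timedelta(days=1)
    # Weekly job: move forward to the target weekday, then skip a week if already past.
    candidate += timedelta(days=(weekday - now.weekday()) % 7)
    return candidate if candidate > now else candidate + timedelta(days=7)


now = datetime(2024, 1, 3, 22, 15, tzinfo=timezone.utc)  # a Wednesday
for job, spec in SCHEDULES.items():
    print(job, next_run(now, *spec).isoformat())
# batched_ingest 2024-01-04T21:30:00+00:00
# 13f_ingest 2024-01-04T22:00:00+00:00
# ml_training 2024-01-06T02:00:00+00:00
```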
Data Windows
The engine uses fixed lookback windows to ensure every company is evaluated on the same basis:
- Historical depth: Up to 5 years of quarterly financials where available
- Volume averaging: 60-day rolling window for dollar volume calculations
- Trend analysis: Multi-year periods for ROIC trends, revenue CAGR, and margin trajectories
- 13F history: Available back to 2013 for curated institutional managers
- Minimum data requirement: At least 5 years of history for robust analysis; companies with less history may receive incomplete scores or inconclusive filter results
These windows are fixed by design. Using consistent lookback periods prevents recency bias from skewing results.
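Two of these windows reduce to short calculations. The sketch makes two assumptions: dollar volume is closing price times share volume, and each list element is one trading day. The text does not say whether the 60-day window counts trading or calendar days. `rolling_dollar_volume` and `cagr` are illustrative names, not the engine's functions.

```python
from collections import deque
from typing import List, Optional


def rolling_dollar_volume(closes, volumes, window=60) -> List[Optional[float]]:
    """Hypothetical helper: trailing average of close * volume; None until a full window exists."""
    out, buf, total = [], deque(), 0.0
    for close, volume in zip(closes, volumes):
        dv = close * volume
        buf.append(dv)
        total += dv
        if len(buf) > window:
            total -= buf.popleft()  # drop the day that left the window
        out.append(total / window if len(buf) == window else None)
    return out


def cagr(start_value, end_value, years) -> Optional[float]:
    """Hypothetical helper: compound annual growth rate, (end / start) ** (1 / years) - 1."""
    if start_value <= 0 or end_value <= 0 or years <= 0:
        return None  # undefined for non-positive inputs
    return (end_value / start_value) ** (1 / years) - 1


adv = rolling_dollar_volume([10.0] * 60, [1_000] * 60)
print(adv[58], adv[59])                 # None 10000.0
print(round(cagr(100.0, 200.0, 5), 4))  # 0.1487
```

Returning `None` until the window is full mirrors the idea that a company without enough history gets no value rather than a misleading partial one.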
Troubleshooting
"Why hasn't my stock's score changed?" Between earnings seasons, fundamental data does not change. Price and momentum data update daily, but factor scores derived from quarterly financials may remain stable for weeks. The majority of score movement comes from new quarterly financial data, which arrives in waves during earnings season.
"Why do two providers show different numbers?" Fallback providers may use different reporting periods or data-cleaning methodologies. The system always prefers the primary provider and falls back only when that provider's data is unavailable or incomplete. Minor discrepancies between sources typically resolve when the next quarterly filing is processed.
"Why is a score 'inconclusive'?" Some filters require a minimum data history (e.g., 5 years of free cash flow data), and recently IPO'd companies may lack it. An inconclusive result is treated as a filter failure: when the engine cannot confirm that a company passes a safety check, it does not assume the best case. This is a deliberately conservative design.
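The conservative rule amounts to three-valued filter logic in which only an explicit pass survives. In this minimal sketch, `FilterResult`, `fcf_positive_filter`, and `survives` are hypothetical. The "positive FCF in each of the last five years" test is also hypothetical, a stand-in for whichever five-year free cash flow filter the engine actually runs.

```python
from enum import Enum
from typing import Sequence


class FilterResult(Enum):
    PASS = "pass"
    FAIL = "fail"
    INCONCLUSIVE = "inconclusive"


MIN_FCF_YEARS = 5  # the 5-year free cash flow history used as the example


def fcf_positive_filter(annual_fcf: Sequence[float]) -> FilterResult:
    """Hypothetical safety check: FCF above zero in each of the last MIN_FCF_YEARS years."""
    if len(annual_fcf) < MIN_FCF_YEARS:
        return FilterResult.INCONCLUSIVE  # not enough history to confirm either way
    recent = annual_fcf[-MIN_FCF_YEARS:]
    return FilterResult.PASS if all(x > 0 for x in recent) else FilterResult.FAIL


def survives(result: FilterResult) -> bool:
    # Conservative gate: INCONCLUSIVE is handled exactly like FAIL.
    return result is FilterResult.PASS


recent_ipo = [40.0, 55.0, 61.0]  # only three years of history
print(fcf_positive_filter(recent_ipo).value, survives(fcf_positive_filter(recent_ipo)))
# inconclusive False
```

Keeping `INCONCLUSIVE` as a separate value (rather than folding it into `FAIL` at the source) lets the reason be reported to users while the gate still treats it as a failure.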
Known Limitations
- Limited history: Newly IPO'd companies (less than 2 years of history) may lack sufficient data for reliable scoring. Some filters require multi-year financial history that is not yet available.
- US-focused coverage: Provider coverage is strongest for US equities. International listings have limited coverage and may have incomplete data across multiple factors.
- No real-time data: The most recent data available to the engine is from the previous trading day. Intraday events (earnings releases, major news) are not immediately reflected in scores.
- Sector classification lag: GICS classifications may lag for companies undergoing business model changes. Until the classification updates, sector-neutral scoring may place the company in the wrong peer group.
- Provider outage fallbacks: During provider outages, fallback sources may use different reporting periods or data cleaning methodologies, producing slightly different values.
- Free cash flow variability: FCF data quality varies by industry. REITs, banks, and other financial companies report cash flows differently in ways that can distort FCF-based filters, and these companies are currently excluded from scoring entirely.
- 13F reporting delay: Institutional holdings data has a 45-day filing deadline after quarter end. By the time 13F data is available, some of the position changes may already be reflected in price.
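The 45-day window translates directly into calendar dates. The sketch computes the quarter end and the 13F filing deadline for a few dates. The function names are illustrative. The calculation ignores any weekend or holiday adjustment to the due date.

```python
from datetime import date, timedelta

FILING_WINDOW_DAYS = 45  # 13F reports are due within 45 days of quarter end


def quarter_end(d: date) -> date:
    """Hypothetical helper: last calendar day of the quarter containing d."""
    last_month = ((d.month - 1) // 3 + 1) * 3
    if last_month == 12:
        return date(d.year, 12, 31)
    return date(d.year, last_month + 1, 1) - timedelta(days=1)


def filing_deadline(period_end: date) -> date:
    """Hypothetical helper: latest on-time filing date, with no weekend/holiday shift."""
    return period_end + timedelta(days=FILING_WINDOW_DAYS)


for d in (date(2024, 2, 10), date(2024, 5, 20), date(2024, 8, 1), date(2024, 11, 30)):
    q_end = quarter_end(d)
    print(q_end, filing_deadline(q_end))
# 2024-03-31 2024-05-15
# 2024-06-30 2024-08-14
# 2024-09-30 2024-11-14
# 2024-12-31 2025-02-14
```

A filing made on its deadline describes holdings that are already 45 days old. Positions built early in the quarter are older still, which is why some of those changes may already be priced in.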