Margin Invest
© 2026 Margin Invest

ML Pipeline

8 min read·Updated 2026-02-26

Introduction

The deterministic scoring pipeline — elimination filters, factor scoring, track cascades, composite tier assignment — captures known, well-researched patterns. Every formula in it has an academic source. Every threshold has a reason.

But markets contain patterns that traditional factor models miss. Cross-factor interactions shift across regimes. The relative predictive power of individual factors changes over time. A static weight matrix cannot adapt to these dynamics.

The ML pipeline is an additive refinement layer that learns from the system's own prediction history. It observes which factor combinations actually predicted outperformance over the past 90+ days, and nudges composite tiers accordingly.

Note

ML is optional and secondary. The deterministic pipeline is the foundation. ML adjustments are bounded to one composite tier in either direction, fully auditable, and transparent. The system works completely without ML — and new deployments start that way by design.

Why ML on Top of Deterministic Scoring?

Factor models capture what decades of finance research have proven: cheap stocks with high quality and positive momentum tend to outperform. These relationships are well-established and persistent.

What factor models do not capture is how those relationships interact in specific contexts. Among high-quality technology stocks, does momentum or value matter more right now? When the market regime shifts from expansion to contraction, which factor combinations hold up and which break down?

Think of it this way: factors tell you what to measure. ML tells you how those measurements interact in the current environment.

The ML layer does not replace factors. It cannot introduce new data or override the pipeline's elimination filters. It adjusts the relative weighting of existing signals based on what has actually been working — and it can only move a stock's composite tier by one level in either direction. A stock that scores LOW on the deterministic pipeline cannot be promoted to EXCEPTIONAL by ML. The bounds are hard constraints.

Training Process

Models train on a weekly schedule: Saturday at 2:00 AM UTC, when markets are closed and no scoring runs are in progress.

The training process requires a minimum of 100 scored assets in the database. Below that threshold, the training job completes gracefully without producing a model — the system needs enough predictions to evaluate whether the ML layer is actually helpful.

After training, the model is evaluated using rank IC (rank information coefficient): the Spearman correlation between the model's predicted ranks and actual future returns. Only models with a rank IC above 0.15 are activated. This is a quality threshold, not a business rule — 0.15 means the model's predictions have a statistically meaningful relationship with outcomes.

If the model cannot predict better than chance, it stays dormant. The deterministic pipeline continues to run without any ML adjustment.
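The qualification gate described above can be sketched as a Spearman rank correlation. In practice this would likely use scipy.stats.spearmanr; a minimal rank-based version (assuming no ties, which argsort ranking handles arbitrarily) looks like:

```python
import numpy as np

def rank_ic(predicted: np.ndarray, realized: np.ndarray) -> float:
    """Spearman correlation between predicted ranks and realized forward returns.
    Ties are broken arbitrarily by argsort; fine for illustration."""
    pred_ranks = predicted.argsort().argsort().astype(float)
    real_ranks = realized.argsort().argsort().astype(float)
    return float(np.corrcoef(pred_ranks, real_ranks)[0, 1])

# A model only qualifies when rank IC clears the 0.15 gate.
QUALIFICATION_THRESHOLD = 0.15

preds = np.array([0.04, 0.01, -0.02, 0.03, 0.00])
actual = np.array([0.05, 0.02, -0.01, 0.06, -0.03])
ic = rank_ic(preds, actual)        # 0.8 for this toy data
qualified = ic > QUALIFICATION_THRESHOLD
```

Because the correlation is computed on ranks rather than raw returns, a model is rewarded for ordering stocks correctly, not for predicting return magnitudes.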

The train_ml_models worker job runs on a weekly cron schedule (Saturday 2:00 AM UTC). The pipeline:

  1. Loads the latest composite scores from the database with full JSONB detail
  2. Reconstructs CompositeScore objects and builds a feature matrix from the factor registry
  3. Clusters stocks by factor similarity using KMeans (default: 5 clusters)
  4. Loads price history for all scored tickers and computes forward returns
  5. Trains per-cluster LightGBM models using walk-forward time-series cross-validation
  6. Optionally trains a FactorVAE model for anomaly detection
  7. Evaluates model quality via rank IC on out-of-sample data (no look-ahead bias)
  8. Records a MlModelRun in the database with model artifacts, training metrics, and qualification status
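The job's control flow can be sketched as follows. Function names and the return shape are illustrative, not the actual codebase; the key behaviors from the source are the 100-asset floor with a graceful no-op, and the rank IC qualification gate:

```python
MIN_SCORED_ASSETS = 100
RANK_IC_THRESHOLD = 0.15

def run_training_job(scored_assets, train_fn):
    """Weekly training sketch: skip below the sample floor, otherwise
    train and gate the resulting model on out-of-sample rank IC."""
    if len(scored_assets) < MIN_SCORED_ASSETS:
        # Graceful no-op: the run is recorded, but no model is produced.
        return {"status": "skipped", "model": None}
    model, rank_ic = train_fn(scored_assets)   # stands in for steps 2-7
    qualified = rank_ic > RANK_IC_THRESHOLD
    return {"status": "trained",
            "model": model if qualified else None,
            "rank_ic": rank_ic, "qualified": qualified}

# A toy train_fn standing in for clustering, CV, and evaluation.
small = run_training_job(range(40), lambda a: ("gbm", 0.22))
large = run_training_job(range(150), lambda a: ("gbm", 0.22))
```

Note that a non-qualifying run still returns a record: training history is preserved even when no model is activated.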

Model artifacts — both cluster models and VAE state — are stored in the database as serialized bytes (cluster_model_data as pickled dict, vae_model_data as PyTorch state dict). This is deliberate: containers are ephemeral on Railway, so filesystem storage would not survive a redeploy. SHA-256 checksums are verified before unpickling to prevent tampering.
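The checksum-before-unpickle pattern is standard library territory. A minimal sketch (function names assumed, not the real codebase) of storing and loading a pickled artifact with SHA-256 verification:

```python
import hashlib
import pickle

def store_artifact(model_obj):
    """Serialize a model and record its SHA-256 digest alongside the bytes,
    as the cluster_model_data column does conceptually."""
    blob = pickle.dumps(model_obj)
    return blob, hashlib.sha256(blob).hexdigest()

def load_artifact(blob: bytes, expected_sha256: str):
    """Refuse to unpickle bytes whose digest does not match the stored checksum."""
    if hashlib.sha256(blob).hexdigest() != expected_sha256:
        raise ValueError("artifact checksum mismatch; refusing to unpickle")
    return pickle.loads(blob)

blob, digest = store_artifact({"cluster_0": "lgbm-params"})
restored = load_artifact(blob, digest)
```

The verification matters because `pickle.loads` executes arbitrary code on untrusted input; the checksum ties the bytes back to what the training job wrote.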

Each training run preserves a complete audit record: number of clusters, feature count, sample count, per-cluster sizes, rank IC, and VAE metrics. Training history is never deleted.

Cluster Models

Not all stocks should be scored the same way. A mature dividend-paying utility and a high-growth SaaS company have fundamentally different risk-return profiles — the factors that predict outperformance for one group may be irrelevant for the other.

The cluster model groups stocks with similar factor profiles together using unsupervised learning. Within each cluster, a separate LightGBM model learns which factor combinations predict forward returns for that specific peer group.

Cluster-specific factor importance

Among a cluster of high-quality tech stocks, the model might learn that momentum signals are more predictive than value signals — these companies rarely trade at deep value, so traditional valuation metrics carry less weight. In a cluster of mature industrials, the model might find the opposite: value and capital allocation factors dominate, while momentum adds noise.

Clusters are recalculated each training cycle. As market conditions shift and factor profiles change, stocks can migrate between clusters. A company transitioning from high growth to steady state will naturally move into a cluster where value factors carry more weight.

The clustering pipeline:

  1. Builds a feature matrix from all scored assets using the factor registry
  2. Imputes missing values with column medians (or zero if an entire column is NaN)
  3. Z-scores all features using StandardScaler for scale-invariant distance computation
  4. Runs KMeans with n_clusters=5 (configurable via ml_n_clusters), n_init=10 for stability
  5. Returns a mapping of cluster ID to constituent tickers
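Steps 2 and 3 of that pipeline can be sketched with NumPy alone (the real code reportedly uses StandardScaler; this hand-rolled z-scoring is equivalent for illustration). Step 4 would then feed the result into scikit-learn's `KMeans(n_clusters=5, n_init=10)`:

```python
import numpy as np

def prepare_features(X: np.ndarray) -> np.ndarray:
    """Median-impute missing values, then z-score each column so KMeans
    distances are scale-invariant. Illustrative sketch, not the real code."""
    X = X.copy()
    for j in range(X.shape[1]):
        col = X[:, j]
        med = np.nanmedian(col)
        if np.isnan(med):              # entire column is NaN: impute zeros
            med = 0.0
        col[np.isnan(col)] = med
    mu, sigma = X.mean(axis=0), X.std(axis=0)
    sigma[sigma == 0] = 1.0            # guard constant columns
    return (X - mu) / sigma

X = np.array([[1.0, np.nan],
              [3.0, 4.0],
              [np.nan, 8.0]])
Z = prepare_features(X)                # no NaNs; each column has mean 0, std 1
```

Z-scoring is what makes the clustering meaningful: without it, a factor measured in percentage points would dominate one measured in ratios.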

Each cluster gets its own LightGBM regressor trained on the forward returns of its members. Walk-forward TimeSeriesSplit cross-validation (up to 5 folds) is used when a cluster has 50+ samples. Smaller clusters train on all available data.

LightGBM parameters are conservative: 100 estimators, learning rate 0.05, max depth 5, 80% subsample ratio. These are chosen to avoid overfitting on the relatively small sample sizes typical of stock universes.
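The configuration above, expressed as the keyword arguments one would pass to `lightgbm.LGBMRegressor`, together with a pure-Python expanding-window splitter in the spirit of scikit-learn's `TimeSeriesSplit` (a sketch of the idea, not the exact fold sizing sklearn uses):

```python
# Conservative hyperparameters as described above (illustrative dict).
LGBM_PARAMS = dict(n_estimators=100, learning_rate=0.05,
                   max_depth=5, subsample=0.8)

def walk_forward_splits(n_samples: int, n_folds: int = 5):
    """Expanding-window splits: each fold trains on all earlier samples
    and tests on the next contiguous block, so there is no look-ahead."""
    block = n_samples // (n_folds + 1)
    for k in range(1, n_folds + 1):
        train = list(range(0, k * block))
        test = list(range(k * block, min((k + 1) * block, n_samples)))
        yield train, test

splits = list(walk_forward_splits(12, 5))
# splits[0] trains on samples [0, 1] and tests on [2, 3], and so on.
```

Every training index precedes every test index within a fold, which is what makes the out-of-sample rank IC honest on time-ordered return data.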

VAE (Variational Autoencoder)

The FactorVAE serves a different purpose from the cluster models. Where cluster models predict which factors matter most, the VAE detects anomalous scoring patterns — stocks whose factor profiles are unusual relative to their peers.

The VAE learns a compressed representation (latent space) of normal factor profiles. When a stock's factor profile does not compress and reconstruct well, that reconstruction error signals something unusual about its factor combination. This can indicate either opportunity (an overlooked stock) or risk (a stock whose fundamentals are deteriorating in a pattern the model has not seen before).

The VAE uses a prior-posterior architecture: during training, the encoder sees both features and future returns, while at inference, only the predictor (prior network) runs — using features alone. This design prevents look-ahead bias while allowing the model to learn what factor patterns are associated with future outcomes.

The FactorVAE has three components:

  • Encoder (posterior): Takes features + future returns as input, outputs a latent distribution (mean and log-variance). Only used during training.
  • Predictor (prior): Takes features only, outputs a latent distribution. Used at inference — this is what runs in production.
  • Decoder: Maps a latent sample to a predicted return.

Training minimizes: reconstruction_loss + KL(posterior || prior). The KL term forces the prior to approximate the posterior, so at inference the prior can stand alone.

Configuration: latent dimension 8, hidden dimension 64, 100 training epochs, Adam optimizer at learning rate 1e-3. The model outputs two values per stock: a mean prediction and a variance. High variance indicates the model is uncertain about that stock — the factor profile does not fit the learned patterns well.
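The loss has a closed form when both latent distributions are diagonal Gaussians. A NumPy sketch (the real model is PyTorch; mean-squared error as the reconstruction term is an assumption here):

```python
import numpy as np

def kl_diag_gaussians(mu_q, logvar_q, mu_p, logvar_p):
    """KL(posterior || prior) for diagonal Gaussians, summed over the
    latent dimensions (8 in the configuration above)."""
    var_q, var_p = np.exp(logvar_q), np.exp(logvar_p)
    return 0.5 * np.sum(
        logvar_p - logvar_q + (var_q + (mu_q - mu_p) ** 2) / var_p - 1.0
    )

def vae_loss(recon_pred, target, mu_q, logvar_q, mu_p, logvar_p):
    """Training objective: reconstruction_loss + KL(posterior || prior)."""
    recon = np.mean((recon_pred - target) ** 2)
    return recon + kl_diag_gaussians(mu_q, logvar_q, mu_p, logvar_p)

# When the prior matches the posterior exactly, the KL term vanishes --
# the state in which the prior can safely stand alone at inference.
z = np.zeros(8)
loss = vae_loss(np.array([0.02]), np.array([0.02]), z, z, z, z)
```

Minimizing the KL term is what trains the features-only prior network to mimic the returns-aware posterior, so inference never needs future returns.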

VAE bytes are stored alongside cluster models in the MlModelRun table. The VAE is optional — if training fails, the pipeline continues with cluster models only.

Score Adjustment

When ML is active, the ensemble override system can promote or demote a stock's composite tier by exactly one level. The adjustment is not a score multiplier — it moves the tier category (e.g., MEDIUM to HIGH, or HIGH to MEDIUM).

The decision process:

  1. The model must qualify (rank IC > 0.15 from training)
  2. The GBM and VAE predictions are blended (60% GBM weight, 40% VAE weight)
  3. Confidence is computed from the VAE variance — lower variance means higher confidence
  4. If confidence is below 0.60, no adjustment is made
  5. The blended ML signal is percentile-ranked against all stocks in the universe
  6. If the signal is at or above the 85th percentile (top 15%) with confidence above 0.75: promote one level
  7. If the signal is at or below the 15th percentile (bottom 15%) with confidence above 0.75: demote one level
  8. Otherwise: no adjustment
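Those rules are a direct transcription away from code. A sketch with assumed function names (`blend`, `decide` are illustrative, not the real API):

```python
def blend(gbm_pred: float, vae_pred: float) -> float:
    """Step 2: fixed-weight ensemble of the two model predictions."""
    return 0.60 * gbm_pred + 0.40 * vae_pred

def decide(percentile: float, confidence: float, qualified: bool = True) -> str:
    """Steps 1 and 4-8: map a stock's percentile rank and confidence
    to "promote", "demote", or "none"."""
    if not qualified or confidence < 0.60:
        return "none"
    if percentile >= 85 and confidence > 0.75:
        return "promote"
    if percentile <= 15 and confidence > 0.75:
        return "demote"
    return "none"
```

Note the two confidence gates: 0.60 is the floor for considering any adjustment, while 0.75 is required before one is actually applied.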

The "ML Adjusted" badge appears on stocks where the ML layer has changed the composite tier. The original deterministic composite tier and the adjusted tier are both stored and visible in the ML Audit Panel on the asset detail page.

The override is implemented in apply_ml_override():

  • Composite tiers are ordered: NONE < MEDIUM < HIGH < EXCEPTIONAL
  • Promotion moves one step up the ladder; demotion moves one step down
  • EXCEPTIONAL cannot be promoted further; NONE cannot be demoted further
  • Both rules_composite_tier (before ML) and composite_tier (after ML) are persisted on the V4Score record
  • The ml_override field records the type: "none", "promoted", or "demoted"
  • ml_alpha and ml_confidence are stored for audit purposes
  • If the composite tier changes, position sizing is recomputed accordingly

The key constraint: ML cannot skip levels. A MEDIUM stock can become HIGH, but never EXCEPTIONAL. This prevents the ML layer from creating extreme positions that the deterministic pipeline would not support.
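The ladder logic above fits in a few lines. The function name comes from the source, but this signature and return shape are assumptions for illustration:

```python
TIER_LADDER = ["NONE", "MEDIUM", "HIGH", "EXCEPTIONAL"]

def apply_ml_override(rules_tier: str, decision: str):
    """Move at most one step along the ladder, clamped at both ends.
    Returns (composite_tier, ml_override) -- the before-ML tier is
    preserved separately by the caller as rules_composite_tier."""
    i = TIER_LADDER.index(rules_tier)
    if decision == "promote" and i < len(TIER_LADDER) - 1:
        return TIER_LADDER[i + 1], "promoted"
    if decision == "demote" and i > 0:
        return TIER_LADDER[i - 1], "demoted"
    return rules_tier, "none"
```

Because the function only ever indexes one step away, skipping a level is structurally impossible rather than merely discouraged.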

Graceful Degradation

The ML pipeline was designed to be absent. Every code path that touches ML predictions handles the case where no qualified model exists.

  • New deployment: No scoring history exists, so no model can train. The system runs deterministic-only scoring. The training job runs each Saturday, checks the sample count, and exits gracefully until enough data accumulates.
  • Model quality drops: If a previously qualified model's rank IC falls below 0.15 on the next training cycle, the new model is saved as non-qualifying. The V4 scoring pipeline queries for the latest qualified model — if none exists, ml_predictions is None and all stocks receive purely deterministic scores.
  • Prediction failure: If the ML prediction step fails for a specific cluster or the VAE, errors are caught and logged. The pipeline continues with whatever predictions succeeded, or falls back to deterministic-only.
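The common shape behind all three cases is a fall-through to the deterministic score. A sketch of that pattern (names illustrative; the real pipeline also logs each failure):

```python
def score_universe(tickers, deterministic_score, ml_predict=None):
    """Score every ticker; any ML absence or failure degrades to the
    deterministic score for that ticker alone."""
    results = {}
    for ticker in tickers:
        base = deterministic_score(ticker)
        if ml_predict is None:          # no qualified model: ml_predictions is None
            results[ticker] = base
            continue
        try:
            results[ticker] = ml_predict(ticker, base)
        except Exception:
            results[ticker] = base      # prediction failure: log and fall back
    return results

def boom(ticker, base):
    raise RuntimeError("cluster model unavailable")

plain = score_universe(["AAA", "BBB"], lambda t: 50)
degraded = score_universe(["AAA", "BBB"], lambda t: 50, ml_predict=boom)
```

A per-ticker try/except is deliberate: one failing cluster should not take down scoring for the rest of the universe.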

The system status clearly indicates whether ML is active or dormant. When no qualified model exists, the dashboard and asset detail pages show deterministic scores without any ML badges or adjustment indicators.

The system was designed to work without ML. ML makes it better when it earns its place, but is never required.

Verify It Yourself

ML transparency

Check any stock's asset detail page. If ML is active, you will see an "ML Adjusted" badge on the composite tier panel. Click it to see the adjustment amount and the original deterministic score. If no badge appears, the score is purely deterministic. You can also check the ML Audit Panel below the scoring section for the model's rank IC, confidence level, and whether the current model qualifies.

Known Limitations

  • Models need 100+ scored assets before they can be trained — new deployments start deterministic-only and may remain so for weeks until the universe is large enough.
  • Rank IC threshold (0.15) means models may not activate for months if prediction quality is insufficient. This is intentional — a bad model is worse than no model.
  • ML adjustments are correlational, not causal — they identify patterns in factor interactions, not the economic mechanisms behind them.
  • Models retrain weekly — scores may shift slightly after Saturday training runs as the model updates its view of which factor combinations are working.
  • Cluster membership can change between training cycles, causing subtle score changes for stocks near cluster boundaries.
  • ML model quality varies across market regimes — models trained during extended bull markets may underperform during sharp reversals, which is why the rank IC qualification gate exists.
  • LightGBM and VAE hyperparameters are fixed, not tuned per cycle. This prioritizes stability over marginal accuracy gains but means the models may not optimally adapt to unusual market conditions.
  • VAE training can fail silently — if it does, the pipeline continues with GBM-only predictions and reduced ensemble diversity.
