Backtest & Validation
A performance index is only tradable if it predicts something real. This page documents the backtest that validates OBV as a ranking signal for football team quality, and reports the results we observed against a finished EPL season.
Methodology
Input: Season-long OBV per match (team_season_obv_pg) for all 20 Premier League clubs over the 2023/24 season, taken directly from the data partner’s post-match REST feed. No other signals. No goals, no xG, no form, no head-to-head results, no league table.
Output: A ranking of the 20 clubs by OBV alone.
Benchmark: The actual 2023/24 Premier League final table — the ground-truth ranking by league points.
Metric: Spearman rank correlation (ρ), which measures how well the predicted ordering matches the true ordering. It ranges from −1 (perfectly reversed) to +1 (identical), with 0 indicating no monotonic relationship.
Result: ρ = 0.9023
Using OBV alone:
Spearman ρ = 0.9023 — a very strong positive rank correlation.
In plain English: if you sort the 20 EPL clubs by their OBV per match over the 2023/24 season, the resulting list is almost identical to the actual finish order. OBV recovers ≈ 81% of the variance in final-table ordering.
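As a sanity check on the metric itself, Spearman ρ is easy to compute with scipy. A minimal sketch on a toy ranking (the values below are illustrative, not season data) also shows where the ≈ 81% figure comes from: ρ² is the share of rank variance explained.

```python
from scipy.stats import spearmanr

# Toy example: predicted vs. actual finish positions for four teams.
# The orderings agree except the last two positions are swapped.
predicted = [1, 2, 3, 4]
actual    = [1, 2, 4, 3]

rho, _ = spearmanr(predicted, actual)
print(rho)  # 0.8 for this toy ordering

# rho**2 is the share of rank variance explained; for the reported
# season-level rho = 0.9023, that is about 0.8141 (~81%).
print(round(0.9023 ** 2, 4))  # 0.8141
```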
Exact matches
Several teams hit their exact finish position on the OBV ranking alone:
- Bottom four. Nottingham Forest, Luton Town, Burnley, and Sheffield United — the four lowest OBV teams — were also the four bottom finishers. A perfect match.
- Manchester United at 8th. OBV ranked Man Utd 8th; they finished 8th.
- West Ham at 9th. OBV ranked West Ham 9th; they finished 9th.
Six teams (30% of the league) landing on their exact finish position from a single statistical signal is an unusually strong out-of-sample result.
Why this matters for a tradable index
Three implications:
- OBV is a genuine skill signal, not a lagging description. Because OBV is an event-level probabilistic measure, it accumulates signal long before it shows up in goals, points, or league position. A team with strong OBV but unlucky results in early matches tends to regress upward; a team with weak OBV but fortunate early results tends to regress downward. This is precisely the kind of signal a performance perp should price.
- The composite adds defensive depth to an already-strong foundation. OBV alone hits ρ = 0.9023. The composite layers in form (30% weight) and results (20% weight) primarily to dampen sample-size noise in short-window trading and to bias the index toward outcomes that are actually happening. The composite’s goal is not to improve the backtest — it’s to make the index behave well under trading conditions.
- There is room upstream, not just downstream. The 0.0977 of variance unexplained by OBV alone is information still on the table. Potential sources — injuries, fixture difficulty, manager changes, lineup context — are candidates for future composite-weight proposals under $SPERP governance.
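The composite weighting described above can be sketched as a simple weighted blend. This is an assumption-laden illustration, not the production implementation: the text states 30% form and 20% results, and the sketch assumes the remaining 50% falls on OBV and that each signal is already normalized to a common scale.

```python
# Hypothetical weights inferred from the text: 30% form, 20% results,
# and (by assumption) the remaining 50% on OBV.
W_OBV, W_FORM, W_RESULTS = 0.50, 0.30, 0.20

def composite_score(obv: float, form: float, results: float) -> float:
    """Blend three signals, each assumed pre-normalized to [0, 1]."""
    return W_OBV * obv + W_FORM * form + W_RESULTS * results

# Illustrative inputs only.
print(composite_score(obv=0.9, form=0.6, results=0.7))  # 0.77
```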
Reproducibility
The backtest is reproducible by any subscriber to the data partner’s REST feed. The numeric inputs are team_season_obv_pg values for the 2023/24 season; the Spearman ρ calculation is a short Python snippet:
from scipy.stats import spearmanr

# Rank clubs by season OBV per match and by league points, both descending.
obv_rank   = sorted(teams, key=lambda t: -t.obv_pg)   # by OBV, desc
table_rank = sorted(teams, key=lambda t: -t.points)   # by points, desc

# Correlate each club's position in the two orderings.
rho, _ = spearmanr([obv_rank.index(t) for t in teams],
                   [table_rank.index(t) for t in teams])
# rho = 0.9023
What the backtest does not validate
Honest accounting of limits:
- Single-season, single-league sample. ρ = 0.9023 on 2023/24 EPL is one data point. Multi-season and multi-league validation work is tracked in the roadmap. Nothing here claims OBV-based indices generalize perfectly to every league in every season.
- Rank correlation is not price realism. A perfect ranking doesn’t mean a perfect price. A market pricing Man City at 800 vs 750 at different points in a season reflects different things even if both are “correctly” above all other teams on rank. The index-to-price mapping (500 + raw × 100) is a convention, not a claim about absolute fair value.
- In-match OBV is a reimplementation, not an official number. During live matches, the index is driven by SportsPerp’s real-time OBV inference pipeline (real-time vs post-match), which is calibrated to, but not identical to, the partner’s official post-match OBV. Post-match reconciliation (the 4-hour EMA blend) closes this gap. The backtest result above uses canonical post-match OBV from the partner’s REST feed only.
- Player-level validation is more limited. Spearman ρ for player OBV against a ground-truth player ranking is not a clean comparison — there’s no equivalent of the league table for individual players. Player-market validation leans on other signals: cross-season stability of OBV-per-90, correlation with transfer-market valuations, and expert-poll agreement (e.g., PFA Team of the Year).
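The index-to-price convention noted in the limits above (500 + raw × 100) is mechanical to apply. A minimal sketch, assuming raw is the composite index value on whatever scale the index publishes:

```python
def index_price(raw: float) -> float:
    """Map a raw index value to a quoted price using the stated
    convention: price = 500 + raw * 100.
    A convention, not a claim about absolute fair value."""
    return 500 + raw * 100

# Illustrative raw values only.
print(index_price(3.0))  # 800.0
print(index_price(2.5))  # 750.0
```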
Further reading
- Composite Index Design — how OBV is combined with form and results into the tradable index.
- Real-Time vs Post-Match — how the live path differs from the post-match data used in this backtest.
- PRD — §2 Product, Validation — internal product spec including the backtest context.