Backtest & Validation
A performance index is only tradable if it predicts something real. This page documents the backtest that validates OBV as a ranking signal for football team quality, and reports the results we observed against a finished EPL season.
Methodology
Input: Season-long OBV per match (team_season_obv_pg) for all 20 Premier League clubs over the 2023/24 season, taken directly from the data partner’s post-match REST feed. No other signals. No goals, no xG, no form, no head-to-head results, no league table.
Output: A ranking of the 20 clubs by OBV alone.
Benchmark: The actual 2023/24 Premier League final table — the ground-truth ranking by league points.
Metric: Spearman rank correlation (ρ), which measures how well the predicted ordering matches the true ordering. It ranges from −1 (perfectly reversed) to +1 (identical), with 0 indicating no monotonic relationship.
Result: ρ = 0.9023
Using OBV alone:
Spearman ρ = 0.9023 — a very strong positive rank correlation.
In plain English: if you sort the 20 EPL clubs by their OBV per match over the 2023/24 season, the resulting list is almost identical to the actual finish order. OBV recovers ≈ 81% of the variance in final-table ordering.
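As a sanity check on the metric itself, Spearman ρ is easy to compute with scipy. A minimal sketch on a toy ranking (the values below are illustrative, not season data) also shows where the ≈ 81% figure comes from: ρ² is the share of rank variance explained.

```python
from scipy.stats import spearmanr

# Toy example: predicted vs. actual finish positions for four teams.
# The orderings agree except the last two positions are swapped.
predicted = [1, 2, 3, 4]
actual    = [1, 2, 4, 3]

rho, _ = spearmanr(predicted, actual)
print(rho)  # 0.8 for this toy ordering

# rho**2 is the share of rank variance explained; for the reported
# season-level rho = 0.9023, that is about 0.8141 (~81%).
print(round(0.9023 ** 2, 4))  # 0.8141
```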
Exact matches
Several teams hit their exact finish position on the OBV ranking alone:
- Bottom four. Nottingham Forest, Luton Town, Burnley, and Sheffield United — the four lowest OBV teams — were also the four bottom finishers. A perfect match.
- Manchester United at 8th. OBV ranked Man Utd 8th; they finished 8th.
- West Ham at 9th. OBV ranked West Ham 9th; they finished 9th.
Six teams (30% of the league) landing on their exact finish position from a single statistical signal is an unusually strong out-of-sample result.
Why this matters for a tradable index
Three implications:
- OBV is a genuine skill signal, not a lagging description. Because OBV is an event-level probabilistic measure, it accumulates signal long before it shows up in goals, points, or league position. A team with strong OBV but unlucky results in early matches tends to regress upward; a team with weak OBV but fortunate early results tends to regress downward. This is precisely the kind of signal a performance perp should price.
- The composite adds defensive depth to an already-strong foundation. OBV alone hits ρ = 0.9023. The composite layers in form (30% weight) and results (20% weight) primarily to dampen sample-size noise in short-window trading and to bias the index toward outcomes that are actually happening. The composite’s goal is not to improve the backtest — it’s to make the index behave well under trading conditions.
- There is room upstream, not just downstream. The 0.0977 of variance unexplained by OBV alone is information still on the table. Potential sources — injuries, fixture difficulty, manager changes, lineup context — are candidates for future composite-weight proposals under $SPERP governance.
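The composite weighting described above can be sketched as a simple weighted blend. This is an assumption-laden illustration, not the production implementation: the text states 30% form and 20% results, and the sketch assumes the remaining 50% falls on OBV and that each signal is already normalized to a common scale.

```python
# Hypothetical weights inferred from the text: 30% form, 20% results,
# and (by assumption) the remaining 50% on OBV.
W_OBV, W_FORM, W_RESULTS = 0.50, 0.30, 0.20

def composite_score(obv: float, form: float, results: float) -> float:
    """Blend three signals, each assumed pre-normalized to [0, 1]."""
    return W_OBV * obv + W_FORM * form + W_RESULTS * results

# Illustrative inputs only.
print(composite_score(obv=0.9, form=0.6, results=0.7))  # 0.77
```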
Reproducibility
The backtest is reproducible by any subscriber to the data partner’s REST feed. The numeric inputs are team_season_obv_pg values for the 2023/24 season; the Spearman ρ calculation is a short Python snippet:
from scipy.stats import spearmanr

# Rank clubs by season OBV per match and by league points, both descending.
obv_rank   = sorted(teams, key=lambda t: -t.obv_pg)   # by OBV, desc
table_rank = sorted(teams, key=lambda t: -t.points)   # by points, desc

# Correlate each club's position in the two orderings.
rho, _ = spearmanr([obv_rank.index(t) for t in teams],
                   [table_rank.index(t) for t in teams])
# rho = 0.9023
What the backtest does not validate
Honest accounting of limits:
- Single-season, single-league sample. ρ = 0.9023 on 2023/24 EPL is one data point. Multi-season and multi-league validation work is tracked in the roadmap. Nothing here claims OBV-based indices generalize perfectly to every league in every season.
- Rank correlation is not price realism. A perfect ranking doesn’t mean a perfect price. A market pricing Man City at 800 vs 750 at different points in a season reflects different things even if both are “correctly” above all other teams on rank. The index-to-price mapping (500 + raw × 100) is a convention, not a claim about absolute fair value.
- In-match OBV is a reimplementation, not an official number. During live matches, the index is driven by SportsPerp’s real-time OBV inference pipeline (real-time vs post-match), which is calibrated to, but not identical to, the partner’s official post-match OBV. Post-match reconciliation (the 4-hour EMA blend) closes this gap. The backtest result above uses canonical post-match OBV from the partner’s REST feed only.
- Player-level validation is more limited. Spearman ρ for player OBV against a ground-truth player ranking is not a clean comparison — there’s no equivalent of the league table for individual players. Player-market validation leans on other signals: cross-season stability of OBV-per-90, correlation with transfer-market valuations, and expert-poll agreement (e.g., PFA Team of the Year).
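The index-to-price convention noted in the limits above (500 + raw × 100) is mechanical to apply. A minimal sketch, assuming raw is the composite index value on whatever scale the index publishes:

```python
def index_price(raw: float) -> float:
    """Map a raw index value to a quoted price using the stated
    convention: price = 500 + raw * 100.
    A convention, not a claim about absolute fair value."""
    return 500 + raw * 100

# Illustrative raw values only.
print(index_price(3.0))  # 800.0
print(index_price(2.5))  # 750.0
```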
Further reading
- Composite Index Design — how OBV is combined with form and results into the tradable index.
- Real-Time vs Post-Match — how the live path differs from the post-match data used in this backtest.
- PRD — §2 Product, Validation — internal product spec including the backtest context.