Data Pipeline
SportsPerp’s index is only as good as its inputs. This page documents the full path from a football match being played in a Premier League stadium to an updated oracle price on Solana.
Data sources
Two independent feeds from our institutional data partner power the engine:
| Source | Mode | Protocol | Latency | What it provides |
|---|---|---|---|---|
| Post-match REST feed | Post-match, batch | REST | ~30 minutes post-match | Official OBV, season aggregates, match outcomes |
| Live event feed | In-match, streaming | GraphQL subscriptions | 5–15 seconds per event | Raw event stream (passes, shots, defensive actions, etc.) |
The REST feed is the canonical source — its numbers are what the partner publishes and what professional clubs consume. The live feed delivers event-level data during matches so SportsPerp can price markets in-play rather than freezing between fixtures.
The data partner’s identity is not disclosed publicly for competitive reasons.
Post-match pipeline (the batch path)
The batch path is simpler and is the authoritative source for index values between matches.
Post-match REST feed
│
▼
[ crank.ts ] ◄─── every 5 min (fetchIntervalMs = 300000)
│
├─► season team stats (team_season_obv_pg, …)
├─► season player stats (player_season_obv_90, …)
└─► recent match results (W/D/L, dates)
│
▼
[ form-calculator.ts ] ◄── exponential-decay form, PPG
│
▼
[ index-calculator.ts ] ◄── z-score per population, compose, scale
│
├─► candle-store.ts (SQLite 1m OHLC; aggregated 1H/4H/1D on read)
├─► ws-server.ts (broadcasts ticks to connected clients)
└─► oracle-pusher (update_oracle on-chain)
Each crank cycle:
- Fetch season stats for all 20 teams and eligible players via REST.
- Compute form scores (exponential decay over the last 6 matches) and PPG (average over the last 10).
- Calculate z-scored composite indices for teams (league-wide) and players (within-position).
- Persist ticks to SQLite candles and broadcast on WebSocket.
- Push to the on-chain oracle only if the change exceeds the threshold. Thresholds are tunable; the off-chain pipeline has separate gates at the crank layer (per-cycle) and the pusher layer (final on-chain gate) to keep transaction costs bounded.
- Heartbeat any market that hasn’t seen an on-chain push within the pusher’s heartbeat interval, so the on-chain staleness protection never triggers under normal operation.
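The form and PPG step in the cycle above can be sketched roughly as follows. This is a minimal illustration, not the contents of form-calculator.ts: the decay factor of 0.8, the 3/1/0 points mapping, the normalisation to [0, 1], and all function names are assumptions. Only the window sizes (6 and 10) come from the text.

```typescript
// Hypothetical sketch: decay rate, normalisation and names are assumptions.
type MatchResult = "W" | "D" | "L";

const POINTS: Record<MatchResult, number> = { W: 3, D: 1, L: 0 };

/** Points per game, averaged over the most recent `window` results. */
function pointsPerGame(results: MatchResult[], window = 10): number {
  const recent = results.slice(0, window); // results[0] is the latest match
  if (recent.length === 0) return 0;
  const total = recent.reduce((sum, r) => sum + POINTS[r], 0);
  return total / recent.length;
}

/**
 * Exponentially weighted form over the most recent `window` results.
 * The weight of the i-th most recent match is decay^i, and the result is
 * divided by the maximum attainable weighted points so it lies in [0, 1].
 */
function formScore(results: MatchResult[], window = 6, decay = 0.8): number {
  const recent = results.slice(0, window);
  let weighted = 0;
  let weightSum = 0;
  recent.forEach((r, i) => {
    const w = Math.pow(decay, i);
    weighted += w * POINTS[r];
    weightSum += w;
  });
  return weightSum === 0 ? 0 : weighted / (3 * weightSum);
}
```

With this weighting, a recent win followed by an older loss scores higher than the same two results in the opposite order, which is the property an exponential-decay form measure is meant to capture.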
The crank is self-healing: if a REST call fails, it logs the error and tries again on the next cycle. The on-chain oracle never receives a partial or inconsistent update.
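The retry-on-next-cycle behaviour and the threshold and heartbeat gates could look something like the sketch below. It collapses the two gates (crank layer and pusher layer) into a single decision for brevity. All names, the relative-change rule, and the example values are assumptions; the actual thresholds are tunable and are not documented on this page.

```typescript
// Hypothetical sketch: names, the relative-change rule and values are assumptions.
interface PushState {
  lastPushedValue: number;
  lastPushedAtMs: number;
}

interface GateConfig {
  minRelativeChange: number; // e.g. 0.001 means 0.1% (assumed value)
  heartbeatMs: number;       // maximum allowed gap between on-chain pushes
}

type PushDecision = "push-change" | "push-heartbeat" | "skip";

/** Push when the value moved enough, or when the heartbeat interval elapsed. */
function decidePush(state: PushState, next: number, nowMs: number, cfg: GateConfig): PushDecision {
  const base = Math.abs(state.lastPushedValue);
  const change = base === 0 ? Math.abs(next) : Math.abs(next - state.lastPushedValue) / base;
  if (change > cfg.minRelativeChange) return "push-change";
  if (nowMs - state.lastPushedAtMs >= cfg.heartbeatMs) return "push-heartbeat";
  return "skip";
}

/**
 * One crank cycle. The fetch must resolve completely before any downstream
 * work starts, so a failed REST call means nothing is computed or pushed.
 */
async function runCycle<T>(
  fetchAll: () => Promise<T>,
  handle: (data: T) => void,
  log: (msg: string) => void,
): Promise<boolean> {
  try {
    const data = await fetchAll();
    handle(data);
    return true;
  } catch (err) {
    log(`crank cycle skipped: ${String(err)}`);
    return false; // the next scheduled cycle simply tries again
  }
}
```

Scheduling `runCycle` on a fixed timer (the diagram above gives fetchIntervalMs = 300000) is enough to get the retry behaviour, since a skipped cycle leaves no partial state behind.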
Live pipeline (the streaming path)
During live matches, the engine runs a parallel streaming path that layers in-match event data on top of the batch baseline.
Live event feed (GraphQL subscriptions)
│
▼
[ live-processor.ts ] ◄── subscribes to match event stream
│
▼
[ id-bridge.ts ] ◄── translates live IDs to canonical IDs (fail-closed)
│
▼
[ obv-engine (Python sidecar) ]
│ PV-GF and PV-GA XGBoost models
│ annotates each event with OBV delta
│ consumed via POST /api/live-obv/matches/{id}/{start,events,end}
│
▼
[ obv-store.ts ]
│ per-match authoritative state
│ per-(category × {net, gf, ga}) cross-tab
│
▼
[ live-index-overlay.ts ]
│ pinned z-score population from last batch cycle
│ only playing teams/players get overlay updates
│
▼
[ fallback-chain.ts ]
│ authoritative / aggregated / heuristic
│
▼
same downstream (candles, WS, oracle push)
Live↔Canonical ID bridge
The partner’s REST and live feeds use independent ID spaces. For example, Arsenal might be REST id 1 but live id 21; a player might be REST 39461 but live 106232. roster.json is keyed by canonical (REST) id, so every live event must be translated to its canonical entity before it can be attributed to a market.
The ID bridge is a fail-closed translator: any unmapped live id drops the event and increments a counter rather than mis-attributing it. Operator overrides live in a companion config file. Behaviour is gated by USE_LIVE_REST_BRIDGE_V1, USE_LIVE_REST_BRIDGE_V2, SHADOW_DROP_DIAG, and ID_BRIDGE_OVERRIDE_STRICT environment variables. V2 is the current production setting and is verified periodically by a systemd timer on the production host.
Note: live-side IDs are not guaranteed stable across matches for the same entity — the bridge resolves the mapping against each match’s lineup, not against a fixed cross-match table.
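A fail-closed, per-match translator of this kind might be sketched as below. The class and method names are illustrative and not taken from id-bridge.ts. The sketch also keeps teams and players in one map for brevity, which a real bridge would likely separate, and it leaves out the operator overrides and feature flags described above.

```typescript
// Hypothetical sketch of a fail-closed, per-match ID translator.
class MatchIdBridge {
  private readonly liveToCanonical = new Map<number, number>();
  private dropped = 0;

  /** Build the mapping from this match's lineup; live IDs may differ next match. */
  constructor(lineup: Array<{ liveId: number; canonicalId: number }>) {
    for (const entry of lineup) {
      this.liveToCanonical.set(entry.liveId, entry.canonicalId);
    }
  }

  /** Returns the canonical (REST) id, or null if the live id is unmapped. */
  translate(liveId: number): number | null {
    const canonical = this.liveToCanonical.get(liveId);
    if (canonical === undefined) {
      this.dropped += 1; // fail closed: count the drop, never guess
      return null;
    }
    return canonical;
  }

  get droppedCount(): number {
    return this.dropped;
  }
}

// The two ID pairs used as examples in the text.
const bridge = new MatchIdBridge([
  { liveId: 21, canonicalId: 1 },
  { liveId: 106232, canonicalId: 39461 },
]);
```

Callers treat a `null` result as "drop this event", so an unmapped id can only reduce coverage and can never credit an event to the wrong market.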
Python OBV sidecar HTTP contract
The TypeScript bridge talks to the Python obv-engine over four HTTP endpoints, rooted at the base URL in OBV_ENGINE_BASE_URL (default http://127.0.0.1:8100):
| Endpoint | Purpose |
|---|---|
| POST /api/live-obv/matches/{matchId}/start | Register a kicked-off match and lock in the rosters |
| POST /api/live-obv/matches/{matchId}/events | Push normalized live events for scoring |
| POST /api/live-obv/matches/{matchId}/end | Mark match complete; close out per-match state |
| GET /api/live-obv/matches/{matchId}/snapshot | Read authoritative team & player totals |
The bridge is gated by ENABLE_REALTIME_OBV=true. When disabled (or when the sidecar is unreachable), the live processor falls back to a heuristic impact-estimation path so live tracking degrades rather than failing.
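As a rough illustration of the contract, the helpers below build the four endpoint URLs and decide whether to use the sidecar or the heuristic path. The paths, HTTP methods, default base URL, and environment variable names come from the text above. The function names and the way reachability is passed in are assumptions.

```typescript
// Paths and env var names are from the table above; helper names are hypothetical.
type SidecarAction = "start" | "events" | "end" | "snapshot";

function resolveBaseUrl(env: Record<string, string | undefined>): string {
  return env["OBV_ENGINE_BASE_URL"] ?? "http://127.0.0.1:8100";
}

function sidecarUrl(baseUrl: string, matchId: string | number, action: SidecarAction): string {
  const base = baseUrl.replace(/\/+$/, ""); // tolerate a trailing slash
  return `${base}/api/live-obv/matches/${encodeURIComponent(String(matchId))}/${action}`;
}

/** snapshot is the only read endpoint; the other three are POSTs. */
function sidecarMethod(action: SidecarAction): "GET" | "POST" {
  return action === "snapshot" ? "GET" : "POST";
}

type ObvPath = "sidecar" | "heuristic";

/** Use the sidecar only when the flag is exactly "true" and it is reachable. */
function chooseObvPath(env: Record<string, string | undefined>, sidecarReachable: boolean): ObvPath {
  const enabled = env["ENABLE_REALTIME_OBV"] === "true";
  return enabled && sidecarReachable ? "sidecar" : "heuristic";
}
```

Keeping the path decision in one pure function makes the degrade-rather-than-fail behaviour easy to test without a running sidecar.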
Key design decisions:
- Pinned z-score population. During a live match, the mean and stdev used for z-scoring are frozen at the last completed batch cycle’s values. Only teams and players actually on the pitch move; everyone else’s index is mathematically invariant. This prevents a single live match from re-anchoring the entire league’s pricing.
- Tiered fallback. If the Python OBV sidecar (obv-engine on the production host, port 8100) is unreachable, the engine degrades gracefully: authoritative per-event OBV → aggregated season rate → heuristic from shot/goal events. Each tier is clearly labeled in the data stream so consumers know the quality of what they’re pricing against.
- Tighter change thresholds during live matches. The oracle push threshold tightens during live play so in-play price movement reaches the chain promptly.
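The pinned-population idea can be shown with a small sketch. The names are illustrative rather than taken from live-index-overlay.ts, the use of a population (not sample) standard deviation is an assumption, and the four-team league is made-up data.

```typescript
// Hypothetical sketch: names, the stdev convention and the data are assumptions.
interface PinnedPopulation {
  mean: number;
  stdev: number; // frozen at the last completed batch cycle
}

function pinPopulation(values: number[]): PinnedPopulation {
  if (values.length === 0) throw new Error("empty population");
  const mean = values.reduce((a, b) => a + b, 0) / values.length;
  const variance = values.reduce((a, b) => a + (b - mean) * (b - mean), 0) / values.length;
  return { mean, stdev: Math.sqrt(variance) };
}

function zScore(value: number, pop: PinnedPopulation): number {
  return pop.stdev === 0 ? 0 : (value - pop.mean) / pop.stdev;
}

// Batch cycle: a raw signal for four illustrative teams.
const batch = { A: 1, B: 2, C: 3, D: 4 };
const pinned = pinPopulation([batch.A, batch.B, batch.C, batch.D]);

// Live match: only team A is playing, so only A's raw value moves.
const live = { ...batch, A: 2.5 };
const zA = zScore(live.A, pinned); // changes, because A's value changed
const zB = zScore(live.B, pinned); // identical to its batch z-score
```

Because `pinned` is not recomputed from `live`, team B's z-score is the same number before and during the match. If the mean and stdev were recomputed on every event, B would drift even though B is not playing.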
See the Real-Time vs Post-Match page for how the live estimate reconciles with the official post-match OBV.
The raw fields we consume
The engine’s index calculation reads the following fields per entity. These are the partner’s canonical REST field names, preserved verbatim for auditability.
Team season stats:
- team_season_obv_pg: aggregate OBV per match (primary signal)
- team_season_obv_pass_pg, team_season_obv_shot_pg, team_season_obv_defensive_action_pg, team_season_obv_dribble_carry_pg, team_season_obv_gk_pg: per-category breakdowns (surfaced to traders, not currently weighted into the composite)
- team_season_matches, team_season_gd, team_season_xgd, team_season_goals_pg, team_season_goals_conceded_pg
Player season stats:
- player_season_obv_90: OBV per 90 (primary signal)
- player_season_obv_pass_90, player_season_obv_shot_90, player_season_obv_defensive_action_90, player_season_obv_dribble_carry_90, player_season_obv_gk_90
- player_season_minutes, primary_position
- player_season_goals_90, player_season_assists_90, player_season_np_xg_90, player_season_xa_90, player_season_tackles_90, player_season_interceptions_90, player_season_aerial_wins_90 (feed into position-specific form)
The engine preserves the partner’s native field names end-to-end. This means any claim SportsPerp makes about a market’s price can be audit-traced back to a specific set of source fields from a specific data version, without translation.
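One way to keep native field names end-to-end in TypeScript is to derive the types from the field list itself, as in the sketch below. The field names are copied from the team list above. The `AuditRecord` shape, the `dataVersion` tag, and the helper name are assumptions made for illustration.

```typescript
// Field names are from the list above; types and helper are hypothetical.
const TEAM_OBV_FIELDS = [
  "team_season_obv_pg",
  "team_season_obv_pass_pg",
  "team_season_obv_shot_pg",
  "team_season_obv_defensive_action_pg",
  "team_season_obv_dribble_carry_pg",
  "team_season_obv_gk_pg",
] as const;

type TeamObvField = (typeof TEAM_OBV_FIELDS)[number];
type TeamObvStats = Record<TeamObvField, number>;

interface AuditRecord {
  dataVersion: string;  // assumed tag identifying the fetched snapshot
  fields: TeamObvStats; // native field names, untranslated
}

/** Pick the OBV fields out of a raw REST row, rejecting missing or non-numeric values. */
function toAuditRecord(raw: Record<string, unknown>, dataVersion: string): AuditRecord {
  const fields = {} as TeamObvStats;
  for (const key of TEAM_OBV_FIELDS) {
    const value = raw[key];
    if (typeof value !== "number" || !Number.isFinite(value)) {
      throw new Error(`missing or non-numeric field: ${key}`);
    }
    fields[key] = value;
  }
  return { dataVersion, fields };
}
```

Since no renaming happens between the REST row and the record, the keys of `fields` can be searched for verbatim in the partner's feed.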
What traders see downstream
Once the index value is computed, it surfaces in three places:
- On-chain oracle. A 10^6-scaled fixed-point price, updated via the update_oracle instruction. This is what positions are marked against.
- REST candle API. GET /api/candles/{marketKey}?timeframe=1m|1H|4H|1D returns OHLC bars from the SQLite candle store.
- WebSocket feed. wss://…/ws streams real-time ticks, candle updates, and (in live matches) per-event overlay deltas so charts can render live.
All three are kept consistent: the oracle push, the candle write, and the WS broadcast are triggered by the same calculation step within a crank cycle. A trader cannot see a WebSocket tick that disagrees with the oracle, by construction.
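The 10^6 fixed-point scaling used for the on-chain price can be illustrated with a pair of conversion helpers. Only the scale factor comes from the text. The function names, the round-to-nearest rule, and the safe-integer check are assumptions, and the actual update_oracle argument encoding is not shown here.

```typescript
// 10^6 scale is from the text; names and the rounding rule are assumptions.
const ORACLE_SCALE = 1000000;

/** Convert a floating-point index value to a 10^6-scaled integer price. */
function toOraclePrice(indexValue: number): number {
  if (!Number.isFinite(indexValue)) throw new Error("index value must be finite");
  const scaled = Math.round(indexValue * ORACLE_SCALE);
  if (!Number.isSafeInteger(scaled)) throw new Error("scaled price exceeds safe integer range");
  return scaled;
}

/** Convert a 10^6-scaled integer price back to a display value. */
function fromOraclePrice(scaled: number): number {
  return scaled / ORACLE_SCALE;
}
```

For example, an index value of 101.25 becomes the integer 101250000. Any precision beyond six decimal places is lost at this step, so a client comparing a WebSocket tick with the on-chain price should compare at six-decimal resolution.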
Credentials and environment
Data-partner credentials are read from environment variables — never hardcoded — and live on the production server in a systemd EnvironmentFile= excluded from the repository via .gitignore.
Further reading
- Real-Time vs Post-Match — how the live path reconciles with the official OBV at full-time.
- Oracle Design — the on-chain side: confidence, staleness, TWAP sampling.