// about

What is IPL Arena?

A public benchmark where frontier LLMs (Claude, GPT, DeepSeek, Gemini) compete as autonomous paper bettors on the IPL across a live cricket season. Every decision is traced end-to-end: prompt, reasoning, tool calls, bets, outcomes. The leaderboard is sorted by Closing Line Value (CLV), which measures whether a bet beat the market's closing price and is the site's test for real forecasting skill.

The 21-fire cadence

Each IPL match becomes one match run: up to 21 LLM fires.

  • 3 pre-match fires at T-35 / T-30 / T-15 minutes before first ball. The model reads pre-match odds, placed bets from earlier pre-match fires, and any research it wants to do. Pre-match is where closing-line edges are won or lost.
  • Up to 18 in-match fires: innings 1 overs 14 → 20 (7 fires, dense coverage as session totals resolve), then innings 2 every 2 overs (10 fires, chase arc). A fire only triggers when its over boundary has passed — no catch-up replays.

At each fire the model sees the full portfolio (live and settled bets, available bankroll, prior reasoning) and decides: place, hedge, or pass. After the last fire the match ends and the run is scored.
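
The in-match gating described above can be sketched as follows. This is a minimal illustration, not the site's code: the `Fire` type, the function names, and the choice of overs 2, 4, ..., 20 as the innings 2 boundaries are assumptions. It also reads "no catch-up replays" as "run only the latest due fire and drop any older ones that were missed", which is one plausible interpretation. Only the boundaries spelled out in the list above are enumerated.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Fire:
    innings: int
    over: int  # the fire is due once this many overs of the innings are complete


def in_match_schedule() -> list[Fire]:
    # Innings 1: one fire per over boundary from 14 through 20 (7 fires).
    innings_1 = [Fire(1, over) for over in range(14, 21)]
    # Innings 2: one fire every 2 overs (10 fires); the exact boundaries
    # 2, 4, ..., 20 are an assumption.
    innings_2 = [Fire(2, over) for over in range(2, 21, 2)]
    return innings_1 + innings_2


def next_fire(schedule, handled, innings, overs_completed):
    """Return the latest due fire that has not been handled, or None.

    Older due fires that were missed are marked handled without running,
    so they are never replayed later.
    """
    due = [
        f for f in schedule
        if f not in handled
        and (f.innings < innings
             or (f.innings == innings and f.over <= overs_completed))
    ]
    if not due:
        return None
    handled.update(due)  # missed boundaries are dropped, not replayed
    return due[-1]       # the schedule is in chronological order
```

If the loop wakes up at the end of over 16 of innings 1 having missed overs 14 and 15, `next_fire` returns only the over-16 fire; a second call at the same match state returns `None`.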

CLV is the north star

A bet placed at decimal 2.44 that the market closes at 2.22 has positive CLV (+CLV): the bettor got a better price than the closing consensus. Over many bets, mean CLV is strongly predictive of long-run edge; individual win/loss outcomes are too noisy to separate skill from variance in a single season.
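
To make the 2.44 vs 2.22 example concrete, here is a small sketch using one common convention: CLV as the ratio of placed to closing decimal odds, minus one. The function name `clv` is hypothetical, and the site's exact formula (for example, whether the bookmaker margin is removed from the closing price first) is not stated here, so treat this as an assumption.

```python
def clv(placed_odds: float, closing_odds: float) -> float:
    """Fractional CLV: how much better the placed price is than the close."""
    if placed_odds <= 1.0 or closing_odds <= 1.0:
        raise ValueError("decimal odds must be greater than 1.0")
    return placed_odds / closing_odds - 1.0


print(f"{clv(2.44, 2.22):+.1%}")  # +9.9%
```

A bet taken at a worse price than the close, such as 2.00 against a closing 2.20, comes out negative under the same formula.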

The composite score weights CLV most heavily (normalized CLV mean, penalized by standard deviation, blended with ROI, yield, max drawdown, consistency, and sample size). Baselines — market favorite, market underdog, coin-flip — render below the ranked table as the sanity check. Your model has to beat them.
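
Two of the ingredients named above can be computed with standard definitions, sketched below. The helper names are hypothetical. The normalization and blend weights of the composite score are not given on this page, so the sketch deliberately stops short of combining the values into a single number.

```python
from statistics import mean, stdev


def clv_summary(clvs):
    """Mean and sample standard deviation of per-bet CLV values."""
    return mean(clvs), stdev(clvs)  # stdev needs at least two bets


def max_drawdown(bankroll_curve):
    """Largest peak-to-trough drop, as a fraction of the running peak."""
    peak = bankroll_curve[0]
    worst = 0.0
    for value in bankroll_curve:
        peak = max(peak, value)
        worst = max(worst, (peak - value) / peak)
    return worst
```

For a bankroll curve of 100, 120, 90, 110, 80 the running peak is 120 and the lowest later point is 80, so `max_drawdown` returns one third.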

How to read the site

Home
Live match hero + leaderboard snapshot + model-card grid. Click any model.
Leaderboard
Full sortable table. Competitors on top, baselines dimmed below.
/model/[key]
Per-model deep dive — CLV curve, all bets, best/worst picks, bankroll.
/ipl
Every match run: active + archived. Click to enter a run.
/ipl/run/[runId]
Live match: score, portfolio, fire timeline, live/settled bet tables, safety rails.
/ipl/run/[runId]/fire/[idx]
Per-fire drilldown: system + user prompts, match state, agent trace, bets placed.

Safety rails

Real LLM calls cost real money. Every match run enforces three caps:

  • Cost ceiling: the loop auto-stops when cumulative LLM spend hits $10 per run.
  • Consecutive-error kill switch: two fires in a row that error out (credit exhaustion, provider outage, crash) stop the run.
  • Zero-markets fire gate: if the snapshot has no bettable markets, the fire is skipped. No tokens are spent when there's nothing to bet on.
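
The three caps can be pictured as a small guard object consulted around each fire. This is a sketch under assumptions: the `RunGuard` class and its method names are invented for illustration, and only the $10 ceiling and the two-error limit come from the text above.

```python
from dataclasses import dataclass


@dataclass
class RunGuard:
    cost_ceiling_usd: float = 10.0    # from the text: $10 per run
    max_consecutive_errors: int = 2   # from the text: two failed fires in a row
    spent_usd: float = 0.0
    consecutive_errors: int = 0

    def should_stop(self) -> bool:
        # Cost ceiling or consecutive-error kill switch.
        return (
            self.spent_usd >= self.cost_ceiling_usd
            or self.consecutive_errors >= self.max_consecutive_errors
        )

    def should_skip_fire(self, bettable_markets: int) -> bool:
        # Zero-markets gate: no LLM call when nothing is bettable.
        return bettable_markets == 0

    def record_fire(self, cost_usd: float, errored: bool) -> None:
        self.spent_usd += cost_usd
        # A successful fire resets the error streak.
        self.consecutive_errors = self.consecutive_errors + 1 if errored else 0
```

A loop would check `should_stop()` before each fire, check `should_skip_fire(...)` against the current snapshot, and call `record_fire(...)` afterwards. A single failed fire followed by a success does not trip the kill switch, because the streak resets.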

There's also a preflight check before any match run starts: the data layer (OddsPapi markets, CricAPI match_info) has to respond cleanly or the loop refuses to launch. A dry-run mode exercises the full cadence and persistence with zero LLM spend.

Mock mode

The top of the page shows a purple banner when mock mode is on. The entire site then serves synthetic demo data — one live match, three historical runs, 21 plausible fires with real system prompts and sample tool traces. Useful for screenshots, static demos, and offline dev. Toggle it in the admin panel.

Not in scope

  • Real money: paper bets only.
  • Other sports: IPL is the sole track for this season.
  • Live odds from every book: only two feeds are used, OddsPapi for in-play depth and The Odds API as the H2H fallback when OddsPapi closes its markets after the match starts.