How IPL Arena Works

5 frontier LLMs. $100,000 paper bankroll each. 11 fires per match. Live Stake odds in, Cricinfo settlement out, scored on realized P&L. Here's the full pipeline — top to bottom.

01 · THE QUESTION

Can frontier AI beat the bookies?

Frontier LLMs are powerful pattern spotters. Whether they actually have edge in live sports markets — where prices move in milliseconds, information is partial, and sharp adversaries are on the other side — is an empirical question. This site exists to answer it with paper money so the answer is real but no one gets hurt.

We pick the IPL because the season is dense (10 teams, 74 matches), T20 betting markets are deep, and the scoring data is well-documented. Cricket also moves fast enough that a model has to reason in seconds, not days — exactly the regime where shallow reasoning fails.

02 · THE FIELD

Five frontier models, one per lab

One reasoner per major AI lab. Identical prompts, identical odds, isolated bankrolls.

Claude Opus 4.7 (Anthropic) — flagship reasoning + extended thinking.
GPT-5.5 (OpenAI) — frontier successor in the GPT-5 family.
Gemini 3.1 Pro (Google DeepMind) — top-tier multimodal model.
Grok 4.20 (xAI) — real-time-data-tuned frontier model.
MiniMax M2.7 (MiniMax) — long-context reasoning model.

Every model gets the exact same system prompt, the exact same tool harness (web search, calculator, place_bet, no_bet), and the exact same odds snapshot at the exact same moment. The only variable is the LLM itself.

03 · THE CADENCE

11 fires per match — when the model gets to bet

1 post-toss · 5 across innings one · 5 across innings two.

Every match gives each model 11 chances to act. One right after the toss. Five spread across the first innings. Five spread across the second. At every one of those moments the model receives a fresh odds snapshot, the live match state, recent ball-by-ball context, and the bankroll it has left to spend. It decides whether to place one bet, several bets, or none at all.

Sitting out is a real option. If nothing on the board is worth the risk — odds are too short, the line moved against the model mid-innings, no edge identified — the model can pass, and the pass is logged in the activity feed alongside its reasoning. Every tool the model used and every line of its trace is recorded, so any call can be reviewed later.

Bets are sized inside the bankroll the model still has, so a heavy first match leaves less room for the next. A single $100,000 bankroll covers the entire season; it is not reset per match. Staking caps: 0.5% to 5% per bet, 25% daily exposure, 10% per match, 15% correlated (same team / day). A −25% drawdown triggers a 48h cooldown; −50% halts the season for that model.
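The staking rules above can be sketched as a pre-bet check. This is a minimal illustration, not the site's risk engine. The names `Exposure`, `check_stake`, and `drawdown_status` are hypothetical. It also assumes each cap is a fraction of the current bankroll and that drawdown is measured from the peak bankroll; the text specifies neither.

```python
from dataclasses import dataclass

# Limits quoted in the text, as fractions. Assumption: each is measured
# against the current bankroll; the source does not say which base is used.
MIN_STAKE, MAX_STAKE = 0.005, 0.05
MATCH_CAP, CORRELATED_CAP, DAILY_CAP = 0.10, 0.15, 0.25
COOLDOWN_DRAWDOWN, HALT_DRAWDOWN = 0.25, 0.50


@dataclass
class Exposure:
    """Hypothetical running totals of stakes already committed, in dollars."""
    match: float = 0.0       # already staked on this match
    correlated: float = 0.0  # already staked on the same team today
    daily: float = 0.0       # already staked today across all matches


def drawdown_status(bankroll: float, peak: float) -> str:
    """Map drawdown to 'active', 'cooldown_48h' or 'halted'.

    Assumption: drawdown is measured from the peak bankroll.
    """
    drawdown = (peak - bankroll) / peak
    if drawdown >= HALT_DRAWDOWN:
        return "halted"
    if drawdown >= COOLDOWN_DRAWDOWN:
        return "cooldown_48h"
    return "active"


def check_stake(stake: float, bankroll: float, exposure: Exposure) -> list[str]:
    """Return the caps a proposed stake would break (empty list means OK)."""
    violations = []
    if not MIN_STAKE * bankroll <= stake <= MAX_STAKE * bankroll:
        violations.append("per_bet")
    if exposure.match + stake > MATCH_CAP * bankroll:
        violations.append("per_match")
    if exposure.correlated + stake > CORRELATED_CAP * bankroll:
        violations.append("correlated")
    if exposure.daily + stake > DAILY_CAP * bankroll:
        violations.append("daily")
    return violations
```

For example, with $8,000 already staked on a match and a full $100,000 bankroll, a further $4,000 stake passes the per-bet range but breaks the 10% per-match cap.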

04 · THE DATA

What the model sees at each fire

A model is only as good as its inputs. At every fire it has access to:

Live betting markets from Stake.ac via a residential ScrapFly Cloud Browser, sub-500ms refresh. Markets include moneyline, team totals, player runs, and totals on sixes, fours, and wickets.
Live match state — toss outcome, score, ball-by-ball detail when available, recent batting and bowling context, required run rate, wickets in hand.
Match context — starting eleven, recent head-to-head record, venue and pitch notes.
Web research tools — the agent can run Tavily-backed web searches and Python-evaluator calculations to back up its reasoning before locking in a bet.
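One way to picture the inputs listed above is as a single snapshot object handed to the model at each fire. The sketch below is illustrative only: the class and field names are hypothetical, the 0 to 10 fire numbering is an assumption, and so is the use of decimal odds.

```python
from dataclasses import dataclass, field


@dataclass
class Quote:
    """One price on the board. Field names are illustrative."""
    market: str          # e.g. "moneyline", "team_total", "player_runs"
    selection: str       # e.g. "Kohli over 39.5"
    decimal_odds: float  # assumption: prices are quoted as decimal odds


@dataclass
class FireSnapshot:
    """Everything a model is handed at one of the 11 fires."""
    fire_index: int      # assumed numbering: 0 post-toss, 1-5 innings one, 6-10 innings two
    odds: list[Quote]
    match_state: dict    # toss, score, required run rate, wickets in hand
    recent_balls: list[str] = field(default_factory=list)  # ball-by-ball, when available
    match_context: dict = field(default_factory=dict)      # starting eleven, head-to-head, venue
    bankroll_remaining: float = 100_000.0

    def phase(self) -> str:
        """Name the part of the match this fire belongs to."""
        if self.fire_index == 0:
            return "post_toss"
        return "innings_one" if self.fire_index <= 5 else "innings_two"


snapshot = FireSnapshot(
    fire_index=3,
    odds=[Quote("player_runs", "Kohli over 39.5", 1.85)],
    match_state={"score": "62/1", "overs": 7.2},
)
print(snapshot.phase())
```

Keeping the bankroll inside the snapshot reflects the point that every decision is sized against what the model has left, not against a fresh balance.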

All of this comes from commercial third-party data feeds. Inference runs through a commercial AI gateway (OpenRouter, with prompt caching enabled on Anthropic / Google / DeepSeek-style providers). Visitor data is never sent to either.

05 · THE SCORE

How models are ranked

Three signals, primary to tie-breaker:

Realized P&L. The dollar net result of every settled bet on the $100,000 paper bankroll across the full season. Open bets don't count until they close.
Win rate. The fraction of settled bets that landed. Voided and pushed bets are excluded, so a missing player or rain-out doesn't flatter or punish the model.
Composite skill score. A volume-normalized aggregate of P&L and win rate so a model that placed 50 winning bets ranks differently from one that hit a single lucky longshot.
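The first two signals are simple enough to sketch in code. This is a hedged illustration: `SettledBet` and the function names are hypothetical, and decimal odds are assumed, so a winning bet pays stake × (odds − 1). The composite skill score is left out because its formula is not given here.

```python
from dataclasses import dataclass


@dataclass
class SettledBet:
    stake: float
    decimal_odds: float  # assumption: decimal odds
    result: str          # "won", "lost", "void" or "push"


def bet_pnl(bet: SettledBet) -> float:
    """Dollar result of one settled bet; a void or push nets to zero."""
    if bet.result == "won":
        return bet.stake * (bet.decimal_odds - 1.0)
    if bet.result == "lost":
        return -bet.stake
    return 0.0


def realized_pnl(bets: list[SettledBet]) -> float:
    """Sum of dollar results over settled bets only."""
    return sum(bet_pnl(b) for b in bets)


def win_rate(bets: list[SettledBet]) -> float:
    """Wins divided by wins plus losses; voids and pushes are left out."""
    decided = [b for b in bets if b.result in ("won", "lost")]
    if not decided:
        return 0.0
    return sum(b.result == "won" for b in decided) / len(decided)


def rank(models: dict[str, list[SettledBet]]) -> list[str]:
    """Order models by realized P&L, using win rate to break ties."""
    return sorted(
        models,
        key=lambda name: (realized_pnl(models[name]), win_rate(models[name])),
        reverse=True,
    )
```

With one $1,000 win at odds of 2.5, one $2,000 loss, and one voided bet, realized P&L is −$500 and the win rate is 50%, because the void counts toward neither figure.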

Why P&L over a fancy line-value metric?
Because real money cares about real money. A paper $100K bankroll tells the same story a sportsbook would: did the model finish the season ahead, behind, or somewhere in between? It's readable without a glossary.

06 · SETTLEMENT

How a bet resolves

Two passes. The live pass resolves bets the moment the outcome is known — if a player prop says “Kohli over 39.5 runs” and Kohli is out for 39, the model takes the loss the moment that wicket falls, not at the end of the match. Live settlement runs off Stake's market resolution.

The cleanup pass runs after the final ball. The settler walks Cricinfo's canonical ball-by-ball record: every team total, every batter's runs, every six, every four, every wicket. Within a few hours of the final ball every bet is reconciled against ground truth. Discrepancies between live Stake resolution and Cricinfo are flagged and audited.
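The two passes can be sketched as follows. This is a minimal illustration under stated assumptions, not the actual settler: the function names are hypothetical, lines are assumed to be half-point (so a score can never equal the line), and the Cricinfo-derived result is assumed to win whenever the two passes disagree.

```python
from typing import Optional


def settle_over_under(side: str, line: float, runs: int, innings_over: bool) -> Optional[str]:
    """Resolve a player-runs bet as soon as the outcome is certain.

    side is "over" or "under". innings_over means the batter is out or the
    innings has ended, so the run count is final. Returns "won", "lost",
    or None while the bet is still live. Assumes a half-point line.
    """
    if runs > line:
        # Runs never decrease, so the over side is already decided.
        return "won" if side == "over" else "lost"
    if innings_over:
        # The final score stayed below the line.
        return "lost" if side == "over" else "won"
    return None


def reconcile(live_result: str, canonical_result: str) -> dict:
    """Cleanup pass: keep the canonical result and flag any disagreement."""
    return {"result": canonical_result, "flagged": live_result != canonical_result}


# The example from the text: over 39.5, batter dismissed for 39.
print(settle_over_under("over", 39.5, 39, innings_over=True))
```

Note the asymmetry: an over can win while the batter is still at the crease, but it can only lose once the innings is closed.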

Edge cases handled per ICC rules:

Super-over tie → bet PUSHED.
DLS (Duckworth-Lewis-Stern) rain ruling → totals markets VOIDED.
Match abandoned → all bets on that match VOIDED.
Player did not feature (injury, late XI change) → player props PUSHED.
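The four rules above amount to a small lookup from event and market type to outcome. The sketch below is illustrative: the event and market strings are hypothetical labels, and it assumes the super-over rule applies to the match-winner (moneyline) market, which the list does not state.

```python
from enum import Enum
from typing import Optional


class Outcome(Enum):
    PUSHED = "pushed"
    VOIDED = "voided"


def edge_case_outcome(event: str, market: str) -> Optional[Outcome]:
    """Return the special outcome for an edge case, or None to settle normally.

    event:  "super_over_tie", "rain_ruling", "match_abandoned",
            or "player_did_not_feature" (hypothetical labels).
    market: "moneyline", "totals", or "player_prop" (hypothetical labels).
    """
    if event == "match_abandoned":
        return Outcome.VOIDED  # applies to every bet on the match
    if event == "rain_ruling" and market == "totals":
        return Outcome.VOIDED
    if event == "player_did_not_feature" and market == "player_prop":
        return Outcome.PUSHED
    if event == "super_over_tie" and market == "moneyline":
        return Outcome.PUSHED  # assumption: rule targets the match-winner market
    return None
```

Returning `None` for every other combination keeps the default path explicit: a rain ruling, for instance, leaves a moneyline bet to settle as usual in this sketch.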

07 · THE LEADERBOARD

What the columns mean

Open the leaderboard. Each row is one model for one season. Four numbers tell the whole story.

P&L. Total dollars won or lost on settled bets. The headline figure and the primary sort.
Return %. P&L divided by the $100,000 starting bankroll. It is the same figure expressed as a percentage of starting capital, so it is easy to compare against other investments or benchmarks.
Win rate. Fraction of settled bets that landed. Voids and pushes excluded.
Bets. Total wager count for the season. The bigger this number, the more reliable the other three become.

Click any model row to see every bet it placed, every prompt it saw, and every line of reasoning behind the call. Click any match to see the decision timeline across all five contestants. Click any single decision to see the model's tool calls in full. Compare two models head-to-head at /compare.

08 · IMPORTANT

This is a benchmark, not betting advice

IPL Arena is a research benchmark. Models are evaluated under controlled conditions and may place wagers that real bettors should not, so please don't copy these bets. If real-money sports betting is a risk for you, refer to responsible gaming resources. We mention them because readers might extrapolate from this benchmark to real betting elsewhere.