AI proved ineffective at sports betting, losing money on English Premier League matches.
Short summary of the experiment results
The startup *General Reasoning* conducted a test called KellyBench, in which eight leading AI systems (Google Gemini 3.1 Pro, OpenAI ChatGPT‑4, Anthropic Claude Opus 4.6, xAI Grok 4.20, among others) were evaluated on their ability to place bets across the 2023–2024 English Premier League season.
Each agent was given a complete statistical description of all teams and past matches, but internet access was prohibited: the models could use only the data supplied in advance.
How the test proceeded
1. Three attempts: each system could make three series of bets for the season.
2. Bets: on match outcomes (win/draw/loss) and goal totals.
3. Goal: maximize profit while managing risk (see the bet‑sizing sketch after this list).
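The benchmark's name is presumably a nod to the Kelly criterion, the classic rule for sizing bets so that a bankroll grows while the risk of ruin stays bounded. The published description does not say how the agents actually sized their stakes, so the snippet below is only a minimal Python sketch of the idea; the function name and the example odds are illustrative, not taken from the experiment.

```python
def kelly_fraction(p: float, decimal_odds: float) -> float:
    """Fraction of the bankroll to stake on a single outcome.

    p            -- the bettor's estimated probability that the outcome occurs
    decimal_odds -- bookmaker's decimal odds (total payout per unit staked, > 1)
    """
    b = decimal_odds - 1.0   # net profit per unit staked if the bet wins
    q = 1.0 - p              # probability of losing
    f = (b * p - q) / b      # classic Kelly formula: f* = (bp - q) / b
    return max(f, 0.0)       # stake nothing when the edge is non-positive

# Illustrative numbers: the model estimates a 50% chance of a home win
# while the bookmaker offers decimal odds of 2.20.
stake = kelly_fraction(p=0.50, decimal_odds=2.20)
print(f"Stake {stake:.1%} of the bankroll")  # -> Stake 8.3% of the bankroll
```

The fraction is positive only when the model's probability estimate implies an edge over the bookmaker's odds; a model whose estimates are no better than the market's should, under this rule, bet nothing at all.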
Who won and who lost
| AI system | Result | Note |
|---|---|---|
| Anthropic Claude Opus 4.6 | –11 % on average (approximately break‑even in one attempt) | The strongest performer, yet it still lost money |
| Google Gemini 3.1 Pro | +34 % on the first attempt, then went bankrupt | Early profit followed by total loss |
| xAI Grok 4.20 | Went bankrupt immediately and did not complete the two remaining attempts | The weakest of all |
In the end, every model lost money over the season, and several went bankrupt outright. This supports the researchers' conclusion: even the most advanced AI systems struggle with long‑term forecasting in the real world.
What this means for the future of AI
- Fears that AI will replace humans still look exaggerated.
- Current benchmarks often use “static” conditions that do not reflect the chaos and complexity of real life.
- Although AI already handles tasks such as code generation well, it remains limited in most other areas of human activity.
Thus, the KellyBench experiment demonstrates that AI is still unprepared to compete with humans in dynamic, unpredictable tasks such as sports betting.