AI models split on 2026 World Cup winner, Spain gets 4 votes and Argentina 3

Seven artificial intelligence (AI) models asked to predict the winner of the 2026 World Cup split their answers between Spain and Argentina.

Decrypt, a blockchain media outlet, reported on June 8 (local time) that four of the seven AI models in the experiment picked Spain and three chose Argentina as the title favourite. All seven placed Spain, Argentina and France in the top tier but differed on the eventual champion.

The experiment included Anthropic's Opus 4.8 Max, OpenAI's GPT-5.5, DeepSeek v4 Pro, StepFun 3.7, Nvidia's Nemotron 3 Ultra, MiniMax 2.7 and Qwen 3.5. Each model was given the same information on the 48 teams, 12 groups and the full tournament bracket, while the prediction method was left to each model.

The models that picked Spain were Opus 4.8 Max, GPT-5.5, StepFun 3.7 and Nemotron 3 Ultra. StepFun 3.7 put Spain's title chances at 33 percent based on 50,000 simulations. Opus 4.8 Max forecast a final in which Spain beat France, while GPT-5.5 pointed to Spain after weighing squad strength and tactics, finishing ability, available players and the bracket.
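For context on what a figure such as 33 percent from 50,000 simulations can and cannot say, the sampling noise of a simulated probability can be estimated with the binomial standard error. The sketch below is illustrative only and is not taken from any of the models' outputs. It assumes the simulation runs are independent, and the function name `monte_carlo_standard_error` is hypothetical.

```python
import math


def monte_carlo_standard_error(p, n_sims):
    """Binomial standard error of a probability estimated from n_sims independent runs."""
    return math.sqrt(p * (1.0 - p) / n_sims)


# Figures quoted in the article: 33 percent from 50,000 simulations.
p_hat = 0.33
n_sims = 50_000

se = monte_carlo_standard_error(p_hat, n_sims)
# Normal-approximation 95% interval around the simulated estimate.
low, high = p_hat - 1.96 * se, p_hat + 1.96 * se

if __name__ == "__main__":
    print(f"standard error: {se:.4f}")
    print(f"approx. 95% interval: {low:.3f} to {high:.3f}")
```

Under these assumptions the sampling error is roughly 0.2 percentage points, so the simulation count is not the main source of uncertainty. The interval says nothing about whether the underlying ratings or match model are right.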

The models that saw Argentina as the title favourite were DeepSeek v4 Pro, MiniMax 2.7 and Qwen 3.5. DeepSeek v4 Pro, based on qualitative analysis, predicted a final between Argentina and France. MiniMax 2.7 suggested the possibility of an Argentina-France final but did not specify an ultimate winner. Qwen 3.5 separated facts, estimates and forecasts and ranked Argentina as the most likely candidate.

The key reason for the split lay less in team strength itself than in data selection. Models that put weight on Spain being No. 1 in live football Elo ratings picked Spain. By contrast, models that prioritised FIFA rankings and 2022 World Cup results leaned toward Argentina.

Approaches also varied by model. Opus 4.8 Max used the Dixon-Coles model and Monte Carlo simulation, and also reflected host-environment variables such as heat, high altitude and long-distance travel. GPT-5.5 presented winning probabilities as ranges, while StepFun 3.7 repeatedly ran Elo-based simulations and rated Spain's title chances the highest.
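An Elo-based simulation of the kind described can be sketched in a few lines. The code below is a minimal illustration, not the method any of the models actually ran: it uses the standard Elo expected-score formula as a win probability, ignores draws, extra time and the group stage, and assumes a single-elimination bracket whose size is a power of two. The team names, ratings and function names are all hypothetical.

```python
import random


def elo_win_probability(rating_a, rating_b):
    """Expected score of team A against team B under the standard Elo formula."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))


def simulate_knockout(ratings, rng):
    """Play one single-elimination bracket in dict order and return the champion."""
    teams = list(ratings)
    while len(teams) > 1:
        winners = []
        # Adjacent teams meet each other; the number of teams must be a power of two.
        for i in range(0, len(teams), 2):
            a, b = teams[i], teams[i + 1]
            p_a = elo_win_probability(ratings[a], ratings[b])
            winners.append(a if rng.random() < p_a else b)
        teams = winners
    return teams[0]


def title_chances(ratings, n_sims=50_000, seed=0):
    """Estimate each team's title probability as its share of simulated championships."""
    rng = random.Random(seed)
    counts = {team: 0 for team in ratings}
    for _ in range(n_sims):
        counts[simulate_knockout(ratings, rng)] += 1
    return {team: wins / n_sims for team, wins in counts.items()}


# Hypothetical ratings for illustration only; not real Elo values.
EXAMPLE_RATINGS = {"Team A": 2100, "Team B": 2000, "Team C": 1950, "Team D": 1900}

if __name__ == "__main__":
    for team, chance in title_chances(EXAMPLE_RATINGS).items():
        print(f"{team}: {chance:.1%}")
```

Swapping the rating source, for example from Elo ratings to a points system derived from FIFA rankings, changes every match probability and therefore the simulated favourite, which is the kind of sensitivity to data selection the experiment reported.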

Some models also showed limitations. DeepSeek v4 Pro relied on some outdated coaching information and ranking data, and Qwen 3.5 made errors in the group draw. Even with the same bracket, the choice of sources and the way they are verified can change not only the outcome but also the risk of error.

Prediction markets pointed in the same direction as the AI majority. On Myriad, a prediction market operated by Dastan, Spain ranked first at 19 percent as of June 7, with France second at 17 percent. Argentina was priced at 10 percent, a lower probability than in some AI forecasts.

The experiment shows that AI predictions may not produce a single answer and can reach different conclusions depending on the data used and evaluation criteria. All seven models rated Spain, Argentina and France as strong teams, but their views on the champion split depending on what they emphasised among Elo ratings, FIFA rankings, past performance, bracket luck and environmental variables.

As a result, competition in AI sports prediction is unlikely to be decided simply by who calls the champion. What evidence supports a conclusion, how far data errors are reduced and how transparently uncertainty is presented are expected to become the standards that determine the credibility of future AI forecasting models.

Yoonseo Lee yslee@d-today.co.kr
