Super Mario is now being used to benchmark AI
Thought Pokémon was a tough benchmark for AI? One group of researchers argues that Super Mario Bros. is even tougher.
Hao AI Lab, a research group at the University of California San Diego, on Friday pitted AI models against Super Mario Bros. Anthropic's Claude 3.7 performed the best, followed by Claude 3.5. Google's Gemini 1.5 Pro and OpenAI's GPT-4o struggled.
It wasn't quite the same version of Super Mario Bros. as the original 1985 release, to be clear. The game ran in an emulator and was integrated with a framework, GamingAgent, to give the AIs control over Mario.

GamingAgent, which Hao developed in-house, fed the AI basic instructions, such as "If an obstacle or enemy is near, move/jump left to dodge," along with in-game screenshots. The AI then generated inputs in the form of Python code to control Mario.
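To make that loop concrete, here is a minimal sketch of how such an agent step might look. It is not GamingAgent's actual code: the `Controller` class, the `stub_model` function, and the `step` helper are all hypothetical names invented for illustration, and the model call is replaced by a stub that returns a fixed snippet of Python.

```python
from dataclasses import dataclass, field

# Instruction text quoted from the article; everything else here is hypothetical.
INSTRUCTIONS = "If an obstacle or enemy is near, move/jump left to dodge."


@dataclass
class Controller:
    """Hypothetical stand-in for the emulator's input interface."""
    pressed: list = field(default_factory=list)

    def press(self, button: str) -> None:
        # Record the button instead of sending it to a real emulator.
        self.pressed.append(button)


def stub_model(instructions: str, screenshot: bytes) -> str:
    """Placeholder for a real model call; returns Python source as text."""
    # A real agent would send the instructions and screenshot to a model API.
    return "controller.press('right')\ncontroller.press('A')"


def step(controller: Controller, screenshot: bytes, model=stub_model) -> None:
    """One agent step: ask the model for code, then run it on the controller."""
    code = model(INSTRUCTIONS, screenshot)
    # Expose only the controller to the generated code.
    # Note: this is not a security sandbox, just a narrow namespace.
    exec(code, {"__builtins__": {}}, {"controller": controller})


controller = Controller()
step(controller, screenshot=b"")
print(controller.pressed)
```

In a real setup, the stub would be swapped for a call to a model, and `press` would forward button presses to the emulator.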
Still, Hao says the game forced each model to "learn" to plan complex maneuvers and develop gameplay strategies. Interestingly, the lab found that reasoning models like OpenAI's o1, which "think" through problems step by step to arrive at solutions, performed worse than "non-reasoning" models, despite being generally stronger on most benchmarks.
One of the main reasons reasoning models have trouble playing real-time games like this is that they take a while (seconds, usually) to decide on actions, according to the researchers. In Super Mario Bros., timing is everything. A second can mean the difference between a jump safely cleared and a plummet to your death.
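A rough back-of-the-envelope sketch shows why seconds matter. It assumes the emulator runs at roughly 60 frames per second and keeps running while the model decides; the latency values are illustrative, not measurements from the researchers.

```python
FPS = 60  # assumption: the emulator runs at roughly 60 frames per second


def frames_elapsed(latency_s: float, fps: int = FPS) -> int:
    """Frames the game advances while the model is still deciding."""
    return round(latency_s * fps)


# Illustrative decision latencies in seconds (not measured values).
for latency in (0.1, 1.0, 3.0):
    print(f"{latency:>4} s -> {frames_elapsed(latency)} frames")
```

Under these assumptions, a model that takes a few seconds to respond is acting on a screenshot that is well over a hundred frames old.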
Games have been used to benchmark AI for decades. But some experts have questioned the wisdom of drawing connections between AI's gaming skills and technological advancement. Unlike the real world, games tend to be abstract and relatively simple, and they provide a theoretically unlimited amount of data to train AI.
The recent flashy gaming benchmarks point to what Andrej Karpathy, a research scientist and founding member at OpenAI, called an "evaluation crisis."
"I don't really know what [AI] metrics to look at right now," he wrote in a post on X. "TLDR my reaction is I don't really know how good these models are right now."
At least we can watch AI play Mario.