Did xAI lie about Grok 3's benchmarks?
Debates over AI benchmarks, and how AI labs report them, are spilling out into public view.
This week, an OpenAI employee accused Elon Musk's AI company, xAI, of publishing misleading benchmark results for its latest AI model, Grok 3. One of xAI's co-founders, Igor Babushkin, insisted that the company was in the right.
The truth lies somewhere in between.
In a post on xAI's blog, the company published a graph showing Grok 3's performance on AIME 2025, a collection of challenging math questions from a recent invitational mathematics exam. Some experts have questioned AIME's validity as an AI benchmark. Nevertheless, AIME 2025 and older versions of the test are commonly used to probe a model's math ability.
xAI's graph showed two variants of Grok 3, Grok 3 Reasoning Beta and Grok 3 mini Reasoning, beating OpenAI's best-performing available model, o3-mini-high, on AIME 2025. But OpenAI employees on X were quick to point out that xAI's graph didn't include o3-mini-high's AIME 2025 score at "cons@64."
What is cons@64, you might ask? Well, it's short for "consensus@64," and it basically gives a model 64 tries to answer each problem in a benchmark and takes the answers generated most frequently as the final answers. As you can imagine, cons@64 tends to boost models' benchmark scores quite a bit, and omitting it from a graph might make it appear as though one model surpasses another when in reality that isn't the case.
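To make the difference concrete, here is a minimal Python sketch of the two scoring modes, assuming answers can be compared by exact match. The function names and the toy data are invented for illustration; this is not xAI's or OpenAI's evaluation code, and real harnesses may break ties or parse answers differently.

```python
from collections import Counter


def consensus_answer(samples):
    """Return the answer that appears most often among the sampled attempts."""
    # most_common(1) yields [(answer, count)]; on a tie, the answer seen first wins.
    return Counter(samples).most_common(1)[0][0]


def score_at_1(attempts_per_problem, answer_key):
    """Fraction of problems where the model's first attempt is correct."""
    correct = sum(
        attempts[0] == truth
        for attempts, truth in zip(attempts_per_problem, answer_key)
    )
    return correct / len(answer_key)


def score_cons_at_k(attempts_per_problem, answer_key, k=64):
    """Fraction of problems where the majority answer over k attempts is correct."""
    correct = sum(
        consensus_answer(attempts[:k]) == truth
        for attempts, truth in zip(attempts_per_problem, answer_key)
    )
    return correct / len(answer_key)


# Made-up toy data (not real benchmark output): 3 problems, 5 attempts each.
attempts = [
    [17, 42, 42, 42, 8],       # first attempt wrong, majority right
    [113, 113, 90, 113, 113],  # first attempt right, majority right
    [5, 7, 7, 5, 7],           # first attempt wrong, majority right
]
answers = [42, 113, 7]

print(score_at_1(attempts, answers))
print(score_cons_at_k(attempts, answers, k=5))
```

In this toy case the same model answers one of three problems correctly on its first try but all three under majority voting, which is why a chart that mixes the two modes can flatter one side.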
Grok 3 Reasoning Beta and Grok 3 mini Reasoning's scores for AIME 2025 at "@1" (meaning the first score the models got on the benchmark) fall below o3-mini-high's score. Grok 3 Reasoning Beta also trails ever so slightly behind OpenAI's o1 model set to "medium" computing. Yet xAI is advertising Grok 3 as the "world's smartest AI."
Babushkin argued on X that OpenAI has published similarly misleading benchmark charts in the past, albeit charts comparing the performance of its own models. A more neutral party in the debate put together a more "accurate" graph showing nearly every model's performance at cons@64:
Hilarious how some people see my plot as an attack on OpenAI and others as an attack on Grok while in reality it's DeepSeek propaganda
(I actually believe Grok looks good there, and OpenAI's TTC chicanery behind o3-mini-*high*-pass@"""1""" deserves more scrutiny.) https://t.co/djqljpcjh8 pic.twitter.com/3wh8foufic – Teortaxes▶️ (DeepSeek 推特🐋铁粉 2023 – ∞) (@teortaxesTex) February 20, 2025
But as AI researcher Nathan Lambert pointed out, perhaps the most important metric remains a mystery: the computational (and monetary) cost it took for each model to achieve its best score. That just goes to show how little most AI benchmarks communicate about models' limitations and their strengths.