A new, challenging AGI test stumps most AI models

The ARC Prize Foundation, a nonprofit co-founded by prominent AI researcher François Chollet, announced in a blog post on Monday that it has created a new, challenging test to measure the general intelligence of leading AI models.

So far, the new test, called ARC-AGI-2, has stumped most models.

“Reasoning” AI models such as OpenAI’s o1-pro and DeepSeek’s R1 score between 1% and 1.3% on ARC-AGI-2, according to the ARC Prize leaderboard. Powerful non-reasoning models, including GPT-4.5, Claude 3.7 Sonnet, and Gemini 2.0 Flash, score around 1%.

The ARC-AGI tests consist of puzzle-like problems in which an AI must identify visual patterns in a collection of different-colored squares and generate the correct “answer” grid. The problems are designed to force an AI to adapt to new problems it has not seen before.
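To make the puzzle format concrete, here is a minimal Python sketch of an ARC-style task. It assumes the layout used by the publicly released ARC datasets, in which a task holds a few "train" input/output grid pairs plus a "test" input, and each grid is a list of rows of integers standing for colors. The toy task, the two candidate rules, and the helper names are all invented for illustration; they are not real ARC-AGI-2 items or an actual solver.

```python
from typing import Callable, List, Optional

Grid = List[List[int]]  # each integer stands for one cell color

# Hypothetical toy task, invented for illustration (not a real ARC-AGI-2 item).
toy_task = {
    "train": [
        {"input": [[1, 0, 0], [0, 2, 0]], "output": [[0, 0, 1], [0, 2, 0]]},
        {"input": [[3, 3, 0], [0, 0, 4]], "output": [[0, 3, 3], [4, 0, 0]]},
    ],
    "test": [{"input": [[5, 0, 0], [0, 0, 6]]}],
}


def flip_top_bottom(grid: Grid) -> Grid:
    """Candidate rule: reverse the order of the rows."""
    return [list(row) for row in reversed(grid)]


def mirror_left_right(grid: Grid) -> Grid:
    """Candidate rule: reverse each row."""
    return [list(reversed(row)) for row in grid]


def fits_all_examples(rule: Callable[[Grid], Grid], task: dict) -> bool:
    """A rule is accepted only if it reproduces every training output."""
    return all(rule(pair["input"]) == pair["output"] for pair in task["train"])


def solve(task: dict, candidate_rules: List[Callable[[Grid], Grid]]) -> Optional[List[Grid]]:
    """Return answer grids from the first rule that fits, or None if none fits."""
    for rule in candidate_rules:
        if fits_all_examples(rule, task):
            return [rule(item["input"]) for item in task["test"]]
    return None


answers = solve(toy_task, [flip_top_bottom, mirror_left_right])
print(answers)
```

Real tasks are far harder than this mirror puzzle. Trying a fixed list of candidate rules is itself a small-scale version of the brute-force search that the new test is meant to discourage.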

The ARC Prize Foundation had more than 400 people take ARC-AGI-2 to establish a human baseline. On average, “panels” of these people got 60% of the test’s questions right, much better than any of the models’ scores.

A sample question from ARC-AGI-2 (credit: ARC Prize).

In a post on X, Chollet claimed ARC-AGI-2 is a better measure of an AI model’s actual intelligence than the first iteration of the test, ARC-AGI-1. The ARC Prize Foundation’s tests aim to evaluate whether an AI system can efficiently acquire new skills outside the data it was trained on.

Chollet said that unlike ARC-AGI-1, the new test prevents AI models from relying on “brute force” (extensive computing power) to find solutions. Chollet had previously acknowledged that this was a major flaw of ARC-AGI-1.

To address the first test’s flaws, ARC-AGI-2 introduces a new metric: efficiency. It also requires models to interpret patterns on the fly instead of relying on memorization.

“Intelligence is not solely defined by the ability to solve problems or achieve high scores,” the ARC Prize Foundation wrote in its blog post. “The efficiency with which those capabilities are acquired and deployed is a crucial, defining component. The core question being asked is not just, ‘Can AI acquire [the] skill to solve a task?’ but also, ‘At what efficiency or cost?’”

ARC-AGI-1 went unbeaten for about five years until December 2024, when OpenAI released its advanced reasoning model, o3, which outperformed all other AI models and matched human performance on the evaluation. However, as we noted at the time, o3’s performance gains on ARC-AGI-1 came at a hefty price.

The version of the o3 model, o3 (low), that was first to reach new heights on ARC-AGI-1, scoring 75.7% on that test, managed just 4% on ARC-AGI-2 while using $200 worth of computing power per task.

Comparison of frontier AI model performance on ARC-AGI-1 and ARC-AGI-2 (credit: ARC Prize).

The arrival of ARC-AGI-2 comes as many in the tech industry are calling for new, unsaturated benchmarks to measure AI progress. Thomas Wolf, co-founder of Hugging Face, told TechCrunch that the AI industry lacks sufficient tests to measure the key traits of so-called artificial general intelligence, including creativity.

Alongside the new benchmark, the ARC Prize Foundation announced a new ARC Prize 2025 contest, challenging developers to reach 85% accuracy on the ARC-AGI-2 test while spending only $0.42 per task.
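The gap between the contest target and current results is easier to see with a little arithmetic. The Python sketch below uses only the figures quoted in this article (4% at $200 per task for o3 (low), and the 85% at $0.42 per task contest target). The "cost per solved task" figure is an illustrative measure assumed here, not the foundation's official efficiency metric.

```python
def cost_per_solved_task(cost_per_task_usd: float, accuracy: float) -> float:
    """Average spend needed for one correct answer.

    Illustrative measure only: cost per attempted task divided by the
    fraction of tasks solved. This is an assumption made for this sketch,
    not the ARC Prize Foundation's own definition of efficiency.
    """
    if not 0 < accuracy <= 1:
        raise ValueError("accuracy must be in the interval (0, 1]")
    return cost_per_task_usd / accuracy


# Figures quoted in the article.
o3_low = cost_per_solved_task(cost_per_task_usd=200.0, accuracy=0.04)
prize_target = cost_per_solved_task(cost_per_task_usd=0.42, accuracy=0.85)

print(f"o3 (low) on ARC-AGI-2: ${o3_low:,.2f} per solved task")
print(f"ARC Prize 2025 target: ${prize_target:,.2f} per solved task")
print(f"Gap: roughly {o3_low / prize_target:,.0f}x")
```

By this rough measure, o3 (low) spends about $5,000 per solved task against a target of about $0.49, a gap on the order of 10,000 times.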
