The OpenAI agent tool may be about to be released
OpenAI may be about to release an AI tool that can control your computer and perform actions on your behalf.
Tibor Blaho, a software engineer known for accurately leaking upcoming AI products. Claims To reveal evidence of the long-rumored OpenAI Operator tool. Publications Included Bloomberg previously I mentioned on the operator, which is said to be “agentA system capable of independently handling tasks such as writing code and booking travel.
According to According to the information, OpenAI is targeting January as the launcher release month. The code revealed by Blaho this weekend adds credence to those reports.
OpenAI ChatGPT The macOS client has gained options, currently hidden, to define “Toggle Operator” and “Force Quit Operator” shortcuts for each Blaho. OpenAI has added references to Operator on its website — though the references are not yet visible to the public, Blaho said.
Confirmed – ChatGPT macOS desktop app has hidden options to define shortcuts to the desktop launcher for “Toggle Operator” and “Force Quit Operator” https://t.co/rSFobi4iPN pic.twitter.com/j19YSlexAS
– Tibor Blaho (@btibor91) January 19, 2025
According to Blaho, the OpenAI website also contains not-yet-public tables that compare the operator’s performance to other AI systems that use computers. Tables may be placeholders. But if the numbers are accurate, they indicate that the agent is not 100% reliable, depending on the task.
The OpenAI website already contains references to Operator/OpenAI CUA (Computer Usage Agent) – “Operator System Card Table”, “Operator Research Evaluation Table”, and “Operator Rejection Rate Table”
Including comparison with Claude 3.5 Sonnet for PC, Google Mariner, etc.
(View tables… pic.twitter.com/OOBgC3ddkU
– Tibor Blaho (@btibor91) January 20, 2025
On OSWorld, a benchmark that attempts to emulate a real computer environment, “OpenAI Computer Use Agent (CUA)” — perhaps the AI model running the player — scored 38.1%, ahead of Anthropic Computer control model But it is much lower than the 72.4% recorded by humans. OpenAI CUA outperforms humans on WebVoyager, which evaluates AI’s ability to navigate and interact with websites. But the model falls short of human results in another web-based benchmark, WebArena, according to the leaked benchmarks.
The operator also has difficulty performing tasks that can easily be performed by a human, if the leak is to be believed. In the test that tasked the operator with registering with a cloud provider and launching a virtual machine, the operator was successful only 60% of the time. The operator was given the task of creating a Bitcoin wallet, and was successful in only 10% of cases.
OpenAI’s imminent entry into the AI agent space comes with competitors including the aforementioned Anthropic, GoogleOthers are making plays for the emerging sector. It may be artificial intelligence agents Risky and speculativebut the tech giants are already promoting them as The next big thing At Amnesty International. According to According to analytics firm Markets and Markets, the AI agent market could be worth $47.1 billion by 2030.
Agents today are fairly primitive. But some experts have raised concerns about their safety if the technology improves quickly.
One leaked chart shows the operator performing well on specific safety assessments, including tests that attempt to get the system to perform “illicit activities” and search for “sensitive personal data.” It is saidSafety testing is one of the reasons for the long development cycle of the actuator. In the last X mailWojciech Zaremba, co-founder of OpenAI, criticized Anthropic for launching an agent that he claims lacks safety mitigations.
“I can only imagine the negative reactions if OpenAI made a similar version,” Zaremba wrote.
It is worth noting that OpenAI was criticize By AI researchers, including former employees, for allegedly de-emphasizing safety work in favor of quickly producing technology.