OpenAI's new reasoning AI models hallucinate more
OpenAI's recently launched o3 and o4-mini AI models are state-of-the-art in many respects. However, the new models still hallucinate, or make things up. In fact, they hallucinate more than several of OpenAI's older models.
Hallucinations have proven to be one of the biggest and most difficult problems to solve in AI, affecting even today's best-performing systems. Historically, each new model has improved slightly in the hallucination department, hallucinating less than its predecessor. But that doesn't seem to be the case for o3 and o4-mini.
According to OpenAI's internal tests, o3 and o4-mini, which are so-called reasoning models, hallucinate more often than the company's previous reasoning models (o1, o1-mini, and o3-mini) as well as its traditional, "non-reasoning" models, such as GPT-4o.
Perhaps more concerning, the ChatGPT maker doesn't really know why it's happening.
In its technical report for o3 and o4-mini, OpenAI writes that "more research is needed" to understand why hallucinations get worse as it scales up reasoning models. o3 and o4-mini perform better in some areas, including tasks related to coding and math. But because they "make more claims overall," they are often led to make "more accurate claims as well as more inaccurate/hallucinated claims," according to the report.
OpenAI found that o3 hallucinated in response to 33% of questions on PersonQA, the company's in-house benchmark for measuring the accuracy of a model's knowledge about people. That is roughly double the hallucination rate of OpenAI's previous reasoning models, o1 and o3-mini, which scored 16% and 14.8%, respectively. o4-mini did even worse on PersonQA, hallucinating 48% of the time.
Third-party testing by Transluce, a nonprofit AI research lab, also found evidence that o3 has a tendency to make up actions it took in the process of arriving at answers. In one example, Transluce observed o3 claiming that it ran code on a 2021 MacBook Pro "outside of ChatGPT," then copied the numbers into its answer. While o3 has access to some tools, it can't do that.
"Our hypothesis is that the kind of reinforcement learning used for o-series models may amplify issues that are usually mitigated (but not fully erased) by standard post-training pipelines," said Neil Chowdhury, a Transluce researcher and former OpenAI employee, in an email to TechCrunch.
Sarah Schwettmann, co-founder of Transluce, added that o3's hallucination rate may make it less useful than it otherwise would be.
Kian Katanforoosh, a Stanford adjunct professor and CEO of the upskilling startup Workera, told TechCrunch that his team is already testing o3 in their coding workflows, and that they have found it to be a step above the competition. However, Katanforoosh says that o3 tends to hallucinate broken website links: the model will supply a link that, when clicked, doesn't work.
Hallucinations may help models arrive at interesting ideas and be creative in their "thinking," but they also make some models a tough sell for businesses in markets where accuracy is paramount. A law firm, for example, likely wouldn't be pleased with a model that inserts lots of factual errors into client contracts.
One promising approach to boosting the accuracy of models is giving them web search capabilities. OpenAI's GPT-4o with web search achieves 90% accuracy on SimpleQA, another of OpenAI's accuracy benchmarks. Potentially, search could improve reasoning models' hallucination rates as well, at least in cases where users are willing to expose prompts to a third-party search provider.
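For readers curious what that kind of grounding looks like in practice, here is a minimal sketch using the OpenAI Python SDK's Responses API with its web search tool. The tool type ("web_search_preview"), the model choice, and the example question are assumptions for illustration only; they are not the exact setup OpenAI used for its SimpleQA numbers.

```python
# Minimal sketch: asking a model to answer with web search grounding.
# Assumes the OpenAI Python SDK (`pip install openai`) and an OPENAI_API_KEY
# in the environment. Tool and model names may change over time.
from openai import OpenAI

client = OpenAI()  # picks up OPENAI_API_KEY from the environment

response = client.responses.create(
    model="gpt-4o",                          # assumed model for illustration
    tools=[{"type": "web_search_preview"}],  # lets the model consult the web
    input="What did OpenAI report about o3's hallucination rate on PersonQA?",
)

# The SDK aggregates the model's final text answer here.
print(response.output_text)
```

The trade-off mentioned above still applies: routing prompts through a search tool means the query text leaves the model provider's sandbox and reaches a search backend, which some enterprise users may not accept.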
If scaling up reasoning models does continue to worsen hallucinations, it will make the hunt for a solution all the more urgent.
"Addressing hallucinations across all our models is an ongoing area of research, and we're continually working to improve their accuracy and reliability," OpenAI spokesperson Niko Felix said in an email to TechCrunch.
Over the past year, the broader AI industry has pivoted to focus on reasoning models after techniques for improving traditional AI models began showing diminishing returns. Reasoning improves model performance on a variety of tasks without requiring massive amounts of computing and data during training. Yet reasoning may also lead to more hallucination, presenting a challenge.