News

The FrontierMath benchmark from Epoch AI tests generative models on difficult math problems. Find out how OpenAI’s o3 and ...
Comparing AI reasoning abilities reveals OpenAI's o1 model surpasses DeepSeek's R1 in generating accurate, sentence-level ...
OpenAI's o3 model might be costlier to run than originally estimated, according to a third-party benchmarking organization.
However, according to OpenAI’s internal tests, these new o3 and o4-mini reasoning models also hallucinate significantly more ...
METR, a frequent OpenAI partner, suggested in a blog post that it wasn't given much time to evaluate the company's powerful ...
OpenAI's latest AI models tend to make things up — or "hallucinate" — substantially more than earlier versions.
Historically, each new generation of OpenAI's models has delivered incremental improvements in factual accuracy, with ...
The Chinese AI company said its latest model demonstrated “significant improvements” in benchmark performance.
On Wednesday, OpenAI announced the release of two new models—o3 and o4-mini—that combine simulated reasoning capabilities ...
OpenAI has finally released the full o3 reasoning model along with o4-mini. New models can use multiple tools inside ChatGPT ...
OpenAI's newly launched o3 and o4-mini AI models, despite their advanced features, are exhibiting increased rates of ...