
At Google I/O, AI that never hallucinates

by Anna Avery


This year, Google I/O 2025 had one focus: artificial intelligence.

We’ve already covered all of the biggest news to come out of the annual developers conference: a new AI video generation tool called Flow. A $250 AI Ultra subscription plan. Tons of new changes to Gemini. A virtual shopping try-on feature. And critically, the launch of the search tool AI Mode to all users in the United States.

Yet over nearly two hours of Google leaders talking about AI, one word we didn’t hear was “hallucination”.

Hallucinations remain one of the most stubborn and concerning problems with AI models. The term refers to the invented facts and inaccuracies that large language models “hallucinate” in their replies. And according to the big AI brands’ own metrics, hallucinations are getting worse — with some models hallucinating more than 40 percent of the time.

But if you were watching Google I/O 2025, you wouldn’t know this problem existed. You’d think models like Gemini never hallucinate; you would certainly be surprised to see the warning appended to every Google AI Overview: “AI responses may include mistakes.”


The closest Google came to acknowledging the hallucination problem came during a segment of the presentation on AI Mode and Gemini’s Deep Search capabilities. The model would check its own work before delivering an answer, we were told — but without more detail on this process, it sounds more like the blind leading the blind than genuine fact-checking.

For AI skeptics, the degree of confidence Silicon Valley has in these tools seems divorced from actual results. Real users notice when AI tools fail at simple tasks like counting, spellchecking, or answering questions like “Will water freeze at 27 degrees Fahrenheit?”

Google was eager to remind viewers that its newest AI model, Gemini 2.5 Pro, sits atop many AI leaderboards. But when it comes to truthfulness and the ability to answer simple questions, AI chatbots are graded on a curve.

Gemini 2.5 Pro is Google’s most intelligent AI model (according to Google), yet it scores just 52.9 percent on the SimpleQA benchmarking test. According to an OpenAI research paper, the SimpleQA test is “a benchmark that evaluates the ability of language models to answer short, fact-seeking questions.” (Emphasis ours.)
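For context on what that number measures: a SimpleQA-style evaluation is essentially a scripted quiz, asking the model short factual questions and counting how many it gets right. Below is a minimal sketch of the idea in Python; the question set, the grading rule, and the ask_model() call are placeholders of our own, not OpenAI's actual benchmark harness.

    # Minimal sketch of a SimpleQA-style accuracy check, not OpenAI's
    # actual harness. The questions, the grading rule, and ask_model()
    # are placeholders for illustration.
    QUESTIONS = [
        ("What is the freezing point of water in degrees Fahrenheit?", "32"),
        ("In what year was the first Google I/O held?", "2008"),
    ]

    def ask_model(question: str) -> str:
        """Placeholder for a call to whatever model is being evaluated."""
        raise NotImplementedError

    def simpleqa_style_accuracy(questions) -> float:
        correct = 0
        for question, expected in questions:
            answer = ask_model(question)
            # Real graders accept paraphrases and track "not attempted"
            # separately; exact substring matching keeps the sketch short.
            if expected.lower() in answer.lower():
                correct += 1
        return correct / len(questions)

    # A 52.9 percent score means the model answers roughly 53 of every
    # 100 such questions correctly.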

A Google representative declined to discuss the SimpleQA benchmark, or hallucinations in general — but did point us to Google’s official Explainer on AI Mode and AI Overviews. Here’s what it has to say:

[AI Mode] uses a large language model to help answer queries and it is possible that, in rare cases, it may sometimes confidently present information that is inaccurate, which is commonly known as ‘hallucination.’ As with AI Overviews, in some cases this experiment may misinterpret web content or miss context, as can happen with any automated system in Search…

We’re also using novel approaches with the model’s reasoning capabilities to improve factuality. For example, in collaboration with Google DeepMind research teams, we use agentic reinforcement learning (RL) in our custom training to reward the model to generate statements it knows are more likely to be accurate (not hallucinated) and also backed up by inputs.
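Google doesn’t spell out how that reward is computed. As a rough sketch of what “rewarding statements backed up by inputs” could look like, here is a toy scoring function; the support check and the weights are assumptions for illustration, not DeepMind’s actual training signal.

    # Toy illustration of rewarding grounded statements during RL
    # fine-tuning. The support check and the weights are invented for
    # this sketch; Google has not published these details.
    def supported_by_sources(statement: str, sources: list[str]) -> bool:
        """Placeholder check: does any retrieved input back up the claim?"""
        return any(statement.lower() in source.lower() for source in sources)

    def factuality_reward(statements: list[str], sources: list[str]) -> float:
        """Score a response: supported claims earn reward, unsupported ones cost more."""
        if not statements:
            return 0.0
        supported = sum(supported_by_sources(s, sources) for s in statements)
        unsupported = len(statements) - supported
        # Hypothetical weighting: penalizing unsupported claims more heavily
        # than supported ones are rewarded nudges the policy away from guessing.
        return (1.0 * supported - 2.0 * unsupported) / len(statements)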

Is Google wrong to be optimistic? Hallucinations may yet prove to be a solvable problem, after all. But it seems increasingly clear from the research that hallucinations from LLMs are not a solvable problem right now.

That hasn’t stopped companies like Google and OpenAI from sprinting ahead into the era of AI Search — and that’s likely to be an error-filled era, unless we’re the ones hallucinating.


