Google’s AI search overviews powered by the company’s Gemini large-language model (LLM) are alarmingly inaccurate, according to a new report.

The report, conducted by AI startup Oumi and commissioned by the New York Times, found that 91 percent of the AI-generated answers were accurate, leaving roughly 9 percent incorrect.

However, given that Google processes more than five trillion searches per year, the inaccuracies add up to tens of millions of wrong answers, with hundreds of thousands generated every minute.
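The per-minute figure follows directly from the article's numbers. A back-of-the-envelope check, assuming the stated five trillion searches per year and a 9 percent error rate (100 minus 91):

```python
# Rough sanity check of the scale claim; all inputs are the
# article's figures, not independently verified data.
searches_per_year = 5e12          # "more than five trillion searches per year"
error_rate = 1 - 0.91             # 91 percent accurate -> ~9 percent wrong

minutes_per_year = 365 * 24 * 60  # 525,600 minutes
wrong_per_minute = searches_per_year * error_rate / minutes_per_year
print(f"{wrong_per_minute:,.0f} wrong answers per minute")
```

This lands in the mid-800,000s per minute, consistent with the "hundreds of thousands every minute" characterization, if one assumes every search triggers an AI Overview.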

As Futurism noted, that much incorrect information at one time could be considered a misinformation crisis.

‘Serious holes’

Google, however, contested the findings, with the company’s spokesperson Ned Adriance telling Newsweek, “This study has serious holes.”

He pointed out that the New York Times’ study used one AI to grade another and relied on “an old benchmark that is known for being full of errors.”

Additionally, he said the method “doesn’t reflect what people are actually searching on Google.”

The Method

The researchers used a system called SimpleQA, a benchmark created by OpenAI that evaluates how well an LLM can answer short, fact-seeking questions.

According to OpenAI, SimpleQA is accurate, but its scope is limited—it can only measure short, fact-seeking questions with a single, verifiably correct answer.

“Whether the ability to provide factual short answers correlates with the ability to write lengthy responses filled with numerous facts remains an open research question,” the article notes.
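The benchmark's mechanics are simple: each item pairs a short factual question with a single verified reference answer, and a model's response is graded as correct, incorrect, or not attempted. A minimal sketch of that grading scheme (the real SimpleQA uses an LLM grader rather than string matching, so this comparison is only a stand-in):

```python
# Sketch of SimpleQA-style grading: one short answer, one reference,
# three possible verdicts. The real benchmark delegates the comparison
# to an LLM grader; plain string matching here is an illustrative stand-in.
def grade(predicted: str, reference: str) -> str:
    pred = predicted.strip().lower()
    ref = reference.strip().lower()
    if not pred:
        return "not_attempted"
    return "correct" if ref in pred else "incorrect"

# Example items in the spirit of the Bob Marley museum question below.
print(grade("1986", "1986"))  # correct
print(grade("1987", "1986"))  # incorrect
print(grade("", "1986"))      # not_attempted
```

The single-reference-answer design is exactly why the benchmark is limited to short fact-seeking questions: longer answers have no one string to check against.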

The Problem

However, Oumi’s evaluation showed that even questions with a single, verifiable answer sometimes trip up Google’s AI Overviews; the report cited several examples in which the answer was plainly incorrect.

When the AI got things wrong, the incorrect answer could be traced to a variety of issues.

Sometimes, the AI cited a website that couldn’t back up the information. Other times, the overview cited a website that contained the correct information but misstated it anyway.

In some cases, the overview got the answer correct, but then proceeded to provide additional context that was wrong.

Finally, the report said, the AI was vulnerable to manipulation: in some cases, a single blog post was enough to convince the AI that a person was an expert in an unrelated field.

Incorrect ‘Ground Truths’

Google, however, said SimpleQA has issues, citing a study conducted by several Google DeepMind researchers.

The researchers found that SimpleQA had several incorrect “ground truths”—the reference answers the benchmark treats as human-verified or evidence-based fact.

Google also noted that Oumi used an AI model as an evaluator for Gemini—in other words, assessing the accuracy of an imperfect AI model with another imperfect AI model.

Google’s Challenges

Finally, Google drew attention to two examples cited by the New York Times.

In the first, Gemini claimed Bob Marley’s house was converted into a museum in 1987, even though the right answer is 1986.

According to Google, the Wikipedia article Gemini drew from listed two different dates, 1986 and 1987, and the company provided a screenshot as evidence; the Wikipedia article now consistently says 1986.

Secondly, Google contested the New York Times’ claim that Gemini misplaced the Neuse River in North Carolina when it said the river ran “west” of the city of Goldsboro.

The Neuse River runs primarily south of Goldsboro, but it does run southwest of the city, which Google said made the answer “plausible.”

Newsweek has reached out to the New York Times for comment.
