Opinion: The Industry Measuring Accuracy Has Revealed Its Own Structural Weakness

The most consequential finding in the BBC–EBU study is not the headline figure that 45 per cent of AI-generated news answers contained significant flaws. It is the consistency of the failure. Four leading assistants, tested across fourteen languages and eighteen markets, misrepresented source material with a uniformity that suggests not malfunction but methodology. The tools now acting as de facto news intermediaries are still unable to perform the most basic editorial tasks: attributing information, preserving chronology, maintaining context.

The implications are unusually stark. These systems are no longer curiosities or niche utilities; they are becoming the front door to the news. According to the Reuters Institute, a rising share of younger audiences already treat AI assistants as a substitute for search, and by extension as a filter for journalism itself. The study shows just how ill-suited they remain to that role.

The deficiencies are not subtle. Nearly a third of the responses exhibited serious sourcing failures: missing citations, invented attributions, or the laundering of opinion as reported fact. One in five answers contained material inaccuracies, including details that never appeared in the underlying journalism. Gemini performed worst by a considerable margin, with significant issues in three-quarters of its tested outputs. The common thread was not ideological bias but editorial thinness: a reliance on probabilistic inference over textual fidelity.

Why this matters is also clear. Audiences assume that these tools are accurate. A third of UK adults say they trust AI-generated summaries; among under-35s, the figure rises to nearly half. That confidence sits atop a system that merges interpretations, elides caveats, and appends the names of reputable outlets to sentences those outlets did not publish. When errors inevitably surface, the reputational cost accrues not only to the technology firms but to the news organisations whose authority has been borrowed and misapplied.

The risk extends beyond factual slippage. The assistants’ outputs exhibit a deeper problem of epistemic opacity. Traditional news corrections, however belated, are traceable: they amend identifiable claims. AI distortions are harder to detect and harder still to attribute. They circulate in multiple versions across multiple platforms, each iteration slightly altered yet carrying the imprimatur of journalistic credibility. By the time discrepancies are noticed, the underlying misrepresentation has already hardened into perceived fact.

The EBU’s call for more assertive regulatory enforcement will be welcomed by public broadcasters, but it addresses only part of the challenge. The structural issue is editorial displacement. These systems increasingly mediate the audience’s relationship to journalism while bypassing the professional disciplines that give journalism its value: verification, sourcing, proportionality. They simplify in ways that distort and summarise in ways that mislead.

The study reads as a caution rather than an indictment. It shows that AI assistants can, in principle, improve; BBC-led testing earlier this year identified progress, albeit uneven. But the broader lesson is less reassuring. Trust in news, once eroded, is difficult to recover. If AI intermediaries continue to misstate, misattribute and misframe the work of newsrooms at this scale, the question will not be whether audiences turn away from the technology. It will be whether they turn away from the news itself.
