KPMG, AI Hallucinations, and Citation Checks

In June 2026, KPMG quietly pulled a report from its website.

The report, Redefining Excellence in the Age of Agentic AI, was published in October 2025. It explored how enterprises can leverage AI to drive excellence, and it cited case studies from UBS, the UK's National Health Service, Swiss Federal Railways, and Transport for London.

The problem: those case studies were disputed. UBS said the report's description of it was inaccurate or misleading. The NHS, Swiss Federal Railways, and Transport for London also denied the examples attributed to them, according to the Financial Times. TechRadar reported GPTZero's finding that, of 45 citations, only 5 correctly pointed to real sources. The rest were nonexistent, distorted, or fabricated outright.

KPMG's statement said: "We require all staff to follow our responsible AI use guidelines, including human review to verify content and independent sources."

Then the report came down.

Who is KPMG?

Not a small firm.

KPMG is one of the Big Four accounting firms, with operations in more than 140 countries and hundreds of thousands of employees worldwide. Its core business includes audit and assurance: verifying that financial figures are accurate and that reports can be trusted.

And yet this firm published an AI report citing enterprise case studies that the named organizations later disputed, pushed it through global distribution channels, and kept it live for months before anyone caught it.

If an organization built around verification can miss AI-generated citation failures, what makes you confident the AI-generated report you finished yesterday has accurate citations?

Why AI doesn't say "I don't know"

The KPMG incident is not only about negligence. Negligence is the symptom. The underlying mechanism is more interesting.

When you ask an AI model "how is UBS using AI?", the model's job is to produce a plausible-sounding answer. A true answer and a plausible answer can look very similar on the surface, but they are not the same objective.

Language models are very good at the happy path: fluent, structured, detail-rich responses. What they are less reliable at is stopping at the exact moment uncertainty begins.

That is partly a product of how models are trained and tuned. Human feedback can reward responses that feel helpful, complete, and confident. Cautious disclaimers may be rewarded in some contexts; direct refusals may be rewarded in others; but in ordinary research-style writing, a polished answer often feels more useful than a short admission of uncertainty.

This can produce a familiar tendency: the model keeps giving you the shape of an answer even when the evidence is thin.

Ask for a source, and it may assemble an author name, report title, institution, date, and URL-shaped reference. Each element looks plausible. Together, they form a citation that feels complete.

The model is not lying in the human sense. It does not know it is producing fiction. It is continuing a pattern: consulting reports cite enterprise case studies; enterprise AI case studies mention banks, hospitals, rail systems, and transport agencies; references have titles, dates, and links. The result can look flawless until someone checks the source.

What AI FactScan can catch

AI FactScan is a browser extension that works while you are still inside the AI interface.

Select the AI's response, trigger a scan, and it does three things:

Extracts cited links, including links that look like citations but lead nowhere useful.
Grades each citation A through F based on source-quality signals, resolver checks, and known source patterns.
Flags AI self-citation, invalid resolver links, and source patterns that are easy to miss by eye.

In a case like the KPMG report, AI FactScan could have surfaced weak, invalid, or suspicious citation signals earlier. That is the moment to stop, open the sources, and investigate before a draft becomes a report.
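
To make the idea of extracting links and flagging suspicious ones concrete, here is a minimal Python sketch. It is not AI FactScan's implementation: the function names, the flag labels, the DOI pattern, and the list of AI chat hosts are all assumptions chosen for illustration. It only runs cheap offline checks and never contacts the network, so it cannot tell you whether a link actually resolves.

```python
import re
from urllib.parse import urlparse

# Hypothetical patterns and host list for illustration only;
# these are not AI FactScan's actual rules.
URL_RE = re.compile(r"https?://[^\s<>()\[\]]+")
DOI_RE = re.compile(r"^10\.\d{4,9}/\S+$")
AI_HOSTS = {"chatgpt.com", "chat.openai.com", "gemini.google.com", "claude.ai"}


def extract_links(text: str) -> list[str]:
    """Return unique URLs in order of first appearance."""
    seen: dict[str, None] = {}
    for match in URL_RE.findall(text):
        # Drop sentence punctuation that the regex swallowed at the end.
        seen.setdefault(match.rstrip(".,;:'\""), None)
    return list(seen)


def flag_link(url: str) -> list[str]:
    """Return offline warning flags for one URL (empty list means no flags)."""
    flags = []
    parsed = urlparse(url)
    host = parsed.hostname or ""
    if "." not in host:
        flags.append("no-valid-host")
    if host in {"doi.org", "dx.doi.org"}:
        # A DOI resolver link should carry a DOI-shaped path.
        if not DOI_RE.match(parsed.path.lstrip("/")):
            flags.append("malformed-doi")
    if host in AI_HOSTS:
        # The "source" is another AI conversation, not an original document.
        flags.append("ai-self-citation")
    return flags


if __name__ == "__main__":
    answer = (
        "See https://doi.org/10.1000/xyz123, and https://doi.org/not-a-doi. "
        "Also https://chatgpt.com/share/abc and https://example.com/report."
    )
    for link in extract_links(answer):
        print(link, flag_link(link))
```

An empty flag list here means only that none of these simple checks fired. It says nothing about whether the page exists or supports the claim.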

What AI FactScan can't do

AI FactScan can tell you whether a link exists, whether a resolver identifier works, and whether a source has stronger or weaker credibility signals.

But there is one layer no lightweight source-quality scan can fully settle: whether the source actually says what the AI claims it says.

An AI can cite a real, existing paper while attributing to it a conclusion it never made. It can link to a legitimate report while overstating what the report found. That layer still requires a human to open the original source and read it.

How to pressure-test a suspicious citation

Before you rely on a cited source, ask the AI for details that are harder to fake consistently:

Ask for specific details. What was the sample size? Which methodology? Which page or section?
Force a contradiction check. Ask for the main finding, then ask for limitations or caveats. Fabricated support often becomes unstable under follow-up.
Demand a short supporting quote. Ask for the exact sentence from the source that supports the claim, then search for that sentence or open the source directly.

These methods do not make AI truthful. They make unsupported claims harder to sustain.
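
The supporting-quote step can be partly mechanized once you have the source text in hand. The sketch below is one possible approach, assuming you have already copied the source's text into a string; the function names are hypothetical. It normalizes case, whitespace, and curly quotes, then checks whether the quote the AI supplied appears verbatim.

```python
import re
import unicodedata

# Map curly quotes to straight ones so typography does not block a match.
_QUOTE_MAP = str.maketrans({
    "\u2018": "'", "\u2019": "'",
    "\u201c": '"', "\u201d": '"',
})


def normalize(text: str) -> str:
    """Lowercase, unify quote characters, and collapse runs of whitespace."""
    text = unicodedata.normalize("NFKC", text).translate(_QUOTE_MAP)
    return re.sub(r"\s+", " ", text).strip().lower()


def quote_appears_in(quote: str, source_text: str) -> bool:
    """True if the normalized quote occurs verbatim in the normalized source."""
    needle = normalize(quote)
    if not needle:
        # An empty quote would trivially "match" anything; treat it as no evidence.
        return False
    return needle in normalize(source_text)


if __name__ == "__main__":
    source = (
        "The pilot covered 120 sites.\n"
        "Results were  mixed, and the authors caution against generalising."
    )
    print(quote_appears_in("results were mixed", source))
    print(quote_appears_in("Results were overwhelmingly positive", source))
```

A match shows only that the sentence exists in the source. Whether it supports the claim in context still takes a human read, and a miss may simply mean the AI paraphrased rather than quoted.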

The lesson

KPMG's problem was not that it lacked a policy saying humans should verify AI output. It had one. The problem was that a plausible source trail can pass through a workflow without anyone slowing down at the citation layer.

That is where AI FactScan fits: not as a final judge, but as an early warning system for source quality.