Wangari Digest

Why single-model workflows are a liability

And how to use AI constructively instead: by letting it work against itself

When ChatGPT or Claude confidently tells you something wrong, it hasn’t made a mistake. The model is doing exactly what its neurons are optimised to do: produce an answer you will accept.

In the financial services sector, we are deploying these models to summarise regulatory documents, extract ESG data, and draft client reports. But when being wrong has actual consequences—when a fabricated citation or a hallucinated regulatory requirement makes it into a final document—the cost is immense. We are paying a hidden “hallucination tax” in the form of manual verification, eroded trust, and potential liability.

The Over-Compliance Problem

Recent research from Tsinghua University found that fewer than 0.01% of neurons in a language model are responsible for hallucination. What these neurons encode isn’t wrong information. It’s the drive to give you an answer, any answer, rather than say “I don’t know.”

The researchers tested these neurons against four failure types: hallucination, sycophancy (agreeing with you when you’re wrong), false premise acceptance, and jailbreak vulnerability. The same neurons drove all four. Hallucination and sycophancy are the same behaviour at the neuron level. It is simply over-compliance. And safety training doesn’t fix it. The models are fundamentally built to please us, even if it means making things up.

The Post-Hoc Verification Shift

Instead of trying to build a single, perfect model that never hallucinates, the new paradigm is to assume the model will hallucinate, and build systems to catch it after the fact. We are moving from “trust the AI” to “put the AI on trial.”
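To make "put the AI on trial" concrete, here is a minimal sketch of what post-hoc verification looks like in code. The function names (generate_answer, extract_claims, check_claim) are hypothetical stand-ins for whatever model and retrieval layer you actually use; the point is the shape of the workflow, not any particular vendor's API.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Verdict:
    claim: str
    supported: bool
    evidence: str


def put_on_trial(
    question: str,
    generate_answer: Callable[[str], str],       # your LLM call (stand-in)
    extract_claims: Callable[[str], list[str]],  # splits a draft into checkable claims
    check_claim: Callable[[str], Verdict],       # checks one claim against live sources
) -> tuple[str, list[Verdict]]:
    """Assume the draft may contain hallucinations; verify it after the fact."""
    draft = generate_answer(question)
    verdicts = [check_claim(c) for c in extract_claims(draft)]
    unsupported = [v for v in verdicts if not v.supported]
    if unsupported:
        # Don't silently trust the draft: surface every claim that failed verification.
        flags = "\n".join(f"UNVERIFIED: {v.claim}" for v in unsupported)
        draft = f"{draft}\n\n[Verification flags]\n{flags}"
    return draft, verdicts
```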

I was recently inspired by one of our readers, Maarten Rischen, who built a product called Triall AI. Maarten kept catching models fabricating sources, so he automated a cross-referencing process. Triall puts models on trial through a nine-stage process. Three different AI models (like Claude Opus, Grok, and GPT) answer your question independently. Because they have different architectures, they have different failure patterns. When they disagree, that is valuable information.

The models then blind peer-review each other. The best answer gets attacked by an adversarial critic. Finally, specific claims are checked against live web sources. Not AI checking AI, but real sources checking AI.
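A rough sketch of the multi-model idea is below. This is not Triall AI's actual code, and the nine stages are compressed into three; the model callables, the critic, and the candidate-selection step are placeholders you would wire to real APIs and a proper ranking step.

```python
from difflib import SequenceMatcher
from typing import Callable

Model = Callable[[str], str]  # stand-in for a call to Claude, Grok, GPT, etc.


def cross_examine(question: str, models: dict[str, Model], critic: Model) -> dict:
    """Ask independently built models the same question; treat disagreement as signal."""
    answers = {name: ask(question) for name, ask in models.items()}

    # Crude agreement score: average pairwise textual similarity between the answers.
    names = list(answers)
    pairs = [(a, b) for i, a in enumerate(names) for b in names[i + 1:]]
    sims = [SequenceMatcher(None, answers[a], answers[b]).ratio() for a, b in pairs]
    agreement = sum(sims) / len(sims) if sims else 1.0

    # Blind peer review: each model critiques the anonymised set of answers.
    review_prompt = (
        f"Review these anonymised answers to the question '{question}' "
        "and point out factual disagreements:\n\n" + "\n---\n".join(answers.values())
    )
    reviews = {name: ask(review_prompt) for name, ask in models.items()}

    # Adversarial pass: a separate critic attacks the leading answer for fabrications.
    candidate = answers[names[0]]  # placeholder choice; in practice, pick using the reviews
    critique = critic(f"Find factual errors or fabricated citations in:\n\n{candidate}")

    return {"answers": answers, "agreement": agreement,
            "reviews": reviews, "critique": critique}
```

A low agreement score or a pointed critique does not tell you which model is right, only that the output has not earned your trust yet; that is when the claims go to live sources.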

The Bottom Line

We need to stop treating AI like a junior associate whose work we must painstakingly review, and start treating it like a system of components that can check each other. If the stakes are high, one model is a liability. Three models, arguing it out, is a strategy.
