
This is a re-posting of something I shared on LinkedIn, with minor updates. I’m posting it here for my own records and to surface it for readers who don’t use LinkedIn very much.
Am I seeing things?
“Hallucination” is a pejorative term given to #genAI results we feel are wrong. But mathematically, based on the inputs and the algorithms applied, no result can be “wrong.” Results may get low ratings from us humans because we find them useless or weird or funny or even scary, but that’s not a mental health assessment of the #LLM (which has no mind); it’s just a fit-for-purpose assessment of the algorithmic results.
Indeed, calling bad results “hallucinations” perpetuates the lie that #AI is thinking or performing some other human-like cognition. It’s not. (If anyone’s hallucinating, it’s us.)
So why are bad results increasing in the more advanced “reasoning” platforms? Because the companies jostling for dominance in this space are pushing their AI models beyond the breaking point. (Or, more accurately, they are overstating what the models can actually achieve, and we’re listening to them.) The AI companies have reached a mathematical plateau with their LLMs but are desperate for the Next Big Thing™.
To make that next big thing—the so-called reasoning models—they are layering on more algorithms. They’re hooking up the models in longer chains in an attempt to get the LLMs to do more “thinking” that can power other actions. And it works! Sometimes.
When your math ain’t mathin’
But this is a game of probabilities. With large and “clean” data sets and a short list of well-understood algorithmic processes, you can consistently get results that are pretty great for focused use cases. Some LLMs are rated quite highly, with good-result rates in the upper 90 percent range.
But if you chain these good AI tasks together, you increase the odds of a bad end result. How? Math.
Let’s say there are 10 AI processes in a “reasoning” model, chained together to perform a task. If each process has a 90% “good” result probability by itself, then when they are chained together the odds of a good result at the end of the 10-step chain are only about 35% (0.9 multiplied by itself 10 times). That feels wrong, and it’s a simplistic representation of an AI chain, but it’s mathematically right. (I had to check.)
Bottom line: the longer the “reasoning” chains get, the more errors you’ll see, because the underlying model at each step is not—and cannot be—100% good. All we can do for now is improve the quality of each link in the chain, thereby improving the odds of good outcomes in longer chains. Dial down the marketing hype and there’s no problem here.
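For the curious, here’s a minimal Python sketch of that arithmetic. It assumes each step in the chain succeeds independently with a fixed probability, which is a simplification of any real AI pipeline, but it shows how per-step reliability compounds:

```python
# Minimal sketch: model each step in a "reasoning" chain as an independent
# event with a fixed probability of a good result. Real pipelines are
# messier, but the compounding works the same way.

def chain_success(per_step_rate: float, steps: int) -> float:
    """Probability that every step in the chain produces a good result."""
    return per_step_rate ** steps

if __name__ == "__main__":
    for rate in (0.90, 0.95, 0.99):
        overall = chain_success(rate, steps=10)
        print(f"{rate:.0%} per step over 10 steps -> {overall:.0%} overall")
    # 90% per step over 10 steps -> 35% overall
    # 95% per step over 10 steps -> 60% overall
    # 99% per step over 10 steps -> 90% overall
```

Even at 99% per step, a 10-step chain only lands around 90% overall, which is why improving each individual link matters so much.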
I got a fever! And the only prescription… is more cowbell!

But we’re on the hype curve, baby, and marketing is all we’ve got! This hallucination problem must be fixed because they can’t sell us “agents” until we can trust those agents to be autonomous. So that’s what the AI companies will do next—add more processing, more layers, more algorithms, more checks and adjustments and filters. The reasoning models will improve with time. But they will always have worse hit rates than baseline LLMs.
In the meantime? We’ll get more marketing lies and booster hype like “We don’t know why this is happening—it has a mind of its own!”
No. It doesn’t.