Introduction

The Machine That Looks Done.

Bad AI output is easy to dismiss when it is ridiculous. The real danger is the answer that looks finished, sounds competent, survives a quick review, and fails only after money moves, customers click, or a production system depends on it. This book is about that kind of failure, because it is the one modern software teams are institutionalizing by mistake.

The Shipment

On a Thursday afternoon in March 2024, a senior engineer at a mid-sized fintech company asked Claude, Anthropic's large language model, to refactor a payment reconciliation module. The original code was ugly but correct — three hundred lines of nested conditionals that matched incoming bank transactions against internal ledger entries, handling edge cases accumulated over four years of production use. The engineer pasted the code into the conversation, described what it did, and asked for a cleaner version.

Claude returned a beautiful refactoring. The code was well-structured, properly abstracted, and half the length of the original. It had type annotations, clear variable names, and a helper function for each reconciliation rule. The engineer reviewed it, confirmed it compiled, ran the existing unit tests — which passed — and merged it into the main branch.

On Friday morning the code reached production. By Monday, the company's finance team had discovered $47,000 in unmatched transactions. The refactored code handled the common case perfectly. It handled most of the edge cases. But it had quietly dropped three reconciliation rules that the original code applied to international wire transfers involving currency conversion — rules that existed because a specific bank's API returned amounts in a format that violated its own documentation. The original code had handled this with an ugly, deeply nested conditional that looked like a mistake but was in fact the result of a week-long debugging session two years prior. Claude's refactoring had recognized the pattern as non-standard, assumed it was legacy cruft, and replaced it with a clean implementation that followed the documented API contract.

The documented contract was wrong.

The output looked complete. It compiled. The tests passed. The code was, by every surface metric, better than what it replaced. And it was broken in a way that would not be discovered until real money moved through the system.

This is the completion illusion.
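
The mechanism is easier to see in code. Here is a minimal sketch of the pattern, with every function name, field, and bank quirk invented for illustration rather than taken from the company's actual module:

```python
from decimal import Decimal

def parse_settlement_amount(raw: dict) -> Decimal:
    """Parse the settled amount from one bank's API response.

    Hypothetical sketch of the pattern described above; the fields
    and the bank's quirk are invented for illustration.
    """
    # Documented contract: "amount" is a decimal string in the
    # settlement currency, e.g. {"amount": "1042.50"}.
    #
    # Reality, discovered after a week of debugging: international
    # wires involving currency conversion arrive in minor units,
    # with the exponent in a separate, undocumented field. The
    # branch below looks like a mistake. It is the fix.
    if raw.get("wire_type") == "INTL" and "fx_rate" in raw:
        exponent = int(raw.get("amount_exponent", 2))
        return Decimal(raw["amount"]) / (Decimal(10) ** exponent)

    # The common case, exactly as documented.
    return Decimal(raw["amount"])
```

A refactor that trusts the documentation deletes the special branch as cruft. The code gets shorter and cleaner, every test written from the documented contract still passes, and the only trace of the lost knowledge is a diff that reads as an improvement.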

The Completion Illusion Defined

The core claim of this book is simple: large language models are unusually good at producing the signals people mistake for finished work. They give you structure, fluency, confidence, and convention on demand. In software, those signals are often enough to get code approved, tests trusted, and plans greenlit. That is the illusion. Looking finished is not the same as being done.

A human engineer who produces a clean refactor usually had to understand the system well enough to preserve the ugly parts that matter. A model can reproduce the shape of that competence without carrying the understanding underneath it. That is why the failure is so expensive. The output is not nonsense. It is plausible work with one missing edge case, one wrong assumption, one omitted constraint, or one fake proof of safety.

This is bigger than hallucination. Hallucination is the loud version of the problem: fake cases, fake APIs, fake facts. The completion illusion is the quiet version. It is the test suite that covers everything except what matters. The documentation that is correct everywhere except the dangerous corner. The refactor that makes the code prettier by deleting the ugly branch that kept production alive.
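
Continuing the hypothetical parser sketched above, here is what that quiet version looks like as a test suite: one that would satisfy a reviewer and a coverage report while missing the only case that costs money.

```python
import unittest
from decimal import Decimal

# Assumes the hypothetical parse_settlement_amount from the earlier
# sketch is in scope.

class TestParseSettlementAmount(unittest.TestCase):
    """Looks thorough. Covers everything except what matters."""

    def test_domestic_wire(self):
        self.assertEqual(parse_settlement_amount({"amount": "1042.50"}),
                         Decimal("1042.50"))

    def test_zero_amount(self):
        self.assertEqual(parse_settlement_amount({"amount": "0.00"}),
                         Decimal("0.00"))

    def test_missing_amount_raises(self):
        with self.assertRaises(KeyError):
            parse_settlement_amount({})

    # Absent: an international wire with currency conversion, where
    # this bank returns minor units. That case is not in the
    # documentation, so no test derived from the documentation will
    # ever exercise it, and the suite passes whether the ugly branch
    # exists or not.
```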

Once you see the pattern, a lot of modern AI behavior stops looking mysterious. The tools are not failing randomly. They are optimized to satisfy the same surface cues humans use to say, "Looks good to me." In a profession built on review, proxy metrics, and trust, that is enough to do damage.

Why This Matters Now

So why write this now? Because the industry is moving from experimentation to dependence before it has rebuilt its quality habits. The adoption story is already familiar: companies buy copilots, managers report velocity gains, teams quietly rewrite their workflows around generated output, and review standards slip a little because the machine is fast, polished, and right often enough to be tempting. That is exactly how a bad norm becomes an institutional one.

The timing matters for another reason. AI output no longer stops at the first answer. It is being chained into tests, tickets, docs, agent handoffs, and fine-tuning pipelines. A weak answer can now masquerade as ground truth for the next system in line. By the time a human sees the result, the original error may have been laundered through layers of professional-looking work.
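
To see how the laundering works, imagine the buggy refactor from the earlier sketch being handed back to a model with the instruction "write tests for this." A plausible, entirely hypothetical result:

```python
from decimal import Decimal

# A characterization test generated from the refactored (buggy)
# parser sketched earlier, assumed to be in scope. Its expected value
# was computed by running the buggy code itself, so the bug is now
# encoded as the suite's definition of correct.
def test_intl_wire_with_conversion():
    raw = {
        "amount": "104250",        # minor units: really 1,042.50
        "wire_type": "INTL",
        "fx_rate": "0.92",
        "amount_exponent": "2",
    }
    # The right answer is Decimal("1042.50"). The refactor, following
    # the documented contract, returns Decimal("104250"), and that is
    # what this generated test now asserts.
    assert parse_settlement_amount(raw) == Decimal("104250")
```

From this point on, any fix that restores the original behavior breaks the build. The error has stopped being a defect and started being ground truth.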

Meanwhile, the incentives push in the wrong direction. Model providers get rewarded for speed, scale, and benchmark wins. Buyers get rewarded for visible output, not invisible correction cost. Engineers get rewarded for shipping. Nobody in that chain is naturally rewarded for saying: stop, this looks done but is not done.

That is the reason for this book. Not to relitigate whether AI is useful. It is useful. The problem is that usefulness is being mistaken for reliability at exactly the moment teams are building process around that mistake.

The Stakes

The completion illusion is not an abstract research problem. It is a concrete, present, and escalating risk to software quality, system reliability, and human safety. As of early 2026, AI-assisted development tools are embedded in the daily workflow of millions of software engineers. GitHub Copilot alone has over 1.8 million paying subscribers across more than 50,000 organizations. Claude, ChatGPT, and their competitors serve millions more. The code these tools generate is shipping to production systems that manage financial transactions, medical records, transportation infrastructure, and critical communications. Every one of those systems is subject to the completion illusion.

The financial cost is already measurable, if poorly measured. Developers report spending 30-50% of their time debugging or correcting AI-generated code — time that does not appear in the productivity statistics that justified adopting the tools in the first place. We are generating technical debt faster than at any point in the history of software engineering, and we are doing so while congratulating ourselves on our productivity.

The safety implications are more urgent still. AI-assisted development has already penetrated domains where software failures have physical consequences: medical device firmware, autonomous vehicle control systems, industrial automation, and aviation software. In these domains, the completion illusion is not an inconvenience or a cost center. It is a threat to human life.

The historical precedent is not encouraging. The Therac-25 radiation therapy machine delivered massive radiation overdoses to six patients between 1985 and 1987, at least three of them fatal, due to software defects that went undetected precisely because the system appeared to be working correctly. The Ariane 5 rocket self-destructed 37 seconds after launch in 1996 due to a software conversion error inherited from the Ariane 4, which had been assumed to be correct without verification. In both cases, the failure mechanism was the same: output that looked right, passed surface-level checks, and was catastrophically wrong. The completion illusion is this mechanism, automated and operating at scale.

The institutional consequences may be the most insidious. When organizations adopt AI tools and restructure their processes around the assumption that AI output is reliable, they dismantle the human verification infrastructure that previously caught defects. Senior engineers who once reviewed code in detail are reassigned to higher-level tasks. Code review becomes a formality. Institutional knowledge about edge cases, system quirks, and non-obvious requirements — knowledge that existed in the heads of experienced engineers who performed careful reviews — is lost. The organization becomes simultaneously more productive and more fragile: capable of producing more code, less capable of ensuring that any of it is correct.

This is not a hypothetical future. It is the present state of AI adoption in software engineering. The question is not whether AI should be used in software engineering. That question has been answered by market adoption. The question is whether the engineering profession will develop the verification infrastructure, the institutional practices, and the professional standards necessary to use AI safely — or whether we will continue to build on the completion illusion until the failures become impossible to ignore.

This book is an attempt to make the failures impossible to ignore now, before the cost of ignoring them becomes catastrophic.
