One of the most influential and least technical findings in modern AI is this: if you get a model to work through a problem step by step, it answers multi-step questions noticeably more accurately. That's the core of a 2022 Google paper, *Chain-of-Thought Prompting Elicits Reasoning in Large Language Models*.
(This is our explanation of the paper; the source is linked.)
What they tried
The researchers gave models multi-step problems (maths word problems, commonsense questions, symbolic puzzles) two ways. First the normal way: question in, answer out. Then a second way: the prompt included a few worked examples that wrote out their intermediate reasoning, so the model did the same before giving its final answer, like showing your work on a maths test.
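To make the contrast concrete, here is a minimal sketch of the two prompt styles as plain strings. It assumes a few-shot format with a single worked example; the example question, its wording, and the function names are our own illustrations, not text or code from the paper.

```python
# Hypothetical worked example: illustrative wording, not quoted from the paper.
EXEMPLAR_QUESTION = (
    "A shop has 5 apples and buys 2 boxes of 3 apples each. "
    "How many apples does it have now?"
)
EXEMPLAR_REASONING = (
    "The shop starts with 5 apples. 2 boxes of 3 apples is 6 apples. 5 + 6 = 11."
)
EXEMPLAR_ANSWER = "11"


def standard_prompt(question: str) -> str:
    """Few-shot prompt whose worked example shows only the final answer."""
    return (
        f"Q: {EXEMPLAR_QUESTION}\n"
        f"A: The answer is {EXEMPLAR_ANSWER}.\n\n"
        f"Q: {question}\n"
        "A:"
    )


def chain_of_thought_prompt(question: str) -> str:
    """Same worked example, but its answer spells out the steps first."""
    return (
        f"Q: {EXEMPLAR_QUESTION}\n"
        f"A: {EXEMPLAR_REASONING} The answer is {EXEMPLAR_ANSWER}.\n\n"
        f"Q: {question}\n"
        "A:"
    )


if __name__ == "__main__":
    new_question = "A train has 4 cars with 10 seats each. How many seats are there?"
    print(standard_prompt(new_question))
    print("-" * 40)
    print(chain_of_thought_prompt(new_question))
```

The only difference between the two prompts is the reasoning sentence inside the worked example. Nothing about the model changes, which is why the technique needs no retraining.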
What they found
The difference was large. On hard reasoning tasks, walking through the steps improved accuracy substantially, and the effect depended on scale: the largest models gained the most, while small models gained little or nothing. Crucially, this needed no retraining. The capability was already inside the model; the right prompt unlocked it.
Why it matters
- It's free capability. A better prompt, not a bigger model, often gets a much better answer.
- It made reasoning inspectable. When a model shows its steps, a human can check them, which matters in regulated or high-stakes work.
- It seeded "reasoning" models. Today's models that "think" before answering descend directly from this idea, now built in rather than prompted.
The headline is almost philosophical: the model could already reason — it just needed to be asked to do it out loud.
The caveat
A visible chain of reasoning looks convincing, but a tidy explanation isn't proof the answer is right: models can reason their way to confident, wrong conclusions. That gap between looking right and being right is one reason AI agents still fail in production without proper checks.
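One simple kind of check is to ignore how persuasive the reasoning sounds and compare only the stated final answer against a value computed independently. The sketch below assumes the model ends its output with "The answer is N" for an integer N; that format, the function names, and the sample output are our own assumptions for illustration.

```python
import re
from typing import Optional


def extract_final_answer(model_output: str) -> Optional[int]:
    """Return the integer after the last 'The answer is', or None if absent."""
    matches = re.findall(r"The answer is\s*(-?\d+)", model_output)
    return int(matches[-1]) if matches else None


def passes_check(model_output: str, trusted_answer: int) -> bool:
    """True only when the stated answer equals an independently computed value."""
    return extract_final_answer(model_output) == trusted_answer


if __name__ == "__main__":
    # Hypothetical model output: fluent steps, but the addition is wrong.
    fluent_but_wrong = (
        "The shop starts with 5 apples. 2 boxes of 3 apples is 6 apples. "
        "5 + 6 = 12. The answer is 12."
    )
    trusted = 5 + 2 * 3  # computed outside the model
    print(passes_check(fluent_but_wrong, trusted))  # prints False
```

This only works when a trusted answer can be computed some other way. For open-ended tasks the check has to take a different form, such as human review.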
Sources
- Wei et al. (2022) — *Chain-of-Thought Prompting Elicits Reasoning in Large Language Models*