Chain-of-thought: the paper that taught models to “show their work”

One of the most influential and least technical findings in modern AI is this: if you prompt a model to reason step by step, it solves multi-step problems far more reliably. That's the core of a 2022 Google paper, *Chain-of-Thought Prompting Elicits Reasoning in Large Language Models*.

(This is our explanation of the paper; the source is linked.)

What they tried

The researchers gave models multi-step problems (maths word problems, commonsense questions and symbolic puzzles) two ways. First the normal way: question in, answer out. Then a second way: the prompt included a few worked examples that spelled out the intermediate reasoning, so the model wrote out its own steps before giving the final answer, like showing your work on a maths test.
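To make the two prompt styles concrete, here is a minimal Python sketch. It is our own illustration rather than code from the paper: the `Exemplar` class and both builder functions are hypothetical names, and the sample questions echo the tennis-ball and cafeteria problems the paper uses to illustrate the idea. The only difference between the two prompts is whether the worked example includes its reasoning.

```python
from dataclasses import dataclass


@dataclass
class Exemplar:
    """One worked example placed in the prompt (hypothetical helper type)."""
    question: str
    reasoning: str  # intermediate steps in plain language
    answer: str


def build_standard_prompt(exemplars, question):
    """Standard few-shot prompt: each example shows only the final answer."""
    parts = [f"Q: {ex.question}\nA: The answer is {ex.answer}." for ex in exemplars]
    parts.append(f"Q: {question}\nA:")
    return "\n\n".join(parts)


def build_cot_prompt(exemplars, question):
    """Chain-of-thought prompt: each example shows its reasoning, then the answer."""
    parts = [
        f"Q: {ex.question}\nA: {ex.reasoning} The answer is {ex.answer}."
        for ex in exemplars
    ]
    parts.append(f"Q: {question}\nA:")
    return "\n\n".join(parts)


# Illustrative data only; a real prompt would typically carry several exemplars.
EXEMPLARS = [
    Exemplar(
        question=(
            "Roger has 5 tennis balls. He buys 2 more cans of tennis balls. "
            "Each can has 3 tennis balls. How many tennis balls does he have now?"
        ),
        reasoning=(
            "Roger started with 5 balls. 2 cans of 3 tennis balls each is "
            "6 tennis balls. 5 + 6 = 11."
        ),
        answer="11",
    )
]

NEW_QUESTION = (
    "The cafeteria had 23 apples. If they used 20 to make lunch and "
    "bought 6 more, how many apples do they have?"
)

if __name__ == "__main__":
    print(build_standard_prompt(EXEMPLARS, NEW_QUESTION))
    print("\n---\n")
    print(build_cot_prompt(EXEMPLARS, NEW_QUESTION))
```

Either string would then be sent to whatever model you are using. Nothing about the model changes between the two runs, only the text it is shown.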

What they found

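The difference was large. On hard reasoning tasks, walking through the steps improved accuracy substantially, but only at scale: small models gained little or nothing, while the largest models gained the most. With just eight worked examples in the prompt, Google's 540-billion-parameter PaLM model set a new state of the art on the GSM8K benchmark of maths word problems. Crucially, this needed no retraining. The capability was already inside the model; the right prompt unlocked it.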

Why it matters

It's free capability. A better prompt, not a bigger model, often gets a much better answer.
It made AI more auditable. When a model shows its steps, a human can check the reasoning, which is vital in regulated or high-stakes work.
It seeded "reasoning" models. Today's models that "think" before answering are direct descendants of this idea, now baked in rather than prompted.

The headline is almost philosophical: the model could already reason — it just needed to be asked to do it out loud.

The caveat

A visible chain of reasoning looks convincing, but a tidy explanation isn't proof the answer is right — models can reason their way to confident, wrong conclusions. That gap between looking right and being right is exactly why AI agents still fail in production without proper checks.
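One simple form such a check can take is sketched below in Python. It is an assumption-laden illustration, not a method from the paper: the function names are hypothetical, the chain is assumed to end with the phrase "The answer is N", and the expected value is assumed to come from a trusted source such as a test set or a calculator. The sample chain reads smoothly but contains an arithmetic slip.

```python
import re


def extract_final_answer(chain):
    """Pull the number after 'The answer is' out of a reasoning chain.

    Returns None when the chain does not follow that (assumed) format.
    """
    match = re.search(r"The answer is\s*(-?\d+(?:\.\d+)?)", chain)
    return float(match.group(1)) if match else None


def passes_check(chain, expected):
    """Compare the chain's final answer with an independently known value."""
    answer = extract_final_answer(chain)
    return answer is not None and abs(answer - expected) < 1e-9


# A tidy-looking chain with a wrong step: 3 + 6 is 9, not 8.
PLAUSIBLE_BUT_WRONG = (
    "The cafeteria had 23 apples. They used 20, so 23 - 20 = 3. "
    "They bought 6 more, so 3 + 6 = 8. The answer is 8."
)

if __name__ == "__main__":
    # The expected value is computed independently of the model's text.
    expected = 23 - 20 + 6
    print(passes_check(PLAUSIBLE_BUT_WRONG, expected))
```

The check ignores how persuasive the explanation sounds and looks only at whether the final answer matches a value obtained some other way. Real systems need broader checks than this, but the principle is the same.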

Sources

Wei et al. (2022) — *Chain-of-Thought Prompting Elicits Reasoning in Large Language Models*

Written by ivector

