Chinchilla and the scaling laws: why bigger models aren’t always better

For a few years the AI race looked like a simple contest: whoever trains the biggest model wins. In 2022, a DeepMind paper called *Training Compute-Optimal Large Language Models* — better known as the Chinchilla paper — showed that was the wrong race. Here it is in plain terms.

(This is our explanation of the paper, with the original linked so you can check the source.)

The question

Given a fixed budget of computing power, what's the best way to spend it: build a bigger model, or train a smaller one on more data? Until then, most labs had quietly assumed "bigger model" was the answer.

What they found

The researchers trained more than 400 models of different sizes on different amounts of data and found a clear pattern: most state-of-the-art models were too big and undertrained. For a given compute budget, you get a better model by making it smaller and feeding it far more data. Model size and training data should grow in equal proportion: double the parameters, double the training tokens, which works out to roughly 20 tokens per parameter.

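To test it, they trained "Chinchilla": a 70-billion-parameter model, 4× smaller than the era's giant (the 280-billion-parameter Gopher), using the same compute budget but about 1.4 trillion training tokens instead of roughly 300 billion. The smaller model won, beating the larger one across a wide range of benchmarks.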
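The trade-off can be sketched in a few lines of arithmetic. This is a rough illustration, not the paper's fitting procedure, and it rests on two assumptions that are not stated in the article: the widely used approximation that training costs about 6 FLOPs per parameter per token, and the commonly quoted rule of thumb of about 20 training tokens per parameter. The function names are ours.

```python
import math

# Assumption: training compute C is roughly 6 * N * D FLOPs,
# where N is the parameter count and D is the number of training tokens.
FLOPS_PER_PARAM_TOKEN = 6.0

# Assumption: a compute-optimal model sees about 20 tokens per parameter.
TOKENS_PER_PARAM = 20.0


def compute_optimal_split(compute_flops, tokens_per_param=TOKENS_PER_PARAM):
    """Split a FLOP budget into (parameters, training tokens).

    From C = 6 * N * D and D = r * N it follows that N = sqrt(C / (6 * r)).
    """
    params = math.sqrt(compute_flops / (FLOPS_PER_PARAM_TOKEN * tokens_per_param))
    tokens = tokens_per_param * params
    return params, tokens


def training_flops(params, tokens):
    """Approximate training cost in FLOPs for a given model and dataset size."""
    return FLOPS_PER_PARAM_TOKEN * params * tokens


if __name__ == "__main__":
    budget = 5.88e23  # illustrative budget, in FLOPs
    params, tokens = compute_optimal_split(budget)
    print(f"{params / 1e9:.0f}B parameters, {tokens / 1e12:.1f}T tokens")
```

With the illustrative budget above, the split comes out at about 70 billion parameters and 1.4 trillion tokens, which matches the size and data volume the paper reports for Chinchilla. Under the same approximation, halving the tokens-per-parameter ratio pushes the answer toward a larger model trained on less data, which is the imbalance the paper argues against.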

Why it mattered

Efficiency over size. It reframed the goal from "biggest" to "best-balanced." Most capable models since have been trained on at least Chinchilla-style ratios of data to parameters, and many use considerably more data than that.
It made smaller models viable. A well-trained smaller model can beat a poorly-balanced large one — part of why small, specialised models are now so competitive.
It changed the economics. For the same training budget, a smaller compute-optimal model is cheaper to run and to fine-tune, and that saving compounds over a product's lifetime.
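The running-cost point can be made concrete with a back-of-the-envelope sketch. It assumes the common approximation that generating one token costs about 2 FLOPs per model parameter, and the one trillion served tokens is a made-up workload for illustration; neither figure comes from the article. The parameter counts are those the paper reports for Gopher and Chinchilla.

```python
# Assumption: a forward pass costs roughly 2 FLOPs per parameter per token.
FORWARD_FLOPS_PER_PARAM = 2.0


def inference_flops(params, tokens_served):
    """Approximate total compute needed to serve a given number of tokens."""
    return FORWARD_FLOPS_PER_PARAM * params * tokens_served


# Hypothetical lifetime workload: one trillion tokens served.
TOKENS_SERVED = 1e12

gopher_cost = inference_flops(280e9, TOKENS_SERVED)      # 280B parameters
chinchilla_cost = inference_flops(70e9, TOKENS_SERVED)   # 70B parameters

ratio = gopher_cost / chinchilla_cost
print(f"The larger model needs about {ratio:.1f}x the compute to serve the same traffic")
```

Because the cost scales linearly with parameter count under this approximation, the gap stays at about 4× however much traffic is served, so the absolute saving grows with usage.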

The lesson wasn't "models don't need to be big." It was "size without matching data is wasted money." Balance beats brute force.

The business takeaway

When a vendor pitches you "the biggest model," the right question is "which model is the most appropriate?" For most real workloads, a smaller, well-matched model is faster, cheaper and good enough. That is the build-vs-buy calculation worth running before you commit.

Sources

Hoffmann et al. (2022) — *Training Compute-Optimal Large Language Models*
Stanford HAI — AI Index 2025 (efficiency trends)

Written by ivector
