Default instinct: reach for the largest, smartest model. Often it's the wrong call. Stanford's 2025 AI Index reports that the smallest model scoring above 60% on MMLU shrank from 540 billion parameters (PaLM, 2022) to 3.8 billion (Phi-3-mini, 2024), roughly 142× fewer. Small models have caught up fast.
Why small often wins
- Cost & speed. Smaller models cost less per token and respond faster, and at production volume those savings compound.
- Privacy & control. Small models can run on your own infrastructure or on-device, keeping data inside your own boundary.
- Right-sized quality. For narrow, well-defined tasks — classification, extraction, routing — a tuned small model frequently matches a frontier model.
The frontier model is a Swiss Army knife. Most production tasks need a scalpel.
A practical pattern
Route by difficulty: a cheap small model handles the easy 80% of requests, and only the hard 20% escalate to a large model. Because most traffic never touches the expensive model, cost and energy use fall sharply while quality holds where it matters. It's the single biggest lever most teams haven't pulled.
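As a rough illustration of the routing pattern above, here is a minimal Python sketch. Everything in it is an assumption made for the example: the per-request costs, the 0.8 confidence threshold, and the toy_small_model and toy_large_model stand-ins are hypothetical, not measurements or a real API. It also assumes the small model can return some confidence score alongside its answer.

```python
from dataclasses import dataclass
from typing import Callable

# Hypothetical per-request costs in USD; substitute your own measured figures.
SMALL_COST = 0.0002
LARGE_COST = 0.01


@dataclass
class Route:
    model: str    # "small" or "large"
    answer: str
    cost: float   # total cost paid for this request


def route(
    request: str,
    small_model: Callable[[str], tuple[str, float]],
    large_model: Callable[[str], str],
    threshold: float = 0.8,
) -> Route:
    """Try the small model first; escalate when its confidence is below threshold."""
    answer, confidence = small_model(request)
    if confidence >= threshold:
        return Route("small", answer, SMALL_COST)
    # Escalated requests pay for both calls, since the small model ran first.
    return Route("large", large_model(request), SMALL_COST + LARGE_COST)


def blended_cost(escalation_rate: float) -> float:
    """Expected cost per request when every request hits the small model first."""
    return SMALL_COST + escalation_rate * LARGE_COST


def toy_small_model(request: str) -> tuple[str, float]:
    # Stand-in heuristic: short requests count as "easy".
    # A real system would use a calibrated confidence score instead.
    confidence = 0.95 if len(request.split()) <= 8 else 0.4
    return f"small:{request}", confidence


def toy_large_model(request: str) -> str:
    # Stand-in for a call to a large model.
    return f"large:{request}"


if __name__ == "__main__":
    easy = route("Classify this ticket: refund request",
                 toy_small_model, toy_large_model)
    hard = route("Compare these three contracts clause by clause and flag "
                 "every conflicting indemnity term",
                 toy_small_model, toy_large_model)
    print(easy.model, hard.model)
    print(f"blended: {blended_cost(0.2):.4f}  all-large: {LARGE_COST:.4f}")
```

Under these made-up prices, a 20% escalation rate gives an expected cost of about $0.0022 per request against $0.01 for sending everything to the large model, roughly a 78% reduction. The real saving depends on your own prices and on how well the confidence signal separates easy requests from hard ones.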
Sources
- Stanford HAI — 2025 AI Index
Written by ivector