Model capability gets the headlines. The quieter, more consequential trend is price. Per Stanford's 2025 AI Index, the cost of GPT-3.5-level inference fell from $20 to $0.07 per million tokens in roughly 18 months, a drop of more than 280-fold.
What cheap inference unlocks
- New product shapes. Features that were uneconomical at $20/M tokens — summarising every document, classifying every ticket, drafting every reply — become trivial at $0.07.
- Volume over cleverness. You can call a model many times (draft, critique, revise) where you once had to be sparing.
- Smaller models, same job. A model matching 2022's flagship now runs with ~142× fewer parameters, pushing capability to the edge and on-device.
When the cost of intelligence drops by more than two orders of magnitude, the constraint stops being "can we afford to call the model?" and becomes "have we designed the product to deserve it?"
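To make the arithmetic concrete, here is a rough Python sketch that prices one workload at both of the per-million-token figures quoted above. The workload itself (100,000 tickets, 1,500 tokens per call, three calls per ticket for draft, critique and revise) is a made-up assumption for illustration, not a number from the AI Index, and `workload_cost` is a hypothetical helper.

```python
# Prices per million tokens, as quoted from the 2025 AI Index.
PRICE_BEFORE = 20.00
PRICE_AFTER = 0.07


def workload_cost(items, tokens_per_call, calls_per_item, price_per_million):
    """Return the dollar cost of running `calls_per_item` model calls on each item."""
    total_tokens = items * tokens_per_call * calls_per_item
    return total_tokens / 1_000_000 * price_per_million


# Hypothetical workload (assumed numbers): 100,000 tickets,
# 1,500 tokens per call, three calls each (draft, critique, revise).
ITEMS, TOKENS_PER_CALL, CALLS_PER_ITEM = 100_000, 1_500, 3

old = workload_cost(ITEMS, TOKENS_PER_CALL, CALLS_PER_ITEM, PRICE_BEFORE)
new = workload_cost(ITEMS, TOKENS_PER_CALL, CALLS_PER_ITEM, PRICE_AFTER)

print(f"At $20.00/M tokens: ${old:,.2f}")
print(f"At $0.07/M tokens:  ${new:,.2f}")
print(f"Ratio: {old / new:.0f}x")
```

Under those assumed numbers the same job goes from $9,000 to about $31.50, which is the difference between a feature you ration and one you run on everything.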
The flip side
Cheap inference also erodes moats. If a capability is one cheap API call away, it isn't a differentiator — your data, workflow and execution are. Plan accordingly.
Sources
- Stanford HAI — 2025 AI Index
Written by ivector