Inference got 280× cheaper in 18 months. Here’s what it unlocks

Model capability gets the headlines. The quieter, more consequential trend is price. Per Stanford's 2025 AI Index, the cost of GPT-3.5-level inference fell from $20 to $0.07 per million tokens in about 18 months, a drop of more than 280-fold.
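To make the headline number concrete, here is a minimal sketch of the arithmetic. The two prices are the ones quoted from the AI Index; the one-billion-token workload and the helper names are assumptions chosen only for illustration.

```python
# Prices per million tokens, as quoted from the 2025 AI Index.
PRICE_OLD_USD_PER_M = 20.00
PRICE_NEW_USD_PER_M = 0.07


def fold_reduction(old: float, new: float) -> float:
    """Return how many times cheaper `new` is than `old`."""
    return old / new


def cost_usd(tokens: int, price_per_million: float) -> float:
    """Return the cost of processing `tokens` at a per-million-token price."""
    return tokens / 1_000_000 * price_per_million


fold = fold_reduction(PRICE_OLD_USD_PER_M, PRICE_NEW_USD_PER_M)
print(f"{fold:.0f}x cheaper")  # 286x cheaper, i.e. "more than 280-fold"

# Hypothetical workload (an assumption, not a figure from the report).
WORKLOAD_TOKENS = 1_000_000_000
print(f"${cost_usd(WORKLOAD_TOKENS, PRICE_OLD_USD_PER_M):,.2f}")  # $20,000.00
print(f"${cost_usd(WORKLOAD_TOKENS, PRICE_NEW_USD_PER_M):,.2f}")  # $70.00
```

The same billion tokens that would have been a line item worth arguing over now costs less than a team lunch.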

What cheap inference unlocks

New product shapes. Features that were uneconomical at $20/M tokens — summarising every document, classifying every ticket, drafting every reply — become trivial at $0.07.
Volume over cleverness. You can call a model many times (draft, critique, revise) where you once had to be sparing.
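Smaller models, same job. Per the same index, the smallest model scoring above 60% on MMLU shrank from 540 billion parameters in 2022 (PaLM) to 3.8 billion in 2024 (Phi-3-mini), roughly 142× fewer, which brings that level of capability within reach of edge and on-device deployment.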
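The "volume over cleverness" point can be sketched as a budget for a draft, critique, revise loop applied to every support ticket. Everything here is hypothetical: the token counts per pass, the ticket volume, and the simplification of one blended price for input and output tokens (real APIs usually price them separately). Only the two per-million prices come from the article.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Pass:
    """One model call in the pipeline, with assumed token counts."""
    name: str
    input_tokens: int
    output_tokens: int


# Hypothetical token counts for replying to one ticket; not measured data.
PIPELINE = [
    Pass("draft", input_tokens=1_500, output_tokens=400),
    Pass("critique", input_tokens=1_900, output_tokens=200),
    Pass("revise", input_tokens=2_100, output_tokens=400),
]


def pipeline_cost_usd(passes: list[Pass], price_per_million: float) -> float:
    """Cost of running every pass once, at a single blended token price."""
    total_tokens = sum(p.input_tokens + p.output_tokens for p in passes)
    return total_tokens / 1_000_000 * price_per_million


def monthly_cost_usd(
    passes: list[Pass], price_per_million: float, items_per_month: int
) -> float:
    """Cost of running the whole pipeline on every item in a month."""
    return pipeline_cost_usd(passes, price_per_million) * items_per_month


TICKETS_PER_MONTH = 100_000  # assumed volume

for label, price in [("$20/M", 20.00), ("$0.07/M", 0.07)]:
    monthly = monthly_cost_usd(PIPELINE, price, TICKETS_PER_MONTH)
    print(f"{label}: ${monthly:,.2f} per month")
# $20/M: $13,000.00 per month
# $0.07/M: $45.50 per month
```

Under these assumptions, three passes per ticket goes from a budget decision to a rounding error, which is why the extra critique and revise calls stop needing justification.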

When the cost of intelligence drops by more than two orders of magnitude, the constraint stops being "can we afford to call the model?" and becomes "have we designed the product to deserve it?"

The flip side

Cheap inference also erodes moats. If a capability is one cheap API call away, it isn't a differentiator — your data, workflow and execution are. Plan accordingly.

Sources

Stanford HAI — 2025 AI Index

Written by ivector

