AI Strategy·June 4, 2026·4 min read

Inference got 280× cheaper in 18 months. Here’s what it unlocks

The collapsing cost of running models is the most underrated story in AI. It changes which products are viable — and which moats disappear.

Model capability gets the headlines. The quieter, more consequential trend is price. Per Stanford's 2025 AI Index, the cost of GPT-3.5-level inference fell from $20 to $0.07 per million tokens in 18 months — a 280-fold drop.
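
To make the arithmetic concrete, here is a minimal sketch that reproduces the fold reduction from the two quoted prices and applies both prices to a single workload. The workload itself (100,000 documents at 2,000 tokens each) is a made-up illustration, not a figure from the Index, and the function names are hypothetical.

```python
# Prices per million tokens, as quoted above from the 2025 AI Index.
OLD_PRICE_PER_M = 20.00
NEW_PRICE_PER_M = 0.07


def fold_reduction(old_price: float, new_price: float) -> float:
    """How many times cheaper the new price is than the old one."""
    return old_price / new_price


def cost_usd(tokens: int, price_per_million: float) -> float:
    """Cost in dollars of processing `tokens` tokens at a per-million price."""
    return tokens / 1_000_000 * price_per_million


# ASSUMPTION: an illustrative workload, not data from the source.
DOCS = 100_000
TOKENS_PER_DOC = 2_000
total_tokens = DOCS * TOKENS_PER_DOC

print(f"Fold reduction: {fold_reduction(OLD_PRICE_PER_M, NEW_PRICE_PER_M):.0f}x")
print(f"Old cost: ${cost_usd(total_tokens, OLD_PRICE_PER_M):,.2f}")
print(f"New cost: ${cost_usd(total_tokens, NEW_PRICE_PER_M):,.2f}")
```

Under those assumptions the same job goes from roughly $4,000 to roughly $14, which is the difference between a budget line item and a rounding error.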

What cheap inference unlocks

  • New product shapes. Features that were uneconomical at $20/M tokens — summarising every document, classifying every ticket, drafting every reply — become trivial at $0.07.
  • Volume over cleverness. You can call a model many times (draft, critique, revise) where you once had to be sparing.
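  • Smaller models, same job. A model matching 2022's flagship now runs with ~142× fewer parameters (in the same AI Index, the smallest model clearing 60% on MMLU shrank from PaLM's 540B parameters to Phi-3-mini's 3.8B), which pushes capability onto edge and on-device hardware.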

When the cost of intelligence drops by more than two orders of magnitude, the constraint stops being "can we afford to call the model?" and becomes "have we designed the product to deserve it?"
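
The "volume over cleverness" point can be sketched the same way. The snippet below compares one model call at the old price with a three-pass draft, critique, revise loop at the new price. The 1,500 tokens per pass and the helper names are assumptions chosen for illustration; real passes will vary in length.

```python
import math

# Prices per million tokens, as quoted earlier in the post.
OLD_PRICE_PER_M = 20.00
NEW_PRICE_PER_M = 0.07


def pipeline_cost_usd(tokens_per_pass: int, passes: int, price_per_million: float) -> float:
    """Cost of running `passes` model calls of equal size at a per-million price."""
    return tokens_per_pass * passes / 1_000_000 * price_per_million


def passes_for_old_budget(old_price: float, new_price: float) -> int:
    """Whole number of same-sized calls the new price buys for one call at the old price."""
    return math.floor(old_price / new_price)


# ASSUMPTION: 1,500 tokens per pass is a hypothetical figure.
TOKENS_PER_PASS = 1_500

single_call_old = pipeline_cost_usd(TOKENS_PER_PASS, 1, OLD_PRICE_PER_M)
three_pass_new = pipeline_cost_usd(TOKENS_PER_PASS, 3, NEW_PRICE_PER_M)

print(f"One call, old price:     ${single_call_old:.6f}")
print(f"Three passes, new price: ${three_pass_new:.6f}")
print(f"Calls per old-price call: {passes_for_old_budget(OLD_PRICE_PER_M, NEW_PRICE_PER_M)}")
```

On these numbers, three passes at the new price cost about 1% of a single call at the old one, so the design question shifts from how few calls you can get away with to how many actually improve the output.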

The flip side

Cheap inference also erodes moats. If a capability is one cheap API call away, it isn't a differentiator — your data, workflow and execution are. Plan accordingly.

Written by ivector