The paper that introduced RAG, explained simply

If you've heard that an AI tool can "answer questions about your documents," you've heard about RAG, short for retrieval-augmented generation. The term comes from a 2020 paper by researchers at Facebook AI Research, University College London and New York University, *Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks*. Here's the idea without the jargon.

(Our plain-language summary; the paper is linked so you can read the original.)

The problem

A language model only knows what it absorbed during training. Ask it about your contracts, your product docs or yesterday's data and it either doesn't know — or worse, confidently makes something up.

The idea

RAG splits the job in two, like an open-book exam:

1. Retrieve. When you ask a question, the system first searches a library of your documents and pulls back the most relevant passages.
2. Generate. It hands those passages to the language model and asks it to answer using that text, not just its memory.

The model becomes a skilled writer working from your sources, instead of a know-it-all working from memory.
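
For readers who want to see the two steps in code, here is a minimal Python sketch. It is an illustration under loud assumptions, not the paper's method: the paper pairs a learned dense retriever (DPR) with a BART generator, while this sketch swaps in simple word-overlap scoring and stops at building the prompt rather than calling a model. The names `retrieve` and `build_prompt` and the three sample passages are hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine(a, b):
    """Cosine similarity between two word-count vectors."""
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(question, passages, k=2):
    """Step 1: return the k passages most similar to the question.

    Assumption: word overlap stands in for a real search engine or
    embedding model.
    """
    q = Counter(tokenize(question))
    ranked = sorted(
        passages,
        key=lambda p: cosine(q, Counter(tokenize(p))),
        reverse=True,
    )
    return ranked[:k]


def build_prompt(question, retrieved):
    """Step 2 (input side): place the retrieved text in front of the model."""
    context = "\n".join(f"[{i + 1}] {p}" for i, p in enumerate(retrieved))
    return (
        "Answer using only the passages below and cite them by number.\n\n"
        f"{context}\n\nQuestion: {question}\nAnswer:"
    )


# Hypothetical document library, for illustration only.
passages = [
    "The refund window for annual plans is 30 days from purchase.",
    "Support is available on weekdays from 9am to 5pm.",
    "Invoices are emailed on the first day of each month.",
]

question = "How many days is the refund window?"
top = retrieve(question, passages, k=1)
prompt = build_prompt(question, top)
print(prompt)
```

In a real system the printed prompt would be sent to a language model. The point of the sketch is the shape of the pipeline: the model sees the refund passage alongside the question, so it can answer from that text instead of from memory.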

Why it became the default

No retraining. You can point RAG at your latest documents without the expensive, slow process of fine-tuning a model.
Fewer made-up answers. Grounding responses in retrieved text reduces hallucination — and lets you show citations.
Easy to update. Change a document, re-index it, and the next answer can draw on the new text.
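
To make the "no retraining" point concrete, here is a small Python sketch of what an update looks like. It assumes a toy in-memory index; `PassageIndex`, `upsert` and `search` are hypothetical names, and the word-overlap scoring is a stand-in for whatever search a real system uses. Nothing about the language model changes between the two searches, only the stored text.

```python
import re


def tokens(text):
    """Lowercase the text and return its set of word tokens."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


class PassageIndex:
    """Toy in-memory index mapping a document id to its text (hypothetical)."""

    def __init__(self):
        self.docs = {}

    def upsert(self, doc_id, text):
        """Add a document, or replace the stored text if the id exists."""
        self.docs[doc_id] = text

    def search(self, question):
        """Return the stored text sharing the most words with the question."""
        q = tokens(question)
        return max(self.docs.values(), key=lambda t: len(q & tokens(t)))


index = PassageIndex()
index.upsert("pricing", "The Pro plan costs 20 dollars per month.")
index.upsert("support", "Support is available on weekdays.")

question = "How much does the Pro plan cost per month?"
before = index.search(question)

# The pricing page changes: replace the stored text, no model training.
index.upsert("pricing", "The Pro plan costs 25 dollars per month.")
after = index.search(question)

print(before)
print(after)
```

The same question retrieves the old price before the update and the new price after it. In production the re-indexing step may take time to run, so how fresh an answer is depends on how often the index is refreshed.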

RAG is why "chat with your data" went from research demo to standard product feature in just a few years.

The catch

RAG is only as good as its retrieval. If the search step pulls the wrong passages, the model answers confidently from the wrong source. Most "RAG isn't working" problems are really search problems. When RAG isn't the right fit, a common alternative is fine-tuning; we compare the two in RAG vs fine-tuning.
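
Because retrieval is the usual weak point, one practical habit is to test the search step on its own, before looking at the model's answers. The Python sketch below shows one way to do that under stated assumptions: a handful of hand-written test questions, each labelled with the document that should come back, and a toy word-overlap retriever. The names `top_k` and `hit_rate` and all sample data are hypothetical.

```python
import re


def tokens(text):
    """Lowercase the text and return its set of word tokens."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def top_k(question, docs, k):
    """Rank document ids by word overlap with the question (toy retriever)."""
    q = tokens(question)
    ranked = sorted(docs, key=lambda d: len(q & tokens(docs[d])), reverse=True)
    return ranked[:k]


def hit_rate(test_cases, docs, k=1):
    """Fraction of questions whose expected document is in the top k."""
    hits = sum(
        1 for question, expected in test_cases
        if expected in top_k(question, docs, k)
    )
    return hits / len(test_cases)


# Hypothetical library and labelled questions, for illustration only.
docs = {
    "refunds": "Refunds are issued within 30 days of purchase.",
    "shipping": "Orders ship within two business days.",
    "warranty": "Hardware is covered for one year.",
}

test_cases = [
    ("How long do refunds take?", "refunds"),
    ("When will my order ship?", "shipping"),
    ("Can I get my money back within one year?", "refunds"),
]

print(hit_rate(test_cases, docs, k=1))
print(hit_rate(test_cases, docs, k=3))
```

With one passage retrieved, the third question pulls the warranty text instead of the refund policy, because it shares the words "one" and "year" with the wrong document. That is the failure described above in miniature: the model would then answer confidently from the wrong source, and the fix belongs in the search step, not in the model.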

Sources

Lewis et al. (2020) — *Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks*

Written by ivector


