If you've heard that an AI tool can "answer questions about your documents," you've heard about RAG — retrieval-augmented generation. The term comes from a 2020 paper by Facebook AI researchers, *Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks*. Here's the idea without the jargon.
(Our plain-language summary; the paper is linked so you can read the original.)
The problem
A language model only knows what it absorbed during training. Ask it about your contracts, your product docs or yesterday's data and it either doesn't know — or worse, confidently makes something up.
The idea
RAG splits the job in two, like an open-book exam:
1. Retrieve. When you ask a question, the system first searches a library of your documents and pulls back the most relevant passages.
2. Generate. It hands those passages to the language model and asks it to answer using that text, not just its memory.
The model becomes a skilled writer working from your sources, instead of a know-it-all working from memory.
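To make the two steps concrete, here is a minimal Python sketch. It assumes a tiny in-memory library and uses simple word-overlap (cosine) scoring as a stand-in for the embedding-based search a production system would typically use. The functions `retrieve` and `build_prompt` and the sample passages are hypothetical, and the actual call to a language model is left out.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    # Lowercase and split into runs of letters/digits; a deliberately simple tokenizer.
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine(a: Counter, b: Counter) -> float:
    # Cosine similarity between two bag-of-words count vectors.
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def retrieve(question: str, passages: list[str], k: int = 2) -> list[str]:
    # Step 1 (Retrieve): score every passage against the question, keep the top k.
    q = Counter(tokenize(question))
    ranked = sorted(passages, key=lambda p: cosine(q, Counter(tokenize(p))), reverse=True)
    return ranked[:k]


def build_prompt(question: str, context: list[str]) -> str:
    # Step 2 (Generate): hand the retrieved passages to the model with an
    # instruction to answer from them. The model call itself is not shown.
    sources = "\n".join(f"[{i + 1}] {p}" for i, p in enumerate(context))
    return (
        "Answer using only the sources below and cite them by number.\n"
        f"Sources:\n{sources}\n\nQuestion: {question}\nAnswer:"
    )


# Hypothetical document library.
library = [
    "Refunds are issued within 14 days of a returned item arriving at our warehouse.",
    "Our office is closed on public holidays.",
    "Shipping to Canada takes 5 to 7 business days.",
]

question = "How long do refunds take?"
top = retrieve(question, library, k=1)
print(build_prompt(question, top))
```

The printed text is what the model would receive: the refund passage labelled [1], followed by the question. Numbering the sources is what makes it possible to show citations alongside the answer.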
Why it became the default
- No retraining. You can point RAG at your latest documents without the expensive, slow process of fine-tuning a model.
- Fewer made-up answers. Grounding responses in retrieved text reduces hallucination — and lets you show citations.
- Easy to update. Change a document and, once the search index is refreshed, the next answer reflects it.
RAG is why "chat with your data" went from research demo to standard product feature in just a few years.
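The "easy to update" point works because the documents live in a search index rather than in the model's weights. A minimal sketch under that assumption: a hypothetical dictionary stands in for the index, and counting shared words stands in for real search.

```python
import re


def tokens(text):
    # Set of lowercase words in the text.
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def best_match(question, index):
    # Return the doc id whose text shares the most words with the question.
    q = tokens(question)
    return max(index, key=lambda doc_id: len(q & tokens(index[doc_id])))


# Hypothetical document store: doc id -> current text. In a real system this
# would be a search or vector index; the language model is never touched.
index = {
    "refund-policy": "Refunds are issued within 14 days.",
    "holidays": "The office is closed on public holidays.",
}

question = "Within how many days are refunds issued?"
print(index[best_match(question, index)])  # Refunds are issued within 14 days.

# Edit the document and re-index it: the next retrieval (and so the next
# answer) picks up the new text, with no fine-tuning involved.
index["refund-policy"] = "Refunds are issued within 30 days."
print(index[best_match(question, index)])  # Refunds are issued within 30 days.
```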
The catch
RAG is only as good as its retrieval. If the search step pulls the wrong passages, the model answers confidently from the wrong source. Most "RAG isn't working" problems are really search problems. When RAG isn't the right fit, the usual alternative is fine-tuning; we compare the two in RAG vs fine-tuning.
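Because so many failures start in the search step, it can pay to test retrieval on its own before blaming the model. One common check is recall@k: the fraction of questions whose known-correct passage appears in the top k search results. A minimal sketch, assuming you have a few questions labelled with the passage that answers them; the test set, passage ids and word-overlap retriever below are all hypothetical.

```python
import re


def tokens(text):
    # Set of lowercase words in the text.
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def search(question, passages, k):
    # Rank passage ids by word overlap with the question (a stand-in retriever).
    q = tokens(question)
    ranked = sorted(passages, key=lambda pid: len(q & tokens(passages[pid])), reverse=True)
    return ranked[:k]


def recall_at_k(test_set, passages, k):
    # Fraction of questions whose correct passage shows up in the top k results.
    hits = sum(1 for question, correct in test_set if correct in search(question, passages, k))
    return hits / len(test_set)


passages = {
    "p1": "Refunds are issued within 14 days of a return.",
    "p2": "Shipping to Canada takes 5 to 7 business days.",
    "p3": "Gift cards cannot be refunded or exchanged.",
}

# Hypothetical labelled questions: (question, id of the passage that answers it).
test_set = [
    ("How long does shipping to Canada take?", "p2"),
    ("Are refunds issued for gift cards?", "p3"),
]

print(recall_at_k(test_set, passages, k=1))  # 0.5
print(recall_at_k(test_set, passages, k=2))  # 1.0
```

The gift-card question shares more words with the general refund passage than with the gift-card exception, so at k=1 the retriever returns the wrong source, which is exactly the failure described above. Widening to k=2 recovers it. A low score here points at search, not at the model.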
Sources
- Lewis et al. (2020) — *Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks*