Prompt technique
Retrieval-Augmented Generation (RAG)
Ground the model in your own data — docs, tickets, code, anything searchable.
What it is
Retrieval-Augmented Generation pairs an LLM with a retriever (vector database, BM25, web search) so the model answers from up-to-date, citable context instead of relying only on its frozen weights. RAG reduces a large class of hallucinations, lets you ship answers about private or recent data without fine-tuning, and lets you trace each claim back to a retrieved source.
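To make the retriever half concrete, here is a minimal sketch in Python. It assumes a toy term-overlap score as a stand-in for BM25 or vector search; the `retrieve` function and the sample `docs` are hypothetical and exist only for illustration.

```python
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    # Lowercase and split into alphanumeric tokens.
    return re.findall(r"[a-z0-9]+", text.lower())


def retrieve(query: str, docs: list[str], k: int = 2) -> list[str]:
    """Toy retriever: rank docs by how often they contain the query's terms.

    A stand-in for a real retriever (BM25, embeddings); not production code.
    """
    query_terms = set(tokenize(query))
    scored = []
    for doc in docs:
        counts = Counter(tokenize(doc))
        score = sum(counts[term] for term in query_terms)
        if score > 0:  # drop docs that share no term with the query
            scored.append((score, doc))
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [doc for _, doc in scored[:k]]


# Hypothetical knowledge base, for illustration only.
docs = [
    "Refunds are issued within 14 days of purchase.",
    "Password resets are sent by email.",
    "Shipping takes 3 to 5 business days.",
]
top = retrieve("How do I reset my password?", docs, k=1)
```

The retrieved chunks then become the context the model is asked to answer from. Swapping the scoring function for a real index changes the quality of the results but not the overall shape of the pipeline.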
When to use it
- ✓ Q&A over private documents, codebases, support tickets
- ✓ Facts that change after the model's training cutoff
- ✓ Any answer that should cite its sources
Example
You are a support agent. Answer the user's question using only the context below. If the answer is not in the context, say "I don't know based on the available docs."
Context:
{retrieved_chunks}
Question: {user_question}
Answer with citations like [1], [2] mapped to the chunk index.
Why it works: Strict grounding ("using only the context"), an explicit fallback, and citation formatting: the three rules that make RAG reliable.
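One way to fill the template is sketched below in Python. The `build_prompt` helper is a hypothetical name, and numbering chunks from 1 is an assumption chosen so that the `[1]`, `[2]` citations in the answer line up with the chunk labels.

```python
# The prompt from the example above, with its two placeholders.
PROMPT_TEMPLATE = """You are a support agent. Answer the user's question using only the context below. If the answer is not in the context, say "I don't know based on the available docs."
Context:
{retrieved_chunks}
Question: {user_question}
Answer with citations like [1], [2] mapped to the chunk index."""


def build_prompt(chunks: list[str], question: str) -> str:
    # Prefix each chunk with its 1-based index so citations can point at it.
    numbered = "\n".join(
        f"[{i}] {chunk}" for i, chunk in enumerate(chunks, start=1)
    )
    return PROMPT_TEMPLATE.format(
        retrieved_chunks=numbered, user_question=question
    )


# Hypothetical chunks and question, for illustration only.
prompt = build_prompt(
    ["Password resets are sent by email.", "Reset links expire after one hour."],
    "How do I reset my password?",
)
```

Labeling the chunks inside the context is what makes the citation instruction checkable: a cited index either matches a chunk that was actually supplied or it does not.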
Pitfalls
- ! Garbage retrieval = garbage answers. Tune chunking, embeddings, and reranking before tweaking the prompt.
- ! Long contexts dilute attention. Rerank and pass only the top 4–8 chunks rather than dumping everything.
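The rerank-then-truncate step can be sketched as follows in Python. The `select_top_k` helper and the word-overlap `overlap_score` are hypothetical; in practice the score would more likely come from a dedicated reranking model, and the default `k=6` is just one value inside the 4–8 range.

```python
from typing import Callable


def overlap_score(query: str, chunk: str) -> float:
    # Placeholder relevance score: fraction of query words found in the chunk.
    query_words = set(query.lower().split())
    chunk_words = set(chunk.lower().split())
    if not query_words:
        return 0.0
    return len(query_words & chunk_words) / len(query_words)


def select_top_k(
    query: str,
    chunks: list[str],
    score: Callable[[str, str], float] = overlap_score,
    k: int = 6,
) -> list[str]:
    # Re-score every candidate against the query, keep only the k best.
    ranked = sorted(chunks, key=lambda chunk: score(query, chunk), reverse=True)
    return ranked[:k]


# Hypothetical candidates returned by a first-stage retriever.
candidates = [
    "shipping takes days",
    "reset your password by email",
    "password rules",
]
best = select_top_k("reset password", candidates, k=2)
```

Keeping the scorer as a parameter means retrieval quality can be tuned without touching the prompt, which is the order of work the first pitfall recommends.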