Orientation
What if the model doesn't know the answer?
LLMs only know what was in their training data - which has a cutoff date and definitely doesn't include your company's internal docs, your personal notes, or yesterday's news. So how do products like "chat with your PDF" actually work?
The answer is simpler than you'd think.
The Problem
An LLM's training data is frozen in time. Ask it about your company's refund policy, a document you uploaded last week, or today's stock price - it has no idea. It will either say "I don't know" or, worse, confidently make something up (this is called hallucination).
Fine-tuning (retraining the model on your specific data) is expensive, slow, and still doesn't help with data that changes daily. There had to be a simpler way.
The Solution: RAG
RAG stands for Retrieval-Augmented Generation. Despite the fancy name, here's all it does:
- User asks a question
- Search your documents for chunks relevant to that question
- Paste those chunks into the prompt alongside the question
- Let the model answer using the provided context
That's it. RAG is just "search + paste + ask."
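The whole loop fits in a few lines. Below is a minimal sketch, not a production system: the word-overlap retriever is a toy placeholder (a real system ranks by embedding similarity, covered later), and the final "ask" step - sending the prompt to a model API - is left out.

```python
def naive_retrieve(question, chunks, k=2):
    # Toy placeholder: rank chunks by word overlap with the question.
    # A real retriever would rank by embedding similarity instead.
    q_words = set(question.lower().split())
    scored = sorted(
        chunks,
        key=lambda c: len(q_words & set(c.lower().split())),
        reverse=True,
    )
    return scored[:k]

def build_prompt(question, retrieved):
    # "Paste": the retrieved chunks become plain text in the prompt.
    context = "\n\n".join(retrieved)
    return (
        "Answer using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}"
    )

chunks = [
    "Refunds are issued within 14 days of a return request.",
    "Our office is closed on public holidays.",
]
question = "How do I get a refund?"
prompt = build_prompt(question, naive_retrieve(question, chunks, k=1))
# The refund chunk is now ordinary text inside the prompt;
# the last step ("ask") is sending this string to any LLM API.
```

That is the entire trick: the model never fetches anything itself - the application does the searching and pasting before the model ever runs.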
The model doesn't browse the internet. It doesn't access a database live. The application finds relevant text, injects it into the context window, and the model reads it like any other text.
A concrete walkthrough:
- The user asks: "How do I get a refund?"
- Retrieval finds the chunks most relevant to the question
- Those chunks are pasted right alongside the system prompt and the question
- The model treats the retrieved docs like any other text in the window
Here's what the context window actually looks like after retrieval - notice how the retrieved documents are just more text alongside everything else:
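Described in plain text (with invented contents), the assembled window might look something like this:

```
[SYSTEM] You are a support assistant. Answer using only the provided context.

[RETRIEVED DOCUMENT 1] Reimbursement Procedures: refunds are issued
within 14 days of an approved return request...

[RETRIEVED DOCUMENT 2] Return Process: items can be returned within
30 days of purchase...

[USER] How do I get a refund?
```

To the model there is no difference between the retrieved documents and the rest of the prompt - it's all one stream of text.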
The Open-Book Exam Analogy
Think of RAG like an open-book exam.
The student (the LLM) doesn't memorize every textbook. Instead, before each question, you hand them the most relevant pages. They read the provided material and synthesize an answer.
The student didn't "know" the answer. They read it from the material you gave them. If you give them the wrong pages, they'll give a wrong answer. If you give them pages with misleading information, they'll be misled.
But How Do You Search?
OK, so RAG needs to find the right documents. But how? You've got thousands of pages. A user asks "How do I get my money back?" - how do you find the relevant chunk?
String matching doesn't work. The refund policy might say "reimbursement procedure" or "return process" - none of those words match "get my money back." If you're just Ctrl+F-ing for keywords, you'll miss it.
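You can see the failure in a couple of lines of Python (the policy text here is made up for illustration):

```python
policy = "Reimbursement procedure: submit a return request within 30 days."
query = "How do I get my money back?"

# Naive keyword search: do any query words appear in the policy text?
overlap = set(query.lower().rstrip("?").split()) & set(policy.lower().split())
print(overlap)  # -> set(): no shared words, so keyword search finds nothing
```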
This is where embeddings come in.
An embedding is a way to convert text into a list of numbers that captures its meaning, not just its words. Texts about similar things get similar numbers, even if they use completely different words.
So the system does this:
- All your documents get split into small chunks (paragraphs, sections, etc.)
- Each chunk gets converted into an embedding - a numerical fingerprint of its meaning
- When the user asks a question, that question gets converted into an embedding too
- The system finds the chunks whose embeddings are closest to the question's embedding
"How do I get my money back?" and a chunk titled "Reimbursement Procedures" have similar embeddings - because they're about the same thing. The search finds it, even though they share zero words.
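"Closest embeddings" is just vector math - typically cosine similarity. Here's a sketch with tiny fabricated vectors (a real embedding model produces hundreds or thousands of dimensions; these 3-number vectors are invented for illustration):

```python
import math

def cosine_similarity(a, b):
    # Cosine similarity: close to 1.0 means the vectors point the same
    # way - i.e., the texts are about roughly the same thing.
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Fabricated 3-d "embeddings"; a real model would produce these.
chunk_embeddings = {
    "Reimbursement Procedures": [0.9, 0.1, 0.2],
    "Office Holiday Schedule":  [0.1, 0.8, 0.3],
}
question_embedding = [0.85, 0.15, 0.25]  # "How do I get my money back?"

best = max(
    chunk_embeddings,
    key=lambda title: cosine_similarity(chunk_embeddings[title], question_embedding),
)
print(best)  # -> Reimbursement Procedures
```

Note that nothing here compares words at all - only directions in vector space. That's why the match works despite zero shared vocabulary.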
A question worth sitting with: a company uses RAG to let employees search internal docs. An attacker hides instructions inside a document that gets retrieved. What happens? Remember - the model treats retrieved text like any other text in the window.
What RAG Doesn't Do
Common misconceptions:
- RAG doesn't give the model internet access. Documents are pre-processed and stored. The model reads what's retrieved, nothing more.
- RAG doesn't make the model smarter. It gives the model more relevant context, but the model can still misinterpret or hallucinate.
- RAG doesn't guarantee accuracy. If the retriever finds the wrong chunks, the model answers with wrong information - confidently.
Next Steps
RAG lets models read external data. But what if you want the model to take actions - send an email, look up a flight, or query a database in real time?
That's where tools and function calling come in - and where text manipulation starts having real-world consequences.
Next in path
Tools & Function Calling → When LLMs take actions in the real world