Orientation
What if the model doesn't know the answer?
LLMs only know what was in their training data - which has a cutoff date and definitely doesn't include your company's internal docs, your personal notes, or yesterday's news. So how do products like "chat with your PDF" actually work?
The answer is simpler than you'd think.
The Problem
An LLM's training data is frozen in time. Ask it about your company's refund policy, a document you uploaded last week, or today's stock price - it has no idea. It will either say "I don't know" or, worse, confidently make something up (this is called hallucination).
Fine-tuning (retraining the model on your specific data) is expensive, slow, and still doesn't help with data that changes daily. There had to be a simpler way.
The Solution: RAG
RAG stands for Retrieval-Augmented Generation. Despite the fancy name, here's all it does:
- User asks a question
- Search your documents for chunks relevant to that question
- Paste those chunks into the prompt alongside the question
- Let the model answer using the provided context
That's it. RAG is just "search + paste + ask."
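The whole loop fits in a few lines. Below is a minimal sketch, not a production system: the word-overlap retriever is a toy placeholder (a real system ranks by embedding similarity, covered later), and the final "ask" step - sending the prompt to a model API - is left out.

```python
def naive_retrieve(question, chunks, k=2):
    # Toy placeholder: rank chunks by word overlap with the question.
    # A real retriever would rank by embedding similarity instead.
    q_words = set(question.lower().split())
    scored = sorted(
        chunks,
        key=lambda c: len(q_words & set(c.lower().split())),
        reverse=True,
    )
    return scored[:k]

def build_prompt(question, retrieved):
    # "Paste": the retrieved chunks become plain text in the prompt.
    context = "\n\n".join(retrieved)
    return (
        "Answer using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}"
    )

chunks = [
    "Refunds are issued within 14 days of a return request.",
    "Our office is closed on public holidays.",
]
question = "How do I get a refund?"
prompt = build_prompt(question, naive_retrieve(question, chunks, k=1))
# The refund chunk is now ordinary text inside the prompt;
# the last step ("ask") is sending this string to any LLM API.
```

That is the entire trick: the model never fetches anything itself - the application does the searching and pasting before the model ever runs.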
The model doesn't browse the internet. It doesn't access a database live. The application finds relevant text, injects it into the context window, and the model reads it like any other text.
A concrete walkthrough:
- The user asks: "How do I get a refund?"
- Retrieval finds the chunks most relevant to the question
- Those chunks are pasted right alongside the system prompt and the question
- The model treats the retrieved docs like any other text in the window
Here's what the context window actually looks like after retrieval - notice how the retrieved documents are just more text alongside everything else:
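Described in plain text (with invented contents), the assembled window might look something like this:

```
[SYSTEM] You are a support assistant. Answer using only the provided context.

[RETRIEVED DOCUMENT 1] Reimbursement Procedures: refunds are issued
within 14 days of an approved return request...

[RETRIEVED DOCUMENT 2] Return Process: items can be returned within
30 days of purchase...

[USER] How do I get a refund?
```

To the model there is no difference between the retrieved documents and the rest of the prompt - it's all one stream of text.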
The Open-Book Exam Analogy
Think of RAG like an open-book exam.
The student (the LLM) doesn't memorize every textbook. Instead, before each question, you hand them the most relevant pages. They read the provided material and synthesize an answer.
The student didn't "know" the answer. They read it from the material you gave them. If you give them the wrong pages, they'll give a wrong answer. If you give them pages with misleading information, they'll be misled.
But How Do You Search?
OK, so RAG needs to find the right documents. But how? You've got thousands of pages. A user asks "How do I get my money back?" - how do you find the relevant chunk?
String matching doesn't work. The refund policy might say "reimbursement procedure" or "return process" - none of those words match "get my money back." If you're just Ctrl+F-ing for keywords, you'll miss it.
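You can see the failure in a couple of lines of Python (the policy text here is made up for illustration):

```python
policy = "Reimbursement procedure: submit a return request within 30 days."
query = "How do I get my money back?"

# Naive keyword search: do any query words appear in the policy text?
overlap = set(query.lower().rstrip("?").split()) & set(policy.lower().split())
print(overlap)  # -> set(): no shared words, so keyword search finds nothing
```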
This is where embeddings come in.
An embedding is a way to convert text into a list of numbers that captures its meaning, not just its words. Texts about similar things get similar numbers, even if they use completely different words.
So the system does this:
- All your documents get split into small chunks (paragraphs, sections, etc.)
- Each chunk gets converted into an embedding - a numerical fingerprint of its meaning
- When the user asks a question, that question gets converted into an embedding too
- The system finds the chunks whose embeddings are closest to the question's embedding
"How do I get my money back?" and a chunk titled "Reimbursement Procedures" have similar embeddings - because they're about the same thing. The search finds it, even though they share zero words.
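"Closest embeddings" is just vector math - typically cosine similarity. Here's a sketch with tiny fabricated vectors (a real embedding model produces hundreds or thousands of dimensions; these 3-number vectors are invented for illustration):

```python
import math

def cosine_similarity(a, b):
    # Cosine similarity: close to 1.0 means the vectors point the same
    # way - i.e., the texts are about roughly the same thing.
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Fabricated 3-d "embeddings"; a real model would produce these.
chunk_embeddings = {
    "Reimbursement Procedures": [0.9, 0.1, 0.2],
    "Office Holiday Schedule":  [0.1, 0.8, 0.3],
}
question_embedding = [0.85, 0.15, 0.25]  # "How do I get my money back?"

best = max(
    chunk_embeddings,
    key=lambda title: cosine_similarity(chunk_embeddings[title], question_embedding),
)
print(best)  # -> Reimbursement Procedures
```

Note that nothing here compares words at all - only directions in vector space. That's why the match works despite zero shared vocabulary.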
A question worth sitting with: a company uses RAG to let employees search internal docs. An attacker hides instructions inside a document that gets retrieved. What happens? Remember - the model treats retrieved text like any other text in the window.
What RAG Doesn't Do
Common misconceptions:
- RAG doesn't give the model internet access. Documents are pre-processed and stored. The model reads what's retrieved, nothing more.
- RAG doesn't make the model smarter. It gives the model more relevant context, but the model can still misinterpret or hallucinate.
- RAG doesn't guarantee accuracy. If the retriever finds the wrong chunks, the model answers with wrong information - confidently.
Next Steps
RAG lets models read external data. But what if you want the model to take actions - send an email, look up a flight, or query a database in real time?
That's where tools and function calling come in - and where text manipulation starts having real-world consequences.
Next in path
Tools & Function Calling → When LLMs take actions in the real world