CutScore | Retrieval-Augmented Generation (RAG)

What it is

Retrieval-Augmented Generation (RAG) is the process of optimizing the output of a large language model so it references an authoritative knowledge base outside of its training data before generating a response.

In practice, a RAG pipeline converts your external data into vector representations, retrieves the most relevant chunks when a user submits a query, injects that retrieved context into the prompt, and then lets the foundation model generate a grounded, citation-capable answer.
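
The four steps above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not a production pipeline: `embed` is a toy word-count vector standing in for a real embedding model, the in-memory list stands in for a vector database, and the sample chunks and function names are invented for the example. The generation step is left out; the sketch stops at the prompt that would be sent to the foundation model.

```python
import math
import re
from collections import Counter


def embed(text: str) -> Counter:
    """Toy embedding: word counts. A real pipeline would call an embedding model."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def build_index(chunks: list[str]) -> list[tuple[str, Counter]]:
    """Step 1: convert external data into vector representations."""
    return [(chunk, embed(chunk)) for chunk in chunks]


def retrieve(index: list[tuple[str, Counter]], query: str, k: int = 2) -> list[str]:
    """Step 2: return the k chunks most similar to the query."""
    q = embed(query)
    ranked = sorted(index, key=lambda item: cosine(q, item[1]), reverse=True)
    return [chunk for chunk, _ in ranked[:k]]


def build_prompt(query: str, context: list[str]) -> str:
    """Step 3: inject the retrieved context into the prompt, numbered for citation."""
    numbered = "\n".join(f"[{i}] {c}" for i, c in enumerate(context, start=1))
    return (
        "Answer using only the context below and cite sources by number.\n"
        f"Context:\n{numbered}\n"
        f"Question: {query}"
    )


# Hypothetical sample data, invented for illustration.
chunks = [
    "Refunds are accepted within 30 days of purchase.",
    "Standard shipping takes 5 business days.",
    "Support is available by email on weekdays.",
]
index = build_index(chunks)
query = "How many days do I have to request refunds?"
top_chunks = retrieve(index, query, k=1)
prompt = build_prompt(query, top_chunks)
print(prompt)  # Step 4 would send this prompt to the foundation model.
```

With this sample data the refund-policy chunk is retrieved, because it shares the most words with the query. A real embedding model would match on meaning rather than shared words, which is the part this toy version does not capture.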

Mental model

Think of RAG as giving the model a research assistant. Before the model writes anything, the assistant runs to the library, pulls the relevant pages, and hands them over. The model reads those pages and responds — rather than relying on what it memorized during training.

When to use it

The exam frequently asks you to choose between RAG and fine-tuning. The decision turns on whether the gap is a knowledge gap (what the model knows) or a behavior gap (how the model responds).

Situation	RAG	Fine-tuning
Responses must reflect your proprietary or frequently-changing data	Preferred — connects the model to a live, updatable knowledge source	Not preferred — retraining is expensive and snapshots data at a point in time
You need source attribution and citations in responses	Supported — retrieved documents can be surfaced as references	Not a native output of fine-tuning
You need the model to adopt a new tone, format, or task-specific skill	Not the right tool	Preferred
You want to avoid expensive model retraining	Preferred — no retraining required	Requires retraining

Amazon Bedrock Knowledge Bases provides a managed RAG implementation: Amazon Bedrock handles embedding, storage, retrieval, and generation. It can also include citations in the generated response, so each claim can be traced back to its original data source and checked for accuracy.
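
As a rough sketch of what a managed RAG call can look like, the Python below builds a request for the `retrieve_and_generate` operation of the boto3 `bedrock-agent-runtime` client and pulls source URIs out of the response. The request and response shapes reflect my understanding of that API and should be checked against the current boto3 documentation. The helper names (`build_request`, `extract_citations`, `ask`) are invented for this example, the knowledge base ID and model ARN are values you would supply yourself, and the citation helper assumes the documents live in Amazon S3.

```python
def build_request(question: str, knowledge_base_id: str, model_arn: str) -> dict:
    """Assemble keyword arguments for retrieve_and_generate (assumed request shape)."""
    return {
        "input": {"text": question},
        "retrieveAndGenerateConfiguration": {
            "type": "KNOWLEDGE_BASE",
            "knowledgeBaseConfiguration": {
                "knowledgeBaseId": knowledge_base_id,
                "modelArn": model_arn,
            },
        },
    }


def extract_citations(response: dict) -> list[str]:
    """Collect S3 URIs of the retrieved documents the answer cites.

    Assumes S3-backed sources; other source types use other location fields.
    """
    sources = []
    for citation in response.get("citations", []):
        for ref in citation.get("retrievedReferences", []):
            uri = ref.get("location", {}).get("s3Location", {}).get("uri")
            if uri:
                sources.append(uri)
    return sources


def ask(question: str, knowledge_base_id: str, model_arn: str) -> tuple[str, list[str]]:
    """Send the question to a knowledge base and return (answer, cited sources)."""
    # Imported here so the helpers above can be used without the AWS SDK installed.
    import boto3

    client = boto3.client("bedrock-agent-runtime")
    response = client.retrieve_and_generate(
        **build_request(question, knowledge_base_id, model_arn)
    )
    return response["output"]["text"], extract_citations(response)
```

Calling `ask` requires AWS credentials and an existing knowledge base. The point to notice for the exam is that a single call covers both retrieval and generation, and the citations come back alongside the answer.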

Common misconception

RAG does not retrain or update the foundation model itself.

The model's weights are unchanged. RAG improves responses by supplying relevant external context at inference time — not by teaching the model new facts permanently. Candidates often assume that because the model's answers improve after RAG is added, the underlying model must have been updated. It has not. The knowledge lives in the external data source; the model reads it fresh on every query.
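
A toy Python illustration of this point, with everything in it hypothetical: `FrozenModel` stands in for a foundation model whose weights cannot be modified, and a plain dictionary stands in for the external knowledge source. Updating the dictionary changes the answer while the weights stay exactly as they were.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FrozenModel:
    """Hypothetical stand-in for a foundation model; frozen so weights cannot change."""
    weights: tuple[float, ...] = (0.1, 0.2, 0.3)

    def generate(self, question: str, context: str) -> str:
        # A real model would generate text; this toy simply restates the context.
        return f"Based on the provided context: {context}"


# External knowledge source, kept entirely outside the model.
knowledge_base = {"return window": "Returns are accepted within 30 days."}


def answer(model: FrozenModel, question: str, topic: str) -> str:
    context = knowledge_base[topic]  # read fresh from the source on every query
    return model.generate(question, context)


model = FrozenModel()
weights_before = model.weights

first = answer(model, "What is the return window?", "return window")

# The policy changes: only the knowledge source is updated, not the model.
knowledge_base["return window"] = "Returns are accepted within 60 days."
second = answer(model, "What is the return window?", "return window")

print(first)
print(second)
print(model.weights == weights_before)
```

The second answer reflects the 60-day policy even though nothing about the model was touched, which is the behavior candidates tend to mistake for retraining.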

A related trap: RAG is not a substitute for fine-tuning when the goal is to change how the model behaves (its style, task format, or reasoning pattern) rather than what it knows.

How it shows up on the exam

The exam tests your ability to recognize which technique addresses which problem. A question describes a scenario, such as a model producing outdated answers or fabricating facts, and asks you to identify the appropriate solution. RAG is the appropriate choice when the described problem is a knowledge gap: the model lacks current, domain-specific, or proprietary information because that information was not part of its training data.

Signal phrases that point toward RAG as the answer:

"up-to-date" or "current information"
"proprietary data" or "internal documents"
"source attribution" or "citations"
"without retraining" or "cost-effective"
"generic" or "hallucinated" responses caused by missing domain knowledge

Candidates often confuse RAG with fine-tuning because both can improve response quality. The key distinction is that RAG addresses knowledge gaps through retrieval at inference time, while fine-tuning addresses behavior gaps through training. When a scenario emphasizes fresh or proprietary data without retraining, RAG is the answer the question is looking for.

Related concepts

Vector Databases — the storage layer that makes semantic retrieval possible; RAG depends on embedding your data and querying it by meaning, not keyword.
Bedrock Knowledge Bases — AWS's managed service that implements the full RAG pipeline, handling ingestion, embedding, retrieval, and generation.
AI Hallucination — the failure mode RAG is most directly designed to reduce; grounding responses in retrieved documents gives the model authoritative context to draw from.

Retrieval-Augmented Generation (RAG) — AIF-C01