Applications of Foundation Models · AIF-C01 · Task 3.1

Retrieval-Augmented Generation (RAG) — AIF-C01

What RAG is, how the retrieval-to-generation pipeline works, when to choose it over fine-tuning, and the misconceptions the exam exploits.

What it is

Retrieval-Augmented Generation (RAG) is the process of optimizing the output of a large language model so it references an authoritative knowledge base outside of its training data before generating a response.

In practice, a RAG pipeline converts your external data into vector representations, retrieves the most relevant chunks when a user submits a query, injects that retrieved context into the prompt, and then lets the foundation model generate a grounded, citation-capable answer.
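The four steps above (embed, retrieve, inject, generate) can be sketched in a few lines of Python. This is a minimal illustration, not a production pipeline: the `embed` function is a toy bag-of-words stand-in for a real embedding model, the sample chunks are invented, and the final generation call is left out because it depends on whichever foundation model you use. All function names here are hypothetical.

```python
import math
import re
from collections import Counter


def embed(text: str) -> Counter:
    """Toy embedding: word counts. A real pipeline would call an embedding model."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[term] * b[term] for term in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def build_index(chunks):
    """Step 1: convert external data into vector representations."""
    return [(chunk, embed(chunk)) for chunk in chunks]


def retrieve(index, query, k=2):
    """Step 2: return the k chunks most similar to the query."""
    query_vector = embed(query)
    ranked = sorted(index, key=lambda item: cosine(query_vector, item[1]), reverse=True)
    return [chunk for chunk, _ in ranked[:k]]


def build_prompt(query, context_chunks):
    """Step 3: inject the retrieved context into the prompt, numbered for citation."""
    numbered = "\n".join(f"[{i}] {c}" for i, c in enumerate(context_chunks, start=1))
    return (
        "Answer using only the context below and cite sources by number.\n\n"
        f"Context:\n{numbered}\n\nQuestion: {query}"
    )


# Invented sample data, for illustration only.
chunks = [
    "Refunds are accepted within 30 days of purchase.",
    "Standard shipping takes 5 business days.",
    "Support is available by email on weekdays.",
]
index = build_index(chunks)
question = "How many days do I have to request refunds?"
top_chunks = retrieve(index, question, k=1)
prompt = build_prompt(question, top_chunks)
print(prompt)  # Step 4 would send this prompt to the foundation model.
```

Numbering the chunks in the prompt is one simple way to make the answer citation-capable: the model can refer to `[1]`, and the application can map that number back to the source document.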

Mental model

Think of RAG as giving the model a research assistant. Before the model writes anything, the assistant runs to the library, pulls the relevant pages, and hands them over. The model reads those pages and responds — rather than relying on what it memorized during training.

When to use it

The exam frequently asks you to choose between RAG and fine-tuning. The decision turns on whether the gap is a knowledge gap (what the model knows) or a behavior gap (how the model responds).

| Situation | RAG | Fine-tuning |
| --- | --- | --- |
| Responses must reflect your proprietary or frequently-changing data | Preferred: connects the model to a live, updatable knowledge source | Not preferred: retraining is expensive and snapshots data at a point in time |
| You need source attribution and citations in responses | Supported: retrieved documents can be surfaced as references | Not a native output of fine-tuning |
| You need the model to adopt a new tone, format, or task-specific skill | Not the right tool | Preferred |
| You want to avoid expensive model retraining | Preferred: no retraining required | Requires retraining |

Amazon Bedrock Knowledge Bases provides a managed RAG implementation in which Amazon Bedrock handles embedding, storage, retrieval, and generation. It can also include citations in the generated response, so the original data source can be referenced and the answer checked for accuracy.
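As a rough sketch of what calling such a managed pipeline can look like, the helper below wraps the `retrieve_and_generate` operation of the boto3 `bedrock-agent-runtime` client and pulls the cited source text out of the response. The request and response shapes reflect my understanding of that API and should be checked against the current boto3 documentation. The helper name `ask_knowledge_base` is hypothetical, and the knowledge base ID and model ARN are placeholders you would supply yourself.

```python
def ask_knowledge_base(client, question, knowledge_base_id, model_arn):
    """Run one managed RAG query and return (answer_text, cited_source_snippets).

    `client` is assumed to be a boto3 "bedrock-agent-runtime" client.
    """
    response = client.retrieve_and_generate(
        input={"text": question},
        retrieveAndGenerateConfiguration={
            "type": "KNOWLEDGE_BASE",
            "knowledgeBaseConfiguration": {
                "knowledgeBaseId": knowledge_base_id,
                "modelArn": model_arn,
            },
        },
    )
    answer = response["output"]["text"]
    # Each citation lists the retrieved references that support part of the answer.
    sources = [
        ref.get("content", {}).get("text", "")
        for citation in response.get("citations", [])
        for ref in citation.get("retrievedReferences", [])
    ]
    return answer, sources


# Usage sketch (needs boto3, AWS credentials, and an existing knowledge base):
# import boto3
# client = boto3.client("bedrock-agent-runtime")
# answer, sources = ask_knowledge_base(
#     client, "What is our refund window?", "<knowledge-base-id>", "<model-arn>"
# )
```

Passing the client in as a parameter keeps the helper easy to exercise with a stub, which is useful for study purposes when no AWS account is available.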

Common misconception

RAG does not retrain or update the foundation model itself.

The model's weights are unchanged. RAG improves responses by supplying relevant external context at inference time — not by teaching the model new facts permanently. Candidates often assume that because the model's answers improve after RAG is added, the underlying model must have been updated. It has not. The knowledge lives in the external data source; the model reads it fresh on every query.
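A toy demonstration can make this concrete. In the sketch below, `FrozenModel` is a hypothetical stand-in for a foundation model whose "weights" cannot change, and retrieval is a naive word-overlap lookup. The documents are invented. The point is only that editing the knowledge source changes the answer while the model object stays identical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FrozenModel:
    """Stand-in for a foundation model: frozen, so its weights cannot be modified."""
    weights: tuple = (0.1, 0.2, 0.3)

    def generate(self, prompt: str) -> str:
        # Toy behavior: echo the context line that was injected into the prompt.
        context = prompt.split("Context: ", 1)[1].split("\n", 1)[0]
        return f"Based on the provided context: {context}"


def answer(model, knowledge_source, question):
    """Retrieve the best-matching document at query time, then generate."""
    words = set(question.lower().split())
    best = max(knowledge_source, key=lambda doc: len(words & set(doc.lower().split())))
    prompt = f"Context: {best}\nQuestion: {question}"
    return model.generate(prompt)


model = FrozenModel()
weights_before = model.weights

docs = ["The refund window is 30 days."]
first = answer(model, docs, "What is the refund window?")

# Update the external knowledge source only; the model is not touched.
docs[0] = "The refund window is 60 days."
second = answer(model, docs, "What is the refund window?")

print(first)
print(second)
print(model.weights == weights_before)
```

The second answer reflects the updated document even though nothing about the model was retrained, which is exactly the behavior the misconception gets wrong.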

A related trap: RAG is not a substitute for fine-tuning when the goal is to change how the model behaves (its style, task format, or reasoning pattern) rather than what it knows.

How it shows up on the exam

The exam targets your ability to recognize which technique addresses which problem. A question will describe a scenario — such as a model producing outdated answers or fabricating facts — and ask you to identify the appropriate solution. RAG is the appropriate choice when the described problem is a knowledge gap: stale, domain-specific, or proprietary information that was not part of the model's training data.

Signal phrases that point toward RAG as the answer:

  • "up-to-date" or "current information"
  • "proprietary data" or "internal documents"
  • "source attribution" or "citations"
  • "without retraining" or "cost-effective"
  • responses that are "generic" or "hallucinated" due to missing domain knowledge

Candidates often confuse RAG with fine-tuning because both can improve response quality. The key distinction is that RAG addresses knowledge gaps through retrieval at inference time, while fine-tuning addresses behavior gaps through training. When a scenario emphasizes fresh or proprietary data without retraining, RAG is the answer the question is testing for.
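As a study aid, the knowledge-gap versus behavior-gap rule can be written as a small keyword heuristic. This is only a sketch: the RAG phrases come from the signal list above, the fine-tuning phrases are taken from the behavior-gap wording on this page, and the function name `likely_technique` is hypothetical. Real exam questions need reading in full, not phrase counting.

```python
import re

# Phrases that suggest a knowledge gap (what the model knows).
RAG_SIGNALS = (
    "up-to-date", "current information", "proprietary data", "internal documents",
    "source attribution", "citations", "without retraining", "cost-effective",
    "hallucinated",
)
# Phrases that suggest a behavior gap (how the model responds).
FINE_TUNING_SIGNALS = ("tone", "format", "style", "task-specific")


def count_hits(text: str, phrases) -> int:
    """Count phrases that appear as whole words, so 'format' does not match 'information'."""
    return sum(
        bool(re.search(r"\b" + re.escape(phrase) + r"\b", text)) for phrase in phrases
    )


def likely_technique(scenario: str) -> str:
    """Return 'RAG', 'fine-tuning', or 'unclear' based on which signals dominate."""
    text = scenario.lower()
    rag_hits = count_hits(text, RAG_SIGNALS)
    fine_tuning_hits = count_hits(text, FINE_TUNING_SIGNALS)
    if rag_hits > fine_tuning_hits:
        return "RAG"
    if fine_tuning_hits > rag_hits:
        return "fine-tuning"
    return "unclear"


print(likely_technique(
    "Answers must reflect current information from internal documents without retraining."
))
print(likely_technique("The model must adopt a formal tone and a fixed output format."))
```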

Related concepts

  • Vector Databases — the storage layer that makes semantic retrieval possible; RAG depends on embedding your data and querying it by meaning, not keyword.
  • Bedrock Knowledge Bases — AWS's managed service that implements the full RAG pipeline, handling ingestion, embedding, retrieval, and generation.
  • AI Hallucination — the failure mode RAG is most directly designed to reduce; grounding responses in retrieved documents gives the model authoritative context to draw from.

Sources

Every claim on this page traces to the public exam blueprint and official documentation.

CutScore is an independent study tool and is not affiliated with, authorized by, endorsed by, or sponsored by Amazon Web Services. “AWS” and “AWS Certified AI Practitioner” are trademarks of Amazon.com, Inc. or its affiliates. All content is independently authored from the public exam blueprint and official documentation — no real exam content is used.
