Model customization approaches — AIF-C01
Master Amazon Bedrock model customization: supervised fine-tuning, reinforcement fine-tuning, and distillation — and when to use each for AIF-C01.
WHAT IT IS
Model customization is the process of providing training data to a model to improve its performance for specific use cases. Amazon Bedrock provides three customization methods: supervised fine-tuning, reinforcement fine-tuning, and distillation. Each method adjusts a foundation model's parameters, producing a privately owned custom model that only your AWS account can access.
Mental model
Think of choosing a method in terms of what you know about the right answer and what you want from the resulting model:
- You have labeled examples (input → correct output): use supervised fine-tuning.
- You can measure quality but can't enumerate correct answers: use reinforcement fine-tuning.
- You want a smaller, cheaper model that performs like a larger one: use distillation.
The key question is always: what kind of signal can you provide?
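The mental model above reduces to a small decision procedure. The sketch below is a hypothetical study aid in Python, not an AWS API. The `choose_method` name, its boolean inputs, and the precedence order (a smaller-model goal first, then labeled pairs, then a scoring function) are all assumptions made for illustration.

```python
from enum import Enum
from typing import Optional


class Method(Enum):
    SUPERVISED_FINE_TUNING = "supervised fine-tuning"
    REINFORCEMENT_FINE_TUNING = "reinforcement fine-tuning"
    DISTILLATION = "distillation"


def choose_method(
    has_labeled_pairs: bool,
    can_score_responses: bool,
    wants_smaller_model: bool,
) -> Optional[Method]:
    """Hypothetical study aid: map the signal you can provide to a method."""
    # Goal is a cheaper model that behaves like a larger one.
    if wants_smaller_model:
        return Method.DISTILLATION
    # Known correct outputs: learn input-to-output associations directly.
    if has_labeled_pairs:
        return Method.SUPERVISED_FINE_TUNING
    # No labels, but quality is measurable: use reward functions.
    if can_score_responses:
        return Method.REINFORCEMENT_FINE_TUNING
    # No usable signal: parameter-level customization does not fit yet.
    return None


print(choose_method(has_labeled_pairs=False, can_score_responses=True, wants_smaller_model=False))
# Method.REINFORCEMENT_FINE_TUNING
```

Real scenarios can satisfy more than one condition (distillation, for instance, accepts optional labeled pairs), so treat the fixed precedence as a simplification rather than a rule.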
When to use it
| Method | Input data required | Model parameters change? | Primary goal |
|---|---|---|---|
| Supervised fine-tuning | Labeled prompt–response pairs | Yes | Improve performance on specific tasks with known correct outputs |
| Reinforcement fine-tuning | Prompts + reward functions (not labeled pairs) | Yes | Optimize for measurable quality criteria; useful when correct answers are hard to define upfront |
| Distillation | Prompts (with optional labeled pairs); teacher model generates responses | Yes (student model) | Transfer capability from a larger teacher model to a smaller, faster, cost-efficient student model |
Supervised fine-tuning is the right choice when you have a labeled dataset and want the model to learn the association between inputs and specific output types.
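To make "labeled dataset" concrete, here is a minimal Python sketch that writes prompt–response pairs as JSON Lines, one record per line. The `prompt`/`completion` field names follow the format Amazon Bedrock documents for some text models; other base models expect a different schema (for example, a conversational message format), so treat the field names as an assumption and check the data requirements for your chosen model. The example data and the `to_jsonl` helper are hypothetical.

```python
import json

# Hypothetical labeled examples: each input is paired with its correct output.
labeled_pairs = [
    ("Classify the sentiment: 'The battery died after an hour.'", "negative"),
    ("Classify the sentiment: 'Setup took two minutes. Love it.'", "positive"),
]


def to_jsonl(pairs):
    """Serialize (prompt, completion) pairs as JSON Lines, one record per line."""
    # Field names are an assumption; they vary by base model.
    return "\n".join(
        json.dumps({"prompt": prompt, "completion": completion})
        for prompt, completion in pairs
    )


print(to_jsonl(labeled_pairs))
```

Every record carries its target output, which is exactly the signal supervised fine-tuning learns from.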
Reinforcement fine-tuning fits when output quality can be objectively measured — for example, code correctness, mathematical reasoning, or structured outputs — and especially when collecting high-quality labeled examples is expensive or impractical.
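A reward function replaces the "correct answer" with a score. The sketch below is a hypothetical grader for a structured-output task: it rewards responses that parse as a JSON object and contain the expected keys. The `reward` name, the `REQUIRED_KEYS` schema, and the 0.0 to 1.0 scale are assumptions for illustration; this is not the interface Amazon Bedrock uses to invoke reward functions.

```python
import json

# Hypothetical schema the model's output should satisfy.
REQUIRED_KEYS = {"order_id", "status"}


def reward(response: str) -> float:
    """Score a model response from 0.0 to 1.0 without any reference answer."""
    try:
        parsed = json.loads(response)
    except json.JSONDecodeError:
        return 0.0  # Not valid JSON: no credit.
    if not isinstance(parsed, dict):
        return 0.0  # Valid JSON but not an object: no credit.
    # Partial credit: fraction of required keys that are present.
    present = sum(1 for key in REQUIRED_KEYS if key in parsed)
    return present / len(REQUIRED_KEYS)


print(reward('{"order_id": "A17", "status": "shipped"}'))  # 1.0
print(reward('{"order_id": "A17"}'))  # 0.5
print(reward("Order A17 has shipped."))  # 0.0
```

Note that no labeled target output appears anywhere in the grader; training uses these scores as feedback.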
Distillation is the right choice when you want to achieve the accuracy of a larger model at lower inference cost: you select a teacher model and a student model, provide prompts, and Amazon Bedrock generates teacher responses to fine-tune the student.
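The data flow in that description can be sketched in a few lines. This is a toy illustration, not Amazon Bedrock's implementation: `toy_teacher` stands in for the larger model, and the hypothetical `build_distillation_set` turns its responses into prompt–response pairs that would then be used to fine-tune the student. The teacher only runs inference and is never modified.

```python
from typing import Callable, List, Tuple


def build_distillation_set(
    prompts: List[str], teacher: Callable[[str], str]
) -> List[Tuple[str, str]]:
    """Pair each prompt with the teacher's response to form student training data."""
    # The teacher only runs inference here; its parameters never change.
    return [(prompt, teacher(prompt)) for prompt in prompts]


# Hypothetical stand-in teacher; in practice this is the larger model.
def toy_teacher(prompt: str) -> str:
    return f"Answer to: {prompt}"


training_pairs = build_distillation_set(["What is RAG?", "Define RFT."], toy_teacher)
print(training_pairs)
# Next step (not shown): fine-tune the student model on training_pairs.
```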
COMMON MISCONCEPTION
A common misconception is that reinforcement fine-tuning requires labeled input–output pairs just like supervised fine-tuning. It does not. Reinforcement fine-tuning explicitly replaces labeled pairs with reward functions that evaluate response quality. The model learns iteratively from feedback scores, not from pre-labeled examples. Conflating these two methods — and treating labeled data as a universal requirement for all fine-tuning — is a trap that scenario-based exam questions are designed to surface.
A second misconception is that distillation is simply running inference on a large model. Distillation is a training process: Amazon Bedrock uses the teacher model's responses to fine-tune the student model's parameters. The student model is changed; the teacher model is not.
How it shows up on the exam
The cognitive target for this topic is selecting the right customization method from a scenario's constraints. Candidates who have only a surface-level understanding often confuse the three methods by focusing on the word "fine-tuning" and missing what kind of data or signal each requires.
Watch for scenario language like:
- "…has labeled prompt–response pairs and wants to improve accuracy on a specific task" — points toward supervised fine-tuning, where labeled data trains the model to associate input types with output types.
- "…can write a function to score responses but does not have labeled examples" — points toward reinforcement fine-tuning, where reward functions replace labeled pairs.
- "…needs a smaller, faster model that performs as well as a larger model on their use case" — points toward distillation, where a teacher model's knowledge is transferred to a student model.
- "…model's parameters are adjusted" — all three customization methods adjust model parameters; this phrase alone does not distinguish between them.
The official documentation is explicit that reinforcement fine-tuning improves alignment "through feedback-based learning" and that "instead of providing labeled input-output pairs, you define reward functions." Exam scenarios describing a reward-function or scoring approach signal reinforcement fine-tuning, not supervised fine-tuning.
Related concepts
- AI Agents — Agents orchestrate tool use and multi-step reasoning at inference time; customization changes model weights at training time. These are complementary, not interchangeable.
- Bedrock Knowledge Bases — Knowledge bases ground a model's responses in external data at inference time via retrieval; they do not adjust model parameters.
- RAG Design Considerations — Understanding when retrieval-augmented generation is sufficient versus when parameter-level customization is warranted is a key exam decision boundary.
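To see why retrieval leaves model parameters untouched, here is a toy sketch of the inference-time pattern behind knowledge bases and RAG: retrieve relevant text and place it in the prompt. The word-overlap scoring and the `retrieve`/`augment_prompt` helpers are hypothetical simplifications (real knowledge bases typically rely on vector embeddings); the point is that only the prompt changes, never the model.

```python
import re
from typing import List, Set

# Hypothetical document store; a real knowledge base would hold your data.
DOCUMENTS = [
    "Refunds are issued within 14 days of a return.",
    "Standard shipping takes 3 to 5 business days.",
]


def words(text: str) -> Set[str]:
    """Lowercase word set with punctuation stripped."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def retrieve(query: str, docs: List[str]) -> str:
    """Return the document sharing the most words with the query (toy scoring)."""
    query_words = words(query)
    return max(docs, key=lambda doc: len(query_words & words(doc)))


def augment_prompt(query: str, docs: List[str]) -> str:
    """Ground the prompt in retrieved text; the model itself is not modified."""
    context = retrieve(query, docs)
    return f"Context: {context}\n\nQuestion: {query}"


print(augment_prompt("How many days until refunds are issued?", DOCUMENTS))
```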
Sources
Every claim on this page traces to the public exam blueprint and official documentation: