CutScore | Model customization approaches

What it is

Model customization is the process of providing training data to a model to improve its performance for specific use cases. Amazon Bedrock provides three customization methods: supervised fine-tuning, reinforcement fine-tuning, and distillation. Each method adjusts a foundation model's parameters, producing a privately owned custom model that only your AWS account can access.

Mental model

Think of the choice in terms of the training signal you can supply and the outcome you want:

You have labeled examples (input → correct output): use supervised fine-tuning.
You can measure quality but can't enumerate correct answers: use reinforcement fine-tuning.
You want a smaller, cheaper model that performs like a larger one: use distillation.

The key question is always: what kind of signal can you provide?
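The three rules above can be sketched as a small decision helper. This is only a study aid: the function name, its parameters, and the order in which the checks run are assumptions made for illustration, not part of Amazon Bedrock.

```python
def choose_customization_method(has_labeled_pairs: bool,
                                can_score_outputs: bool,
                                wants_smaller_model: bool) -> str:
    """Map the available signal and goal to a customization method (study aid only)."""
    if wants_smaller_model:
        # Goal: a smaller, cheaper student that behaves like a larger teacher.
        return "distillation"
    if has_labeled_pairs:
        # Signal: input -> correct output examples.
        return "supervised fine-tuning"
    if can_score_outputs:
        # Signal: quality can be measured, but correct answers are not enumerated.
        return "reinforcement fine-tuning"
    return "no customization signal available"
```

For example, a team that can score responses but has no labeled pairs would call `choose_customization_method(False, True, False)` and get "reinforcement fine-tuning". Checking the model-size goal first is a simplification; real scenarios may combine constraints.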

When to use it

Method	Input data required	Model parameters change?	Primary goal
Supervised fine-tuning	Labeled prompt–response pairs	Yes	Improve performance on specific tasks with known correct outputs
Reinforcement fine-tuning	Prompts + reward functions (not labeled pairs)	Yes	Optimize for measurable quality criteria; useful when correct answers are hard to define upfront
Distillation	Prompts (with optional labeled pairs); teacher model generates responses	Yes (student model)	Transfer capability from a larger teacher model to a smaller, faster, cost-efficient student model

Supervised fine-tuning is the right choice when you have a labeled dataset and want the model to learn the association between inputs and specific output types.
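As a rough illustration of what a labeled dataset looks like, the sketch below builds two prompt-response pairs and serializes them as JSON Lines. The field names `prompt` and `completion`, the sample records, and the helper functions are assumptions for illustration; the exact schema depends on the base model, so check the documentation for the model you customize.

```python
import json

# Hypothetical labeled examples: each pairs an input with its known-correct output.
examples = [
    {"prompt": "Classify the sentiment: 'The checkout flow was painless.'",
     "completion": "positive"},
    {"prompt": "Classify the sentiment: 'My order arrived broken.'",
     "completion": "negative"},
]

def to_jsonl(records):
    """Serialize records as JSON Lines: one JSON object per line."""
    return "\n".join(json.dumps(r) for r in records)

def validate(records):
    """Return the indexes of records missing a non-empty prompt or completion."""
    bad = []
    for i, r in enumerate(records):
        if not r.get("prompt", "").strip() or not r.get("completion", "").strip():
            bad.append(i)
    return bad

jsonl_text = to_jsonl(examples)
```

The point to remember is that every record carries both the input and the correct output; that pairing is what makes the data "labeled."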

Reinforcement fine-tuning fits when output quality can be objectively measured — for example, code correctness, mathematical reasoning, or structured outputs — and especially when collecting high-quality labeled examples is expensive or impractical.
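To make "objectively measured" concrete, here is a minimal sketch of a reward function for a structured-output task. The function name, its signature, and the required keys are hypothetical; this is not the interface Amazon Bedrock expects, only an illustration of scoring a response without a labeled answer.

```python
import json

def reward(response_text: str, required_keys=("order_id", "status")) -> float:
    """Score a response in [0, 1].

    Returns 0.0 if the response is not a JSON object; otherwise returns
    the fraction of required keys that are present.
    """
    try:
        parsed = json.loads(response_text)
    except json.JSONDecodeError:
        return 0.0
    if not isinstance(parsed, dict):
        return 0.0
    present = sum(1 for key in required_keys if key in parsed)
    return present / len(required_keys)
```

Note that the function never sees a "correct" response. It only measures a property of whatever the model produced, which is the distinction from supervised fine-tuning.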

Distillation is the right choice when you want accuracy close to a larger model's on your use case at lower inference cost. You select a teacher model and a student model and provide prompts; Amazon Bedrock generates the teacher's responses and uses them to fine-tune the student.
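The data flow in distillation can be sketched conceptually as follows. In Amazon Bedrock the service performs this step for you; the function names and the stub teacher below are hypothetical stand-ins used only to show that teacher responses become the student's training data.

```python
def build_distillation_pairs(prompts, teacher_generate):
    """Pair each prompt with the teacher's response to form student training data."""
    return [{"prompt": p, "completion": teacher_generate(p)} for p in prompts]

def stub_teacher(prompt: str) -> str:
    # Stand-in for a large teacher model; a real teacher would be a foundation model.
    return f"[teacher answer to: {prompt}]"

# The user supplies only prompts; the responses come from the teacher.
pairs = build_distillation_pairs(["Summarize our refund policy."], stub_teacher)
```

The resulting pairs look like supervised fine-tuning data, but the labels come from the teacher model rather than from human annotation.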

Common misconceptions

A common misconception is that reinforcement fine-tuning requires labeled input–output pairs just like supervised fine-tuning. It does not. Reinforcement fine-tuning explicitly replaces labeled pairs with reward functions that evaluate response quality. The model learns iteratively from feedback scores, not from pre-labeled examples. Conflating these two methods — and treating labeled data as a universal requirement for all fine-tuning — is a trap that scenario-based exam questions are designed to surface.

A second misconception is that distillation is simply running inference on a large model. Distillation is a training process: Amazon Bedrock uses the teacher model's responses to fine-tune the student model's parameters. The student model is changed; the teacher model is not.

How it shows up on the exam

This topic tests whether you can pick the right customization method from a scenario's constraints. Candidates with only a surface-level understanding often fixate on the word "fine-tuning" and overlook the kind of data or signal each method requires.

Watch for scenario language like:

"…has labeled prompt–response pairs and wants to improve accuracy on a specific task" — points toward supervised fine-tuning, where labeled data trains the model to associate input types with output types.
"…can write a function to score responses but does not have labeled examples" — points toward reinforcement fine-tuning, where reward functions replace labeled pairs.
"…needs a smaller, faster model that performs as well as a larger model on their use case" — points toward distillation, where a teacher model's knowledge is transferred to a student model.
"…model's parameters are adjusted" — all three customization methods adjust model parameters; this phrase alone does not distinguish between them.

The official documentation is explicit that reinforcement fine-tuning improves alignment "through feedback-based learning" and that "instead of providing labeled input-output pairs, you define reward functions." Exam scenarios describing a reward-function or scoring approach signal reinforcement fine-tuning, not supervised fine-tuning.

Related concepts

AI Agents — Agents orchestrate tool use and multi-step reasoning at inference time; customization changes model weights at training time. These are complementary, not interchangeable.
Bedrock Knowledge Bases — Knowledge bases ground a model's responses in external data at inference time via retrieval; they do not adjust model parameters.
RAG Design Considerations — Understanding when retrieval-augmented generation is sufficient versus when parameter-level customization is warranted is a key exam decision boundary.

Model customization approaches — AIF-C01