Large language models — AIF-C01
Reference page teaching large language models (LLMs) for the AWS AIF-C01 exam — what they are, how next-token prediction and context windows work, and the key…
What it is
A large language model (LLM) is a very large deep learning model that is pre-trained on vast amounts of data. Its underlying architecture is the transformer, which uses an encoder and a decoder with self-attention capabilities and processes entire input sequences in parallel rather than one word at a time. During pre-training, the model iteratively adjusts its parameters to get better at predicting the next token from the previous tokens in a sequence. Because the training targets come from the text itself rather than from human-written labels, this is called self-supervised learning. The result is a model, often with billions to hundreds of billions of parameters, that can perform a wide range of language tasks without being retrained from scratch for each one.
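To make the self-supervised idea concrete, here is a minimal sketch of how training examples can be derived from raw text alone. The function name `next_token_pairs` is hypothetical, and whitespace splitting is only a stand-in for a real tokenizer.

```python
def next_token_pairs(tokens):
    """Turn one token sequence into (context, target) training pairs.

    The target is simply the token that follows its context, so the
    text supplies its own labels and no human annotation is needed.
    """
    return [(tokens[:i], tokens[i]) for i in range(1, len(tokens))]


# Assumption: whitespace splitting stands in for a real subword tokenizer.
tokens = "the cat sat on the mat".split()
pairs = next_token_pairs(tokens)

for context, target in pairs:
    print(f"{' '.join(context):<20} -> {target}")
```

Each pair asks the model the same question it is trained on: given these previous tokens, what comes next?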
Mental model
Think of an LLM as a high-resolution map of how language fits together, built by reading a vast corpus and learning which tokens tend to follow which others. The context window is the section of that map the model can consult at any one moment. Every response is constructed one token at a time: the model scores the possible next tokens given everything already in the window and selects a contextually plausible one, often by sampling rather than always taking the single most likely candidate.
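The token-at-a-time loop can be sketched with a toy model that only counts which token follows which. This is a deliberately simplified illustration, not how a transformer works internally: the helper names are hypothetical, the "model" looks only at the last token rather than the whole window, and it always takes the most frequent follower.

```python
from collections import Counter, defaultdict


def train_bigram(tokens):
    """Count, for each token, which tokens follow it in the corpus."""
    follows = defaultdict(Counter)
    for prev, nxt in zip(tokens, tokens[1:]):
        follows[prev][nxt] += 1
    return follows


def generate(follows, prompt, max_new_tokens):
    """Extend the prompt one token at a time, always picking the most frequent follower."""
    out = list(prompt)
    for _ in range(max_new_tokens):
        candidates = follows.get(out[-1])
        if not candidates:
            break  # the toy model has never seen anything follow this token
        out.append(candidates.most_common(1)[0][0])
    return out


# Assumption: a tiny made-up corpus, split on whitespace.
corpus = "the cat sat on the mat and the cat sat down".split()
model = train_bigram(corpus)
generated = generate(model, ["the"], 2)
print(" ".join(generated))
```

A real LLM replaces the frequency table with billions of learned parameters and conditions on every token in the context window, but the outer loop (predict, append, repeat) has the same shape.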
Three concepts interlock:
| Concept | What it is | Why it matters |
|---|---|---|
| Token | The unit the model reads and predicts — a word, subword, or character chunk | Determines how much text fits in one prompt |
| Context window | The total number of tokens the model can consider at once | Bounds how much prior conversation, document, or instruction the model "remembers" |
| Next-token prediction | The training objective: given prior tokens, predict the next one | The mechanism behind all LLM outputs, from answers to code |
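The way a context window bounds what the model can "remember" can be illustrated with a small truncation sketch. The function name and the numbers are hypothetical, and the assumption that generated output shares the same window as the input should be checked against the specific model's documentation.

```python
def fit_to_context_window(tokens, window_size, reserved_for_output):
    """Keep the most recent tokens that fit next to the space reserved for the reply.

    Assumption: input and generated output draw on one shared token budget.
    """
    budget = window_size - reserved_for_output
    if budget <= 0:
        raise ValueError("window too small for the reserved output")
    return tokens[-budget:]


# Ten tokens of conversation history, but only 8 - 3 = 5 fit in the prompt.
history = [f"t{i}" for i in range(10)]
kept = fit_to_context_window(history, window_size=8, reserved_for_output=3)
print(kept)
```

Everything dropped by the truncation is invisible to the model on that call, which is why long conversations or documents can lose early details.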
Word embeddings support this: the model represents tokens as multi-dimensional vectors, and words with similar contextual meanings are positioned close to each other in that vector space, letting the model reason about meaning rather than just matching exact strings.
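As a rough illustration of "close in vector space", cosine similarity is one common way to compare embedding vectors. The three-dimensional vectors below are invented for the example; real embeddings are learned and typically have hundreds or thousands of dimensions.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors: near 1.0 means similar direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Assumption: hand-made toy vectors, not output from any real model.
embeddings = {
    "king": [0.9, 0.8, 0.1],
    "queen": [0.85, 0.82, 0.15],
    "banana": [0.1, 0.05, 0.95],
}

sim_related = cosine_similarity(embeddings["king"], embeddings["queen"])
sim_unrelated = cosine_similarity(embeddings["king"], embeddings["banana"])
print(f"king vs queen:  {sim_related:.3f}")
print(f"king vs banana: {sim_unrelated:.3f}")
```

The related pair scores much higher than the unrelated pair, which is the property that lets a model treat differently spelled words as near neighbours in meaning.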
When to use it
The blueprint lists several model types under Task 2.1 — knowing when an LLM is the right choice (versus a different foundation model type) is a testable decision.
| Model type | Primary modality | Typical use cases | When NOT the right pick |
|---|---|---|---|
| LLM (transformer-based) | Text in / text out | Summarization, Q&A, translation, code generation, chatbots, customer service agents | When the task is image generation, audio synthesis, or video generation |
| Multi-modal model | Text + images (or other combinations) | Image captioning, visual Q&A, document understanding | When the task is purely text and a lighter model suffices |
| Diffusion model | Noise → image/audio | Image generation, audio generation | When the task requires text reasoning or conversation |
LLMs are suited to any task where the input and output are primarily text and where the model needs to generalize across many topics without task-specific retraining.
Common misconception
Misconception: LLMs "understand" or "know" things the way a person does.
The official AWS documentation is more precise: LLMs are models that predict the next token based on patterns learned during training. They are explicitly described as "not perfect" and "not infallible." Training adjusts parameter values so that the model predicts tokens well; it does not build a factual knowledge base that can be queried reliably. This is the source of hallucination: the model generates a plausible-sounding next token even when there is no grounded fact behind it.
A related trap: candidates sometimes assume that because an LLM was pre-trained on a large corpus, it has current or complete knowledge. Pre-training is a one-time process on a fixed dataset; the model does not update itself from new information after training unless fine-tuned or augmented (for example, via retrieval).
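Augmentation via retrieval can be sketched as follows: instead of changing the model, relevant text is looked up at request time and placed in the prompt. Everything here is hypothetical, including the helper names, the word-overlap scoring (real systems usually compare embeddings), and the sample documents.

```python
import re


def words(text):
    """Lowercase word set, ignoring punctuation."""
    return set(re.findall(r"\w+", text.lower()))


def retrieve(question, documents, top_k=1):
    """Toy retriever: rank documents by how many words they share with the question."""
    q_words = words(question)
    ranked = sorted(documents, key=lambda d: len(q_words & words(d)), reverse=True)
    return ranked[:top_k]


def build_prompt(question, documents):
    """Place the retrieved text in the prompt so the answer can be grounded in it."""
    context = "\n".join(retrieve(question, documents))
    return (
        "Answer using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}"
    )


# Assumption: made-up documents standing in for an up-to-date knowledge source.
documents = [
    "The refund window is 30 days from delivery.",
    "Support hours are 9am to 5pm on weekdays.",
]
question = "How long is the refund window?"
prompt = build_prompt(question, documents)
print(prompt)
```

The model's parameters stay frozen; the new information reaches it only through the context window.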
How it shows up on the exam
The blueprint places LLMs explicitly in Task 2.1 under "transformer-based LLMs" as one of the foundational generative AI concepts candidates must understand. The cognitive target is recognition and explanation — you are not expected to build or tune LLMs, but you are expected to distinguish them from other model types and explain their key properties.
A common misconception the exam exploits is treating LLMs and foundation models as synonyms. Foundation models are the broader category — an LLM is a specific type of foundation model focused on language. Candidates who conflate the two may misidentify which model type is appropriate for a given modality.
Signal phrases to watch for in questions: "pre-trained on vast amounts of data," "next-token prediction," "context window," "transformer," "parameters," "zero-shot," "few-shot," and "fine-tuning." When a question uses these phrases, it is testing LLM fundamentals.
Questions about capabilities (summarization, translation, code generation, chatbots) align with what the official AWS documentation lists as LLM use cases. Questions about limitations (hallucination, nondeterminism, inaccuracy) align with what the blueprint lists under Task 2.2 but are rooted in the same architectural facts — LLMs predict tokens, they do not retrieve verified facts.
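The nondeterminism limitation follows from how the next token is commonly chosen: it is sampled from a probability distribution, and a temperature setting controls how spread out that distribution is. The sketch below uses invented scores for three candidate tokens; the function name is hypothetical and the numbers do not come from any real model.

```python
import math
import random


def sample_next_token(scores, temperature, rng):
    """Sample one token from softmax(scores / temperature)."""
    tokens = list(scores)
    scaled = [scores[t] / temperature for t in tokens]
    peak = max(scaled)  # subtract the max before exp() for numerical stability
    weights = [math.exp(s - peak) for s in scaled]
    return rng.choices(tokens, weights=weights, k=1)[0]


# Assumption: made-up raw scores for three candidate next tokens.
scores = {"Paris": 4.0, "Lyon": 2.0, "Rome": 1.0}
rng = random.Random(0)

low_temp = [sample_next_token(scores, 0.05, rng) for _ in range(20)]
high_temp = [sample_next_token(scores, 2.0, rng) for _ in range(200)]

print("distinct tokens at low temperature: ", sorted(set(low_temp)))
print("distinct tokens at high temperature:", sorted(set(high_temp)))
```

At a very low temperature the top-scoring token wins essentially every time; at a higher temperature lower-scoring tokens are picked some of the time, so the same prompt can yield different outputs across runs.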
Related concepts
- Foundation models — the broader family that LLMs belong to; understanding the relationship prevents conflating the two on the exam.
- Generative AI — the paradigm within which LLMs operate; LLMs are the most prominent generative AI model type for text.
- Embeddings — the vector representations that LLMs use internally to capture contextual meaning; a testable concept that appears alongside LLMs in Task 2.1.
Sources
Every claim on this page traces to the public exam blueprint and official documentation: