CutScore | Large language models

What it is

A large language model (LLM) is a very large deep learning model that is pre-trained on vast amounts of data. Its underlying architecture is the transformer, which was originally designed with an encoder and a decoder built on self-attention (many current LLMs keep only the decoder) and which processes entire input sequences in parallel rather than one word at a time. During pre-training, the model iteratively adjusts its parameters until it can reliably predict the next token from the previous tokens in a sequence. This is called self-supervised learning, because the training targets come from the text itself rather than from human-written labels. The result is a model, often with billions to hundreds of billions of parameters, that can perform a wide range of language tasks without being retrained from scratch for each one.
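To make the self-supervised idea concrete, the sketch below shows how a single piece of text yields its own training examples: each prefix becomes an input and the token that follows it becomes the target. The function name `next_token_pairs` and the whitespace tokenization are illustrative assumptions; real LLM tokenizers split text into subword units.

```python
def next_token_pairs(tokens):
    """Turn one token sequence into (context, target) training pairs.

    No human labels are needed: the target is simply the next token
    in the original text.
    """
    return [(tokens[:i], tokens[i]) for i in range(1, len(tokens))]


# Assumption: naive whitespace tokenization, used here only for readability.
tokens = "the cat sat on the mat".split()

for context, target in next_token_pairs(tokens):
    print(context, "->", target)
```

A six-token sentence produces five training pairs, which is why a large corpus supplies an enormous number of examples without any manual annotation.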

Mental model

Think of an LLM as a high-resolution map of how language fits together, built by reading a vast corpus and learning which tokens tend to follow which others. The context window is the section of that map the model can consult at any one moment. Every response is constructed one token at a time: the model assigns probabilities to candidate next tokens given everything already in the window and then selects one. The selection is often made by sampling rather than by always taking the single most likely token, which is why the same prompt can produce different outputs.
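The token-by-token loop can be sketched with a deliberately tiny stand-in for an LLM. The example below is a bigram counter, not a transformer: it conditions only on the single previous token, whereas an LLM conditions on the whole context window. The names `train_bigram` and `generate` and the toy corpus are hypothetical.

```python
import random
from collections import Counter, defaultdict


def train_bigram(tokens):
    """Count how often each token follows each other token."""
    counts = defaultdict(Counter)
    for prev, nxt in zip(tokens, tokens[1:]):
        counts[prev][nxt] += 1
    return counts


def generate(counts, start, max_new_tokens, rng):
    """Build a sequence one token at a time by sampling likely followers."""
    output = [start]
    for _ in range(max_new_tokens):
        followers = counts.get(output[-1])
        if not followers:
            break  # nothing was ever seen after this token
        choices, weights = zip(*followers.items())
        # Sampling (rather than always taking the top choice) is one
        # reason generated text can differ between runs.
        output.append(rng.choices(choices, weights=weights, k=1)[0])
    return output


corpus = "the cat sat on the mat and the cat slept".split()
model = train_bigram(corpus)
print(" ".join(generate(model, "the", 5, random.Random(0))))
```

Passing a seeded `random.Random` makes a run repeatable; using a different seed, or none, shows the run-to-run variation described above.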

Three concepts interlock:

Concept	What it is	Why it matters
Token	The unit the model reads and predicts — a word, subword, or character chunk	Determines how much text fits in one prompt
Context window	The total number of tokens the model can consider at once	Bounds how much prior conversation, document, or instruction the model "remembers"
Next-token prediction	The training objective: given prior tokens, predict the next one	The mechanism behind all LLM outputs, from answers to code
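The way a context window bounds what the model can consider can be illustrated with a small budget calculation. This is a minimal sketch under one assumed policy (drop the oldest tokens first); the function `fit_to_context_window` and its parameters are hypothetical, and real applications may summarize or select content instead.

```python
def fit_to_context_window(tokens, window_size, reserved_for_output=0):
    """Keep only the most recent tokens that fit in the window.

    reserved_for_output sets aside room for the tokens the model will
    generate, since input and output share the same window.
    """
    budget = window_size - reserved_for_output
    if budget <= 0:
        raise ValueError("window_size must exceed reserved_for_output")
    return tokens[-budget:]


conversation = list(range(10))  # stand-in for 10 token IDs
kept = fit_to_context_window(conversation, window_size=8, reserved_for_output=3)
print(kept)  # the five most recent tokens; earlier ones are not visible to the model
```

Anything that falls outside the returned slice is simply not part of the model's input, which is the sense in which the window limits what the model "remembers".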

Word embeddings support this: the model represents tokens as multi-dimensional vectors, and words with similar contextual meanings are positioned close to each other in that vector space, letting the model reason about meaning rather than just matching exact strings.
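Closeness in vector space is commonly measured with cosine similarity. The sketch below uses made-up three-dimensional vectors purely for illustration; real embeddings are learned during training and typically have hundreds or thousands of dimensions.

```python
import math


def cosine_similarity(a, b):
    """Return the cosine of the angle between two vectors (1.0 = same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Assumption: hand-picked toy vectors, not output from any real model.
embeddings = {
    "king": [0.90, 0.80, 0.10],
    "queen": [0.85, 0.82, 0.15],
    "banana": [0.10, 0.20, 0.95],
}

print(cosine_similarity(embeddings["king"], embeddings["queen"]))
print(cosine_similarity(embeddings["king"], embeddings["banana"]))
```

With these toy values the first score is much higher than the second, mirroring the idea that related words sit near each other while unrelated words sit far apart.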

When to use it

The blueprint lists several model types under Task 2.1 — knowing when an LLM is the right choice (versus a different foundation model type) is a testable decision.

Model type	Primary modality	Typical use cases	When NOT the right pick
LLM (transformer-based)	Text in / text out	Summarization, Q&A, translation, code generation, chatbots, customer service agents	When the task is image generation, audio synthesis, or video generation
Multi-modal model	Text + images (or other combinations)	Image captioning, visual Q&A, document understanding	When the task is purely text and a lighter model suffices
Diffusion model	Noise → image/audio	Image generation, audio generation	When the task requires text reasoning or conversation

LLMs are suited to any task where the input and output are primarily text and where the model needs to generalize across many topics without task-specific retraining.
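As a study aid, the decision table above can be compressed into a few rules keyed on input and output modality. This is a simplification written for this page, not an official AWS decision procedure; the function `pick_model_type` and its modality labels are hypothetical.

```python
def pick_model_type(input_modalities, output_modality):
    """Map a task's modalities to the model type from the comparison table."""
    inputs = set(input_modalities)
    if output_modality in {"image", "audio"}:
        # Generating images or audio is the diffusion row of the table.
        return "diffusion model"
    if output_modality == "text" and inputs == {"text"}:
        # Text in, text out: summarization, Q&A, translation, code, chat.
        return "LLM"
    if output_modality == "text" and inputs - {"text"}:
        # Non-text input with text output: captioning, visual Q&A.
        return "multi-modal model"
    return None  # outside the scope of the table


print(pick_model_type({"text"}, "text"))
print(pick_model_type({"text", "image"}, "text"))
print(pick_model_type({"text"}, "image"))
```

The rule ordering reflects the table's "when NOT the right pick" column: the output modality is checked first because an LLM is ruled out as soon as the task requires generating images or audio.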

Common misconception

Misconception: LLMs "understand" or "know" things the way a person does.

What the official AWS documentation actually states is more precise: LLMs are models that predict the next token based on patterns learned during training. They are explicitly described as "not perfect" and "not infallible." The model adjusts parameter values to correctly predict tokens — it does not build a factual knowledge base that can be queried reliably. This is the source of hallucination: the model generates a plausible-sounding next token even when there is no grounded fact behind it.

A related trap: candidates sometimes assume that because an LLM was pre-trained on a large corpus, it has current or complete knowledge. Pre-training is a one-time process on a fixed dataset; the model does not update itself from new information after training unless fine-tuned or augmented (for example, via retrieval).
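Augmentation via retrieval can be sketched as a prompt-assembly step: relevant text is looked up at question time and placed in the context window, so the model does not have to rely on what it saw during pre-training. The example is a minimal sketch with a naive word-overlap search; `retrieve`, `build_augmented_prompt`, and the sample documents are hypothetical, and production systems typically use embedding-based search instead.

```python
def retrieve(question, documents, top_k=1):
    """Rank documents by how many words they share with the question."""
    q_words = set(question.lower().split())

    def overlap(doc):
        return len(q_words & set(doc.lower().split()))

    return sorted(documents, key=overlap, reverse=True)[:top_k]


def build_augmented_prompt(question, documents):
    """Place retrieved text in the prompt so the answer can be grounded in it."""
    context = "\n".join(retrieve(question, documents))
    return (
        "Use only the context below to answer.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}"
    )


# Assumption: invented policy snippets standing in for an up-to-date knowledge source.
docs = [
    "The refund window is 30 days.",
    "Support hours are 9 to 5.",
]
print(build_augmented_prompt("What is the refund window?", docs))
```

The model's parameters are unchanged in this approach; only the input changes, which is why retrieval can supply current information without retraining.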

How it shows up on the exam

The blueprint places LLMs explicitly in Task 2.1 under "transformer-based LLMs" as one of the foundational generative AI concepts candidates must understand. The cognitive target is recognition and explanation — you are not expected to build or tune LLMs, but you are expected to distinguish them from other model types and explain their key properties.

A common misconception the exam exploits is treating LLMs and foundation models as synonyms. Foundation models are the broader category — an LLM is a specific type of foundation model focused on language. Candidates who conflate the two may misidentify which model type is appropriate for a given modality.

Signal phrases to watch for in questions: "pre-trained on vast amounts of data," "next-token prediction," "context window," "transformer," "parameters," "zero-shot," "few-shot," and "fine-tuning." When a question uses these phrases, it is testing LLM fundamentals.

Questions about capabilities (summarization, translation, code generation, chatbots) align with what the official AWS documentation lists as LLM use cases. Questions about limitations (hallucination, nondeterminism, inaccuracy) align with what the blueprint lists under Task 2.2 but are rooted in the same architectural facts — LLMs predict tokens, they do not retrieve verified facts.

Related concepts

Foundation models — the broader family that LLMs belong to; understanding the relationship prevents conflating the two on the exam.
Generative AI — the paradigm within which LLMs operate; LLMs are the most prominent generative AI model type for text.
Embeddings — the vector representations that LLMs use internally to capture contextual meaning; a testable concept that appears alongside LLMs in Task 2.1.

Large language models — AIF-C01