CutScore | Vector databases for retrieval

WHAT IT IS

A vector database is a purpose-built data store that holds data as high-dimensional numerical vectors (embeddings) and supports fast nearest-neighbor lookup in that vector space. In a retrieval-augmented generation (RAG) system, it serves as the external knowledge layer: an embedding model converts your source documents into vectors, the vector database stores and indexes those vectors, and at query time the user's question is converted to a vector by the same embedding model and matched against the index to surface the most semantically relevant passages.

Mental model

Think of a library where every book has been distilled into a point on a giant map. Books about similar topics cluster together. When you ask a question, the system plots your question on that same map and hands you the books whose points are closest — regardless of whether they share any keywords with your question. That "find the closest points" operation is what a vector database does.
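
The map analogy can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the two-dimensional coordinates are hand-placed (real embeddings come from an embedding model and usually have hundreds or thousands of dimensions), and the names `book_points` and `closest_books` are hypothetical.

```python
import math

# Hand-placed 2-D "map" coordinates (assumption: illustrative values only).
book_points = {
    "Intro to Gardening": (1.0, 8.0),
    "Growing Tomatoes at Home": (1.5, 7.5),
    "Tax Law Basics": (8.0, 1.0),
    "Small Business Accounting": (7.5, 2.0),
}

def closest_books(question_point, points, k=2):
    """Return the k titles whose points lie nearest to the question's point."""
    ranked = sorted(
        points,
        key=lambda title: math.dist(question_point, points[title]),
    )
    return ranked[:k]

# A question about vegetable plots, plotted on the same map (hypothetical point).
question_point = (1.2, 7.8)
nearest = closest_books(question_point, book_points)
print(nearest)
```

The two gardening titles come back first because their points sit closest to the question's point; no title text is compared at any step.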

When to use it

The exam tests whether you can distinguish vector-database-backed RAG from fine-tuning as approaches to grounding a foundation model in domain-specific knowledge.

Dimension	RAG with a vector database	Fine-tuning
Knowledge update	Add or replace documents in the vector database; no model retraining required	Requires a new training run to incorporate new information
Data that changes	Well-suited for frequently updated content	Less suited; model weights encode a fixed snapshot
Data stays external	Source documents remain outside the model	Knowledge is baked into model weights
Cost tradeoff	Storage and retrieval compute at inference time	Upfront training compute; lower per-query overhead once deployed
Hallucination reduction	Responses are grounded in retrieved passages and can cite them	Does not inherently provide per-query grounding

The blueprint also lists AWS services for storing embeddings: Amazon OpenSearch Service, Amazon Aurora, Amazon Neptune, Amazon DocumentDB (with MongoDB compatibility), and Amazon RDS for PostgreSQL.

COMMON MISCONCEPTION

Vector databases do not perform keyword (lexical) search — they perform similarity search over embeddings.

A traditional relational or keyword-search database matches exact or fuzzy text strings. In a vector-database pipeline, an embedding model converts both the stored content and the incoming query into numerical vectors, and the database ranks results with a similarity or distance metric (such as cosine similarity or Euclidean distance), so ranking reflects semantic closeness rather than literal word overlap. As a result, two passages that share no words can still rank as highly relevant if their embeddings are close in vector space.
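
To make the ranking step concrete, here is a minimal sketch of cosine-similarity ranking. The three-dimensional vectors are hypothetical, hand-assigned values standing in for what an embedding model would produce, and `cosine_similarity`, `passages`, and `query_vector` are illustrative names rather than any service's API.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between vectors a and b (1.0 means same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Hypothetical embeddings; in practice an embedding model generates these.
passages = {
    "Employees accrue paid leave monthly.": [0.9, 0.1, 0.0],
    "The cafeteria opens at 8 a.m.": [0.1, 0.9, 0.1],
    "Servers are patched every Sunday.": [0.0, 0.1, 0.9],
}

query_text = "How much vacation time do I get?"
query_vector = [0.8, 0.2, 0.1]  # assumed embedding of query_text

# Rank passages from most to least similar to the query vector.
ranked_passages = sorted(
    passages,
    key=lambda text: cosine_similarity(query_vector, passages[text]),
    reverse=True,
)
print(ranked_passages[0])
```

The top-ranked passage shares no words with the query; it wins only because its (assumed) vector points in nearly the same direction as the query vector.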

Candidates sometimes assume that any database holding text can serve as a RAG retrieval layer. The distinguishing requirement is that the store must support k-nearest-neighbor (k-NN) vector search. Some AWS services (for example, Amazon OpenSearch Service) support hybrid search on both keywords and vectors, but the vector search capability is what makes them suitable as a RAG retrieval backend.

A second misconception is treating RAG and fine-tuning as equivalent approaches to the same problem. RAG uses a vector database to supply the model with retrieved context at inference time; fine-tuning changes the model's weights. The blueprint explicitly lists both as distinct customization approaches with different cost tradeoffs.

How it shows up on the exam

Task 3.1 asks candidates to define RAG, describe its business applications, and identify AWS services that store embeddings in vector databases. Questions on this topic tend to test whether you understand the role of the vector database within the RAG pipeline, not low-level algorithmic implementation details.

Candidates often confuse the following:

Semantic search vs. keyword search: a scenario may describe a case where exact-match search fails on paraphrased queries and ask which component resolves this — the answer points to embedding-based similarity search, which is the function a vector database provides.
RAG vs. fine-tuning: a scenario describing a need to keep an LLM current with rapidly changing internal documents points toward RAG (update the vector database) rather than frequent fine-tuning runs.
Where embeddings live: the embedding model creates vectors; the vector database stores and indexes them. These are separate components in the pipeline.
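
The last two points can be sketched together: the embedding step and the vector store are separate components, and refreshing knowledge means updating the store rather than retraining anything. Everything here is a toy under stated assumptions: `embed` is a lookup table standing in for a real embedding model, `InMemoryVectorStore` is a hypothetical class (not an AWS API), and the vectors are hand-picked.

```python
import math

# Component 1 (stand-in): a toy "embedding model" backed by a lookup table.
# Assumption: vectors are hand-picked; a real model would compute them.
TOY_EMBEDDINGS = {
    "Remote work is allowed two days per week.": [0.7, 0.3],
    "Remote work is allowed four days per week.": [0.9, 0.1],
    "Expense reports are due by the fifth of each month.": [0.1, 0.9],
    "How many days can I work from home?": [0.95, 0.05],
}

def embed(text):
    """Turn text into a vector (toy lookup, not a real model)."""
    return TOY_EMBEDDINGS[text]

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

# Component 2 (stand-in): a toy vector store that only stores and searches.
class InMemoryVectorStore:
    def __init__(self):
        self._items = {}  # doc_id -> (text, vector)

    def upsert(self, doc_id, text, vector):
        """Add a document, or overwrite the one already stored under doc_id."""
        self._items[doc_id] = (text, vector)

    def query(self, vector, k=1):
        """Return the k stored texts most similar to the query vector."""
        ranked = sorted(
            self._items.values(),
            key=lambda item: cosine(vector, item[1]),
            reverse=True,
        )
        return [text for text, _ in ranked[:k]]

old_policy = "Remote work is allowed two days per week."
new_policy = "Remote work is allowed four days per week."
expenses = "Expense reports are due by the fifth of each month."
question = "How many days can I work from home?"

store = InMemoryVectorStore()
store.upsert("remote-policy", old_policy, embed(old_policy))
store.upsert("expense-policy", expenses, embed(expenses))
before = store.query(embed(question), k=1)

# Knowledge update: overwrite the document in the store. No model is retrained.
store.upsert("remote-policy", new_policy, embed(new_policy))
after = store.query(embed(question), k=1)
print(before, after)
```

The same question retrieves the old policy before the update and the new policy after it, while `embed` stays untouched throughout.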

Signal phrases to watch for: "semantic search," "embeddings," "similarity search," "knowledge base," "retrieval," "external data source," "reduce hallucinations," and "store vectors."

Vector databases for retrieval — AIF-C01