Computer vision use cases — AIF-C01
Learn computer vision use cases for AWS AIF-C01: core task types, real-world applications, and the key misconception that trips exam candidates.
What it is
Computer vision is a technology that enables machines to automatically recognize images and describe them accurately and efficiently. It uses machine learning — specifically deep learning — to train computers on large volumes of visual data, so they can identify patterns and apply that learned knowledge to recognize new, unseen images and video.
The core insight is that computer vision does not alter an image; it interprets what it sees and then carries out a task, such as labeling an object or raising an alert.
Mental model
Think of computer vision as giving a machine a pair of eyes and a trained brain. The eyes are sensors (cameras, scanners, medical imaging devices). The brain is a deep learning model — typically a convolutional neural network (CNN) — that has learned from millions of labeled examples what different visual patterns mean. When new visual input arrives, the model matches it against what it has learned and produces an output: a label, a bounding box, a classification, or a decision.
This is distinct from image processing, which filters or transforms pixels (sharpening a photo, adjusting contrast). Computer vision leaves the pixels alone and produces meaning.
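The contrast can be made concrete with a toy sketch. It assumes a tiny grayscale image stored as nested lists, and the function names are hypothetical. The "vision" function uses a mean-intensity rule as a stand-in for a trained deep learning model; the point is only what each function returns: pixels in the first case, meaning in the second.

```python
# Toy 3x3 grayscale image: rows of pixel intensities in the range 0-255.
image = [
    [10, 20, 30],
    [40, 50, 60],
    [70, 80, 90],
]


def increase_brightness(pixels, amount):
    """Image processing: return a NEW image whose pixels are transformed."""
    return [[min(255, value + amount) for value in row] for row in pixels]


def classify_brightness(pixels, threshold=128):
    """Stand-in for a vision model: return a label, not an image.

    A real system would run a trained deep learning model here; the
    mean-intensity rule is only a hypothetical placeholder.
    """
    flat = [value for row in pixels for value in row]
    mean = sum(flat) / len(flat)
    return "bright" if mean >= threshold else "dark"


brighter = increase_brightness(image, 100)  # output is pixels
label = classify_brightness(image)          # output is meaning

print(brighter)
print(label)
```

On an exam stem, the same test applies: if the described output is another image, the scenario is image processing; if it is a label, location, or decision, it is computer vision.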
When to use it
Use the table below when an exam question describes a scenario and you need to choose the right AI/ML approach.
| Scenario | Right fit | Why |
|---|---|---|
| A factory camera flags defective products on a production line | Computer vision | Detecting and localizing visual defects in images is an object detection task |
| A hospital system interprets X-rays and MRIs | Computer vision | Analyzing medical images for anomalies (e.g., detecting tumors in scans, or assessing skin moles and lesions from photographs) is an image classification or segmentation task |
| A self-driving system identifies pedestrians and road signs in real time | Computer vision | Recognizing objects in real time and building 3D maps from camera feeds are computer vision tasks |
| A security system scans employees' faces to restrict access to a server room | Computer vision | Facial recognition for employee authentication is a computer vision task |
| A voice assistant transcribes spoken commands | Not computer vision | This is a speech/NLP task — there is no visual input |
| A recommendation engine predicts which product a customer will buy | Not computer vision | This is tabular/structured-data ML — no image or video input |
Common misconception
The trap: candidates often assume that any "image" task is a computer vision task — for example, that resizing, compressing, or color-correcting a photo is computer vision. It is not. Those are image processing operations that transform pixels without interpreting them.
Computer vision is specifically about making sense of visual content: identifying what is in an image, where objects are located, or what is happening across a sequence of frames. The distinction the official documentation draws is sharp: computer vision does not change an image; it labels, detects, tracks, or segments what it sees.
A second misconception is that computer vision and object detection are the same thing. Object detection — identifying and localizing objects within an image — is one task within computer vision. Computer vision also encompasses image classification (assigning a category label to a whole image), object tracking (following an identified object across video frames), and image segmentation (dividing an image into pixel regions that correspond to distinct objects or areas).
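One way to keep the four task types apart is to look at the shape of what each one returns. The sketch below is illustrative only: the class and field names are assumptions made for this page, not the output format of any particular service or library, and the values are hand-written rather than real model output.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class BoundingBox:
    """Rectangle locating an object, as fractions of image width and height."""
    left: float
    top: float
    width: float
    height: float


@dataclass
class ClassificationResult:
    """Image classification: one category label for the whole image."""
    label: str


@dataclass
class Detection:
    """Object detection: a label plus where the object is."""
    label: str
    box: BoundingBox


@dataclass
class Track:
    """Object tracking: the same object's location in successive video frames."""
    track_id: int
    label: str
    boxes_per_frame: List[BoundingBox]


@dataclass
class SegmentationMask:
    """Image segmentation: a region label for every pixel."""
    pixel_labels: List[List[str]]


def task_name(result) -> str:
    """Map an output structure back to the task type that produces it."""
    names = {
        ClassificationResult: "image classification",
        Detection: "object detection",
        Track: "object tracking",
        SegmentationMask: "image segmentation",
    }
    return names[type(result)]


# Illustrative, hand-written values (not real model output).
examples = [
    ClassificationResult(label="cat"),
    Detection(label="cat", box=BoundingBox(0.1, 0.2, 0.3, 0.4)),
    Track(track_id=1, label="cat", boxes_per_frame=[
        BoundingBox(0.1, 0.2, 0.3, 0.4),
        BoundingBox(0.15, 0.2, 0.3, 0.4),
    ]),
    SegmentationMask(pixel_labels=[["cat", "cat"], ["background", "cat"]]),
]

for example in examples:
    print(task_name(example))
```

Read this way, a scenario that asks "what is this image?" maps to classification, "where is the object?" to detection, "where does it go over time?" to tracking, and "which pixels belong to it?" to segmentation.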
How it shows up on the exam
The exam task is recall and application: given a business scenario, identify whether it calls for computer vision and, if so, which specific CV task type (classification, detection, tracking, segmentation) fits the description.
Candidates often confuse computer vision with image processing (transforming pixels vs. interpreting them), or with NLP when a scenario involves reading text from a document image. A scenario such as "reading text from a scanned invoice" involves optical character recognition (OCR), which sits at the intersection of computer vision and NLP: extracting the characters from the image is the CV component, and interpreting the extracted text is the NLP component.
Signal phrases in stems that point toward computer vision:
- "camera feed," "image," "video," "scan," "X-ray," "visual inspection"
- "detect," "identify," "recognize," "locate," "classify" applied to objects in images
- "autonomous vehicle," "medical imaging," "quality defect," "facial recognition," "crop disease"
Signal phrases that point away from computer vision (toward other AI/ML domains):
- "predict a numeric value," "classify customer reviews," "transcribe audio," "recommend products"
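As a study aid, the signal phrases above can be turned into a rough triage heuristic. This is a minimal sketch, not a reliable classifier: the function names are hypothetical, it uses plain substring matching, and it leaves out the verbs ("detect," "identify," and so on) because they only point to computer vision when applied to objects in images.

```python
# Phrases copied from the signal lists above, lowercased for matching.
CV_SIGNALS = [
    "camera feed", "image", "video", "scan", "x-ray", "visual inspection",
    "autonomous vehicle", "medical imaging", "quality defect",
    "facial recognition", "crop disease",
]
OTHER_SIGNALS = [
    "predict a numeric value", "classify customer reviews",
    "transcribe audio", "recommend products",
]


def count_hits(stem, phrases):
    """Count how many of the phrases occur in the stem (case-insensitive)."""
    text = stem.lower()
    return sum(1 for phrase in phrases if phrase in text)


def triage(stem):
    """Return a rough guess at the domain an exam stem points toward."""
    cv_hits = count_hits(stem, CV_SIGNALS)
    other_hits = count_hits(stem, OTHER_SIGNALS)
    if cv_hits > other_hits:
        return "computer vision"
    if other_hits > cv_hits:
        return "other AI/ML domain"
    return "unclear"


print(triage("A factory uses a camera feed for visual inspection of parts."))
print(triage("A support team wants to transcribe audio from calls."))
print(triage("A retailer wants to forecast weekly revenue."))
```

A stem that scores "unclear" still needs to be read for its input type: if the input is visual, computer vision is a candidate; if it is audio, text, or tabular data, it is not.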
Sources
Every claim on this page traces to the public exam blueprint and official documentation: