High availability — SY0-701
Learn what high availability (HA) means in security architecture, how it differs from fault tolerance and resilience, and how it is tested on SY0-701.
WHAT IT IS
High availability (HA) is "a failover feature to ensure availability during device or component interruptions." (NIST SP 800-113, via NIST Glossary)
Availability itself is "ensuring timely and reliable access to and use of information." (FIPS 200, derived from 44 U.S.C. § 3542, via NIST Glossary)
HA achieves that goal specifically through failover — the capability to switch over automatically (typically without human intervention or warning) to a redundant or standby system upon the failure or abnormal termination of the previously active system. (CNSSI 4009-2015 / NIST SP 800-53 Rev. 5, via NIST Glossary)
Mental model
Think of HA as a relay race handoff that the runners execute automatically the moment the lead runner trips. The baton (the service) keeps moving without the crowd (the users) noticing a drop. The relay team exists precisely so no single runner can stop the race.
The key properties that make the relay work:
- Redundancy — a standby runner is always staged and ready.
- Automatic failover — the handoff happens without a coach calling a timeout.
- Continued service — the race does not pause; authorized users retain access.
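The three properties above can be sketched in a few lines of Python. This is a minimal illustration, not a production design: the `Node` and `FailoverPair` classes, the `healthy` flag, and the node names are all hypothetical, and a real HA deployment would detect failure with health checks or heartbeats rather than a caught exception.

```python
class Node:
    """Hypothetical service node; `healthy` simulates whether it is up."""

    def __init__(self, name):
        self.name = name
        self.healthy = True

    def handle(self, request):
        if not self.healthy:
            raise ConnectionError(f"{self.name} is down")
        return f"{self.name} served {request}"


class FailoverPair:
    """Active/standby pair (assumed design for illustration only)."""

    def __init__(self, active, standby):
        self.active = active      # node currently serving requests
        self.standby = standby    # redundancy: staged and ready
        self.failovers = 0

    def handle(self, request):
        try:
            return self.active.handle(request)
        except ConnectionError:
            # Automatic failover: promote the standby with no operator action.
            self.active, self.standby = self.standby, self.active
            self.failovers += 1
            # Continued service: the same request is retried on the new active node.
            return self.active.handle(request)


primary = Node("primary")
secondary = Node("secondary")
service = FailoverPair(primary, secondary)

print(service.handle("req-1"))   # served by the primary
primary.healthy = False          # simulate a component interruption
print(service.handle("req-2"))   # served by the secondary after failover
```

The caller only ever talks to `service`, which is why users do not notice the switch. If both nodes are down, the retry raises `ConnectionError`: a single standby covers a single failure, not every failure.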
When to use it
The exam frequently places HA next to two concepts that sound similar but differ in scope or timing. Use this table to keep them distinct.
| Concept | NIST-grounded definition (summary) | Primary goal | When service resumes |
|---|---|---|---|
| High Availability | Failover feature to ensure availability during device or component interruptions (NIST SP 800-113) | Prevent perceivable service interruption | Automatically, during the interruption |
| Fault Tolerance | A property of a system that allows proper operation even if components fail (NISTIR 8202) | Continue correct operation through component failure | Continuously — no transition needed |
| Resilience | Ability to operate under adverse conditions and recover to an effective operational posture in a time frame consistent with mission needs (NIST SP 800-39) | Maintain essential capability and recover | After degradation — may involve partial recovery |
The practical distinction: fault tolerance means the system never stops working even as parts fail; high availability means the system switches to a standby so quickly that service is effectively uninterrupted; resilience is the broader property of surviving and recovering from adversity, which may include a period of degraded operation.
COMMON MISCONCEPTION
Candidates often treat high availability and fault tolerance as synonyms because both involve redundancy. They are not the same.
Fault tolerance is a property of the system itself: proper operation continues even if components fail, so the system keeps running through the failure internally. High availability rests on the failover mechanism: a separate standby takes over upon failure or abnormal termination of the active system. A fault-tolerant system does not require a switchover; a high-availability system does. Confusing the two can lead a candidate to select the wrong architectural control when a question asks which mechanism specifically uses a standby and an automatic switchover.
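To make the "no switchover" side of the contrast concrete, here is a small Python sketch of one possible way to get fault-tolerant behavior: every redundant component runs at the same time and the most common answer wins. The NISTIR 8202 definition does not prescribe this technique; the function `fault_tolerant_call` and the replicas `good` and `broken` are hypothetical names chosen for illustration.

```python
from collections import Counter


def fault_tolerant_call(replicas, request):
    """Send the request to every replica and return the most common answer.

    All replicas are active at once, so a failed replica is masked
    rather than replaced: no standby is promoted and no switchover occurs.
    """
    answers = []
    for replica in replicas:
        try:
            answers.append(replica(request))
        except Exception:
            continue  # the failed component is simply left out of the vote
    if not answers:
        raise RuntimeError("all replicas failed")
    value, _count = Counter(answers).most_common(1)[0]
    return value


def good(x):
    """Hypothetical working component."""
    return x * 2


def broken(x):
    """Hypothetical component that has failed."""
    raise RuntimeError("component failure")


print(fault_tolerant_call([good, good, broken], 21))  # prints 42
```

Notice what is missing compared with an HA design: there is no active or standby role and nothing is promoted. That absence of a switchover is the detail an exam scenario uses to point at fault tolerance rather than high availability.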
How it shows up on the exam
The cognitive target is application: given a described scenario, identify which mechanism — HA, fault tolerance, or resilience — is the appropriate control or the one already in use.
Signal phrases to watch for:
- "automatically switches to a standby" → points to HA (failover, NIST SP 800-113 / CNSSI 4009-2015)
- "continues to operate even as components fail" → points to fault tolerance (NISTIR 8202)
- "recover to an effective operational posture" or "operate under adverse conditions" → points to resilience (NIST SP 800-39)
Candidates sometimes answer from intuition ("redundancy = high availability") without reading whether the scenario describes an automatic switchover or continuous operation through failure. Slowing down to identify which grounded property the scenario describes — failover, continued operation, or recovery — is the reliable path through these items.
Related concepts
- Recovery Sites — the physical or cloud locations a failover switches to
- Geographic Dispersion — distributing components across locations to reduce single-point-of-failure risk
- Backups — data copies that support recovery but are distinct from the automatic-switchover mechanism of HA
Sources
Every claim on this page traces to the public exam blueprint and official documentation: