- high availability (HA): the system guarantees a high uptime percentage (e.g. 99.9%)
  - example implementation: keeping standby backup servers that take over when the primary fails
  - brief disruption is allowed (e.g. it's okay if users have to log in again) as long as the uptime target is met
- fault tolerance (FT): high availability plus the ability of the system to keep working through component failures
  - stricter than high availability: disruption is NOT allowed (e.g. an anesthesia machine)
  - example implementation: the same session data is replicated across identical redundant systems, so that when one system fails, another can take over immediately without causing disruption
- disaster recovery (DR): procedures and methods in place to handle major disasters (what to do when high availability and fault tolerance have both failed)
  - requires planning in advance
  - e.g. separate standby premises, separate archived off-site backups, and making sure people can always access login credentials in an emergency
  - periodic disaster recovery testing is needed
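
To make the difference between high availability and fault tolerance described above more concrete, here is a minimal Python sketch. It is an illustration under stated assumptions, not a real clustering setup: `Server`, `StandbyCluster`, and `ReplicatedCluster` are hypothetical names, the "servers" are plain in-memory objects, and a failure is simulated by flipping an `alive` flag.

```python
from dataclasses import dataclass, field


@dataclass
class Server:
    """Hypothetical in-memory server that stores user sessions."""
    name: str
    sessions: dict = field(default_factory=dict)
    alive: bool = True


class StandbyCluster:
    """High availability: a standby takes over, but sessions are not copied to it."""

    def __init__(self):
        self.primary = Server("primary")
        self.standby = Server("standby")

    def active(self):
        # Fail over to the standby when the primary is down.
        return self.primary if self.primary.alive else self.standby

    def login(self, user):
        # The session is written only to whichever server is currently active.
        self.active().sessions[user] = "logged-in"

    def is_logged_in(self, user):
        return user in self.active().sessions


class ReplicatedCluster(StandbyCluster):
    """Fault tolerance: every session write is applied to all live servers."""

    def login(self, user):
        for server in (self.primary, self.standby):
            if server.alive:
                server.sessions[user] = "logged-in"


# High availability: the service stays up, but the session is lost on failover.
ha = StandbyCluster()
ha.login("alice")
ha.primary.alive = False          # simulate a primary failure
print(ha.is_logged_in("alice"))   # False: alice has to log in again

# Fault tolerance: the session survives the same failure.
ft = ReplicatedCluster()
ft.login("alice")
ft.primary.alive = False          # simulate a primary failure
print(ft.is_logged_in("alice"))   # True: no visible disruption
```

In both cases the cluster keeps answering requests after the primary fails. The only difference is whether the user notices the failover, which is the distinction the notes draw between the two terms.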