Resilience Testing
// Definition
Testing that a system degrades gracefully and recovers correctly under adverse conditions: slow networks, service timeouts, partial failures, high load, and dependency outages. Covers retry and timeout verification, circuit-breaker triggering, failover, and recovery-after-crash scenarios. Broader in scope than chaos engineering, which focuses on fault injection in production or production-like environments; resilience testing is typically conducted in controlled test environments.
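As an illustration of the circuit-breaker triggering scenario, here is a minimal sketch that pairs a hypothetical `CircuitBreaker` class (not any specific library's API) with a test that drives it through a simulated outage. It assumes a three-state breaker (closed, open, half-open) with a consecutive-failure threshold and a fixed cooldown. The `FakeClock` helper is also hypothetical and lets the test run instantly.

```python
import time
from typing import Callable


class CircuitOpenError(Exception):
    """Raised when the breaker rejects a call without invoking the dependency."""


class CircuitBreaker:
    """Hypothetical minimal breaker.

    CLOSED -> OPEN after `failure_threshold` consecutive failures;
    OPEN -> HALF_OPEN once `reset_timeout_s` has elapsed;
    HALF_OPEN -> CLOSED on a successful trial call, or back to OPEN on failure.
    """

    def __init__(self, failure_threshold: int = 3, reset_timeout_s: float = 10.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.failure_threshold = failure_threshold
        self.reset_timeout_s = reset_timeout_s
        self.clock = clock
        self.state = "CLOSED"
        self.failures = 0
        self.opened_at = 0.0

    def call(self, fn: Callable[[], str]) -> str:
        if self.state == "OPEN":
            if self.clock() - self.opened_at < self.reset_timeout_s:
                raise CircuitOpenError("circuit open: failing fast")
            self.state = "HALF_OPEN"  # cooldown over: let one trial call through
        try:
            result = fn()
        except Exception:
            self.failures += 1
            if self.state == "HALF_OPEN" or self.failures >= self.failure_threshold:
                self.state = "OPEN"
                self.opened_at = self.clock()
            raise
        self.failures = 0
        self.state = "CLOSED"
        return result


class FakeClock:
    """Manually advanced clock so the test never actually sleeps."""

    def __init__(self) -> None:
        self.now = 0.0

    def __call__(self) -> float:
        return self.now


def test_breaker_opens_then_recovers() -> None:
    clock = FakeClock()
    breaker = CircuitBreaker(failure_threshold=3, reset_timeout_s=10.0, clock=clock)
    reached = []

    def dependency_down() -> str:
        raise ConnectionError("dependency unavailable")

    def dependency_up() -> str:
        reached.append(True)
        return "ok"

    # Three consecutive failures should trip the breaker.
    for _ in range(3):
        try:
            breaker.call(dependency_down)
        except ConnectionError:
            pass
    assert breaker.state == "OPEN"

    # While open, calls fail fast and never reach the dependency.
    try:
        breaker.call(dependency_up)
        raise AssertionError("expected CircuitOpenError")
    except CircuitOpenError:
        pass
    assert reached == []

    # After the cooldown, a successful trial call closes the breaker again.
    clock.now = 10.0
    assert breaker.call(dependency_up) == "ok"
    assert breaker.state == "CLOSED"


if __name__ == "__main__":
    test_breaker_opens_then_recovers()
    print("circuit-breaker test passed")
```

Injecting the clock rather than sleeping is a design choice. It keeps the test deterministic and fast, which matters when resilience tests run on every build.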
// Related terms
Retry Pattern
An application-level strategy for automatically re-issuing a failed HTTP request or other operation, with a backoff delay between attempts so that a recovering service is not overwhelmed. A retry policy defines: the maximum retry count, the delay strategy (fixed, linear, or exponential backoff), optional jitter, a per-attempt timeout, and a total deadline. Retries must only be applied to idempotent operations: retrying a non-idempotent request (such as a payment) can cause duplicate side effects.
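The policy elements above can be combined into a small retry loop. This is a sketch under stated assumptions, not a reference implementation. `retry`, `RetryableError` and `FakeTime` are hypothetical names. The per-attempt timeout is assumed to be enforced inside the operation itself, for example by the HTTP client. Non-idempotent operations get no retries at all.

```python
import random
import time
from typing import Callable, List, Optional, TypeVar

T = TypeVar("T")


class RetryableError(Exception):
    """Hypothetical marker for transient failures worth retrying (e.g. HTTP 503)."""


def retry(
    operation: Callable[[], T],
    *,
    idempotent: bool,
    max_retries: int = 3,
    base_delay_s: float = 0.1,
    total_deadline_s: float = 5.0,
    jitter: bool = True,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
    rng: Optional[random.Random] = None,
) -> T:
    """Call `operation`, retrying RetryableError with exponential backoff.

    The per-attempt timeout is assumed to be enforced inside `operation`;
    this loop enforces the retry count and the total deadline.
    """
    if rng is None:
        rng = random.Random()
    if not idempotent:
        max_retries = 0  # never re-issue a non-idempotent request
    start = clock()
    attempt = 0
    while True:
        attempt += 1
        try:
            return operation()
        except RetryableError:
            if attempt > max_retries:
                raise  # retry budget exhausted
            delay = base_delay_s * 2 ** (attempt - 1)
            if jitter:
                delay = rng.uniform(0, delay)
            if clock() - start + delay > total_deadline_s:
                raise  # waiting would overrun the total deadline
            sleep(delay)


class FakeTime:
    """Deterministic clock/sleep pair for tests: sleeping advances the clock."""

    def __init__(self) -> None:
        self.now = 0.0
        self.sleeps: List[float] = []

    def clock(self) -> float:
        return self.now

    def sleep(self, seconds: float) -> None:
        self.sleeps.append(seconds)
        self.now += seconds
```

In a test, suppose `flaky` is an operation that fails twice before succeeding. Calling `retry(flaky, idempotent=True, jitter=False, sleep=ft.sleep, clock=ft.clock)` with a `FakeTime` instance `ft` should return its result after recorded sleeps of 0.1 s and 0.2 s, with no real waiting.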
Timeout
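A maximum duration allowed for an operation to complete before it is considered failed. In API and network testing there are three common kinds. The connection timeout is the time allowed to establish the TCP connection. The read timeout is how long to wait for the server to send data once connected; many clients, such as Python's requests, apply it between received bytes rather than to the whole response. The total deadline is the aggregate limit across all retry attempts. A timed-out request differs from a failed request: a client-side timeout usually surfaces as a timeout exception with no HTTP status code (or as a 504 from an intermediate gateway), whereas a failed request returns an error status or a connection error. Because the status code, error type, and retry behaviour differ, each case must be tested explicitly.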
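To show why a timeout and a failure need separate test cases, here is a minimal asyncio sketch that classifies one attempt's outcome. It simulates dependencies in-process rather than making real HTTP calls. `slow_dependency`, `broken_dependency` and `classify_call` are hypothetical names. The standard-library `asyncio.wait_for` supplies the per-attempt deadline.

```python
import asyncio


async def slow_dependency(delay_s: float) -> str:
    """Hypothetical dependency that answers after `delay_s` seconds."""
    await asyncio.sleep(delay_s)
    return "ok"


async def broken_dependency() -> str:
    """Hypothetical dependency that fails immediately, like a refused connection."""
    raise ConnectionError("connection refused")


async def classify_call(awaitable, timeout_s: float) -> str:
    """Run one attempt under a deadline and report which outcome occurred."""
    try:
        return await asyncio.wait_for(awaitable, timeout=timeout_s)
    except asyncio.TimeoutError:
        # The deadline expired first; wait_for cancels the pending call.
        return "timeout"
    except ConnectionError:
        # The dependency reported an error before the deadline.
        return "failed"


async def main() -> None:
    print(await classify_call(slow_dependency(0.5), timeout_s=0.05))  # timeout
    print(await classify_call(slow_dependency(0.0), timeout_s=0.5))   # ok
    print(await classify_call(broken_dependency(), timeout_s=0.5))    # failed


if __name__ == "__main__":
    asyncio.run(main())
```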
Exponential Backoff
A retry delay strategy in which each successive attempt waits twice as long as the previous one: delay = base × 2^(attempt−1). Attempt 1 waits base ms, attempt 2 waits 2 × base, and attempt 3 waits 4 × base. Spreading retries out over time reduces load on a recovering service and helps avoid thundering-herd problems. Often combined with jitter (a random offset within the delay range) to avoid synchronised retry storms from multiple clients.
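The formula translates directly into a schedule function. The cap `max_delay_ms` is an added assumption, since the formula above has no upper bound and most implementations cap it. The jitter variant shown is "full jitter", which draws uniformly between 0 and the computed delay. That is one common interpretation of "a random offset within the delay range", not the only one. Function names are hypothetical.

```python
import random


def backoff_delay_ms(attempt: int, base_ms: float, max_delay_ms: float) -> float:
    """Exponential backoff: base * 2^(attempt - 1), capped at max_delay_ms."""
    if attempt < 1:
        raise ValueError("attempts are numbered from 1")
    return min(base_ms * 2 ** (attempt - 1), max_delay_ms)


def full_jitter_delay_ms(attempt: int, base_ms: float, max_delay_ms: float,
                         rng: random.Random) -> float:
    """'Full jitter': a uniformly random delay between 0 and the backoff delay."""
    return rng.uniform(0, backoff_delay_ms(attempt, base_ms, max_delay_ms))


schedule = [backoff_delay_ms(n, base_ms=100, max_delay_ms=1000) for n in range(1, 6)]
print(schedule)  # [100, 200, 400, 800, 1000]

rng = random.Random(7)  # seeded so a test run is reproducible
print([round(full_jitter_delay_ms(n, 100, 1000, rng)) for n in range(1, 6)])
```

Seeding the random generator in tests keeps jittered schedules reproducible. A test can then assert bounds (0 ≤ delay ≤ backoff delay) rather than exact values.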
Chaos Engineering
Deliberately injecting failures (killing instances, adding latency, dropping packets) into production or production-like environments to verify resilience. Popularised by Netflix, whose Chaos Monkey tool randomly terminates production instances. Can surface weaknesses that synthetic tests in controlled environments fail to reproduce.
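Real chaos experiments operate on infrastructure (instances, networks), so they cannot be shown faithfully in a few lines. The core idea, injecting a controlled fault and observing how the caller copes, can be sketched in-process, as is often done in test environments. `with_faults`, `InjectedFault` and `fetch_profile` are hypothetical names. This is not the API of Chaos Monkey or any other chaos tool.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class InjectedFault(Exception):
    """Raised by the wrapper to simulate a dependency outage."""


def with_faults(fn: Callable[[], T], *, error_rate: float, added_latency_s: float,
                rng: random.Random) -> Callable[[], T]:
    """Return a version of `fn` that is slower and fails with probability `error_rate`."""
    if not 0.0 <= error_rate <= 1.0:
        raise ValueError("error_rate must be between 0 and 1")

    def wrapped() -> T:
        time.sleep(added_latency_s)  # simulated extra network latency
        if rng.random() < error_rate:
            raise InjectedFault("injected failure")
        return fn()

    return wrapped


def fetch_profile() -> str:
    """Hypothetical healthy dependency call."""
    return "profile"


# Experiment: with 30% injected failures, count how many of 100 calls fail.
flaky_fetch = with_faults(fetch_profile, error_rate=0.3, added_latency_s=0.0,
                          rng=random.Random(42))
failures = 0
for _ in range(100):
    try:
        flaky_fetch()
    except InjectedFault:
        failures += 1
print(f"{failures} of 100 calls failed")
```

In a fuller experiment, the wrapped call would sit behind the retry or circuit-breaker logic under test. The assertion would then be about the system's behaviour, for example that it serves a fallback, rather than about the raw failure count.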