Testing Circuit Breakers, Retries, and Timeouts

A circuit breaker that's never been tested is a circuit breaker you don't have. The configuration compiles, the Resilience4j dependency is on the classpath, but whether the circuit actually opens at the right threshold, blocks calls correctly, and closes after recovery — none of that is verified until a production incident. This lesson shows how to write tests that prove each resilience pattern works exactly as configured.

Testing retries

Retries handle transient failures — a service temporarily returns 503 due to a rolling restart. Use WireMock scenario state to simulate a service that fails twice then succeeds:

@Test
void shouldRetryOnTransientFailureAndSucceedOnThirdAttempt() {
    wm.stubFor(get(urlEqualTo("/users/42"))
        .inScenario("transient-failure")
        .whenScenarioStateIs(Scenario.STARTED)
        .willReturn(serviceUnavailable()) // 503, matching the rolling-restart case
        .willSetStateTo("attempt-2"));
 
    wm.stubFor(get(urlEqualTo("/users/42"))
        .inScenario("transient-failure")
        .whenScenarioStateIs("attempt-2")
        .willReturn(serviceUnavailable())
        .willSetStateTo("attempt-3"));
 
    wm.stubFor(get(urlEqualTo("/users/42"))
        .inScenario("transient-failure")
        .whenScenarioStateIs("attempt-3")
        .willReturn(okJson("{\"id\":42,\"name\":\"Alice\"}")));
 
    User user = userServiceClient.getUser(42L);
 
    assertThat(user.getName()).isEqualTo("Alice");
    // Verify exactly 3 attempts were made
    wm.verify(3, getRequestedFor(urlEqualTo("/users/42")));
}

Also test the failure case — all retries exhausted:

@Test
void shouldThrowAfterExhaustingAllRetries() {
    // Always returns 503
    wm.stubFor(get("/users/42").willReturn(serviceUnavailable()));
 
    assertThatThrownBy(() -> userServiceClient.getUser(42L))
        .isInstanceOf(ServiceUnavailableException.class);
 
    // Verify it tried the configured number of times (3) and gave up
    wm.verify(3, getRequestedFor(urlEqualTo("/users/42")));
}
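
Both call counts above depend on what "max attempts" means: the first call plus the retries. The following is a minimal plain-Java sketch of that semantics, not Resilience4j's implementation. The names RetrySketch, callWithRetry, and flaky are hypothetical, and flaky stands in for the WireMock scenario.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.atomic.AtomicInteger;

// Plain-Java sketch of retry semantics; an illustration, not Resilience4j's code.
final class RetrySketch {

    // maxAttempts includes the first call: 3 means one call plus two retries.
    static <T> T callWithRetry(Callable<T> call, int maxAttempts) throws Exception {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be >= 1");
        }
        Exception last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return call.call();
            } catch (Exception e) {
                last = e; // remember the failure; loop again if attempts remain
            }
        }
        throw last; // every attempt failed
    }

    // Hypothetical stand-in for the stubbed service: fails `failures` times, then succeeds.
    static Callable<String> flaky(int failures, AtomicInteger calls) {
        return () -> {
            if (calls.incrementAndGet() <= failures) {
                throw new IllegalStateException("503 Service Unavailable");
            }
            return "Alice";
        };
    }

    public static void main(String[] args) throws Exception {
        AtomicInteger calls = new AtomicInteger();
        String name = callWithRetry(flaky(2, calls), 3);
        System.out.println(name + " after " + calls.get() + " calls");
    }
}
```

With maxAttempts set to 1 the first failure propagates immediately, which is why both tests pin the request count rather than only the outcome.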

Testing timeouts

A timeout test verifies that your service doesn't hang waiting for a slow dependency:

@Test
void shouldAbortCallThatExceedsTimeoutThreshold() {
    // Simulate a dependency that takes 10 seconds to respond
    wm.stubFor(get("/users/42")
        .willReturn(okJson("{\"id\":42}").withFixedDelay(10_000)));
 
    // Service is configured with a 3-second timeout
    long start = System.currentTimeMillis();
 
    assertThatThrownBy(() -> userServiceClient.getUser(42L))
        .isInstanceOf(TimeoutException.class);
 
    long elapsed = System.currentTimeMillis() - start;
    // Verify the timeout fired well before the 10-second delay (3s timeout plus 1s headroom)
    assertThat(elapsed).isLessThan(4_000L);
}

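Key point: verify the elapsed time. If the timeout is set too high (say 8 seconds), the call still fails with a TimeoutException and assertThatThrownBy passes, but only after 8 seconds, so the elapsed-time assertion fails and catches the misconfiguration. If no timeout is configured at all (and the HTTP client has no default of its own), the call waits out the full 10-second delay and returns normally, so assertThatThrownBy fails because nothing was thrown.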
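
The same measurement can be tried without Spring or WireMock. The sketch below uses only the JDK: com.sun.net.httpserver.HttpServer plays the slow dependency and java.net.http.HttpClient is the caller with a per-request timeout. TimeoutSketch and elapsedUntilTimeout are hypothetical names, and the delay and timeout values are shortened so the example runs quickly.

```java
import com.sun.net.httpserver.HttpServer;
import java.net.InetSocketAddress;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.net.http.HttpTimeoutException;
import java.nio.charset.StandardCharsets;
import java.time.Duration;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

final class TimeoutSketch {

    // Calls a local stub that answers after delayMs, using a request timeout of timeoutMs.
    // Returns the elapsed milliseconds if the call timed out, or -1 if it completed.
    static long elapsedUntilTimeout(long delayMs, long timeoutMs) throws Exception {
        ExecutorService pool = Executors.newCachedThreadPool();
        HttpServer server = HttpServer.create(new InetSocketAddress("127.0.0.1", 0), 0);
        server.setExecutor(pool);
        server.createContext("/users/42", exchange -> {
            try {
                Thread.sleep(delayMs); // simulate the slow dependency
                byte[] body = "{\"id\":42}".getBytes(StandardCharsets.UTF_8);
                exchange.sendResponseHeaders(200, body.length);
                exchange.getResponseBody().write(body);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // stub is being shut down
            } finally {
                exchange.close();
            }
        });
        server.start();
        try {
            HttpClient client = HttpClient.newHttpClient();
            URI uri = URI.create("http://127.0.0.1:" + server.getAddress().getPort() + "/users/42");
            HttpRequest request = HttpRequest.newBuilder(uri)
                .timeout(Duration.ofMillis(timeoutMs))
                .build();
            long start = System.nanoTime();
            try {
                client.send(request, HttpResponse.BodyHandlers.ofString());
                return -1; // no timeout fired
            } catch (HttpTimeoutException e) {
                return Duration.ofNanos(System.nanoTime() - start).toMillis();
            }
        } finally {
            server.stop(0);
            pool.shutdownNow();
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println("timed out after ms: " + elapsedUntilTimeout(2_000, 300));
    }
}
```

Returning the elapsed time instead of asserting inside the helper lets the caller check both that the exception was thrown and that it was thrown early.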

Testing circuit breakers with Resilience4j

The circuit breaker test needs to verify three states:

@SpringBootTest
class CircuitBreakerTest {
 
    @Autowired
    private CircuitBreakerRegistry registry;
 
    @Test
    void shouldTransitionThroughCircuitBreakerStates() throws InterruptedException {
        CircuitBreaker cb = registry.circuitBreaker("user-service");
        assertThat(cb.getState()).isEqualTo(CircuitBreaker.State.CLOSED);
 
        // Configure WireMock to always return 503
        wm.stubFor(get("/users/42").willReturn(serviceUnavailable()));
 
        // Trigger enough failures to open the circuit
        // (configured: open after 5 failures in 10-call sliding window)
        for (int i = 0; i < 10; i++) {
            try { userServiceClient.getUser(42L); } catch (Exception ignored) {}
        }
 
        assertThat(cb.getState()).isEqualTo(CircuitBreaker.State.OPEN);
 
        // Now verify calls fail fast — circuit is open, no requests reach WireMock
        long start = System.currentTimeMillis();
        assertThatThrownBy(() -> userServiceClient.getUser(42L))
            .isInstanceOf(CallNotPermittedException.class);
        assertThat(System.currentTimeMillis() - start).isLessThan(50L);
 
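        // Wait for half-open transition (configured: 10 seconds).
        // Without a call, getState() only reports HALF_OPEN if
        // automatic-transition-from-open-to-half-open-enabled is true for this breaker.
        Thread.sleep(11_000L);
        assertThat(cb.getState()).isEqualTo(CircuitBreaker.State.HALF_OPEN);
 
        // One successful call closes the circuit
        // (assumes permitted-number-of-calls-in-half-open-state: 1)
        wm.stubFor(get("/users/42").willReturn(okJson("{\"id\":42,\"name\":\"Alice\"}")));
        userServiceClient.getUser(42L);
        assertThat(cb.getState()).isEqualTo(CircuitBreaker.State.CLOSED);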
    }
}

Inject Resilience4j's CircuitBreakerRegistry to inspect state directly. The test covers all three state transitions: closed → open → half-open → closed. Each assertThat(cb.getState()) call is a checkpoint — if any transition doesn't happen, the test fails at exactly the right line.
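
To make the checkpoints concrete, here is a toy state machine in plain Java. It is a deliberately simplified sketch, not how Resilience4j works internally: it opens after a number of consecutive failures instead of a failure rate over a sliding window, and it takes an injected clock (a swapped-in technique, chosen here so the example needs no sleep). ToyCircuitBreaker and its methods are hypothetical names.

```java
import java.util.function.LongSupplier;

// Toy circuit breaker for illustration only; not Resilience4j's implementation.
final class ToyCircuitBreaker {
    enum State { CLOSED, OPEN, HALF_OPEN }

    private final int failureThreshold; // consecutive failures that open the circuit
    private final long waitMillis;      // how long to stay open before a trial call
    private final LongSupplier clock;   // injectable time source, in milliseconds
    private State state = State.CLOSED;
    private int failures;
    private long openedAt;

    ToyCircuitBreaker(int failureThreshold, long waitMillis, LongSupplier clock) {
        this.failureThreshold = failureThreshold;
        this.waitMillis = waitMillis;
        this.clock = clock;
    }

    // Reports the state, moving OPEN to HALF_OPEN once the wait has elapsed.
    State state() {
        if (state == State.OPEN && clock.getAsLong() - openedAt >= waitMillis) {
            state = State.HALF_OPEN;
        }
        return state;
    }

    // Calls are rejected only while the circuit is open.
    boolean tryAcquire() {
        return state() != State.OPEN;
    }

    void onSuccess() {
        failures = 0;
        state = State.CLOSED;
    }

    void onFailure() {
        failures++;
        // A failed trial call in HALF_OPEN reopens the circuit immediately.
        if (state() == State.HALF_OPEN || failures >= failureThreshold) {
            state = State.OPEN;
            openedAt = clock.getAsLong();
        }
    }

    public static void main(String[] args) {
        long[] now = {0L};
        ToyCircuitBreaker cb = new ToyCircuitBreaker(5, 10_000L, () -> now[0]);
        for (int i = 0; i < 5; i++) cb.onFailure();
        System.out.println(cb.state());
        now[0] = 10_000L; // advance the fake clock past the wait duration
        System.out.println(cb.state());
        cb.onSuccess();
        System.out.println(cb.state());
    }
}
```

Each println in main corresponds to one assertThat(cb.getState()) checkpoint in the Spring test: open after the failures, half-open after the wait, closed after a success.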

Testing bulkheads

A bulkhead limits the number of concurrent calls to a dependency, preventing a slow downstream service from consuming all available threads:

@Test
void shouldRejectCallsWhenBulkheadIsFull() throws Exception {
    // Max concurrent calls = 2 (configured in application.yml)
    wm.stubFor(get("/users/1").willReturn(okJson("{}").withFixedDelay(2000)));
 
    ExecutorService executor = Executors.newFixedThreadPool(5);
    List<Future<?>> futures = IntStream.range(0, 5)
        .mapToObj(i -> executor.submit(() -> {
            try { return userServiceClient.getUser(1L); }
            catch (BulkheadFullException e) { return "rejected"; }
        }))
        .collect(Collectors.toList());
 
    long rejectedCount = futures.stream()
        .map(f -> { try { return f.get(); } catch (Exception e) { return "error"; }})
        .filter("rejected"::equals)
        .count();
    executor.shutdown(); // all futures have completed; release the pool threads
 
    // 5 concurrent calls, max 2 allowed → at least 3 should be rejected
    assertThat(rejectedCount).isGreaterThanOrEqualTo(3);
}
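
The expected count of 3 follows from simple permit arithmetic. The sketch below reproduces it in plain Java, assuming the bulkhead behaves like a semaphore that rejects immediately when no permit is free; that is an assumption about the configuration, not a description of Resilience4j's internals. BulkheadSketch and countRejected are hypothetical names. A CountDownLatch keeps admitted callers inside the bulkhead until every caller has tried to enter, so the result does not depend on timing.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.Semaphore;

final class BulkheadSketch {

    // Runs `callers` concurrent calls against `maxConcurrent` permits
    // and returns how many of them were rejected.
    static int countRejected(int callers, int maxConcurrent) throws Exception {
        Semaphore permits = new Semaphore(maxConcurrent);
        CountDownLatch allArrived = new CountDownLatch(callers);
        ExecutorService executor = Executors.newFixedThreadPool(callers);
        try {
            List<Future<Boolean>> results = new ArrayList<>();
            for (int i = 0; i < callers; i++) {
                results.add(executor.submit(() -> {
                    boolean admitted = permits.tryAcquire(); // no waiting: reject when full
                    allArrived.countDown();
                    if (!admitted) {
                        return false;
                    }
                    try {
                        allArrived.await(); // hold the permit until every caller has tried
                        return true;
                    } finally {
                        permits.release();
                    }
                }));
            }
            int rejected = 0;
            for (Future<Boolean> result : results) {
                if (!result.get()) {
                    rejected++;
                }
            }
            return rejected;
        } finally {
            executor.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println("rejected: " + countRejected(5, 2));
    }
}
```

In the WireMock test the 2-second response delay plays the role of the latch: it keeps the two admitted calls in flight long enough for the other three to arrive and be turned away.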

Resilience Patterns

  • Retry: transient fault recovery, exponential backoff, max attempts gate
  • Timeout: prevent thread starvation, fail fast and return an error, per-dependency config
  • Circuit breaker: closed → open → half-open, failure rate threshold, automatic recovery
  • Bulkhead: concurrent call limit, isolates dependencies, prevents cascading failure

⚠️ Common mistakes

  • Not verifying the call count after a retry test. An assertion on the final result alone cannot tell you how many attempts were made: a policy with the wrong max-attempts can still reach the success stub, and a stub that succeeds on the first call passes even with retries disabled, so the retry code is never exercised. Always use wm.verify(expectedCallCount, ...) to confirm the exact number of attempts.
  • Using Thread.sleep to wait for circuit breaker half-open state in tests. Thread.sleep(11_000L) makes your test suite 11 seconds slower. Instead, configure the circuit breaker's wait-duration-in-open-state to a short value in your test profile (e.g., 500ms), and use Awaitility to poll for the state change.
  • Testing resilience patterns with mocks instead of WireMock. A Mockito mock that throws an exception exercises the Java exception-handling path, not the HTTP path: no connection is opened, no read timeout fires, and no status code is mapped to an exception. Timeouts exist only at the HTTP transport level, and retries and circuit breakers react to whatever the real client throws, so test them with real HTTP calls through WireMock or Toxiproxy, not Mockito.
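
The polling recommended in the second mistake above boils down to checking a condition repeatedly until it holds or a deadline passes. The sketch below shows that idea in plain Java; PollingSketch and awaitUntil are hypothetical names. In a real test, Awaitility's await().atMost(...).until(...) does this job and reports failures more helpfully.

```java
import java.time.Duration;
import java.util.function.BooleanSupplier;

final class PollingSketch {

    // Polls `condition` every `interval` until it holds or `timeout` elapses.
    // Returns true if the condition became true in time, false otherwise.
    static boolean awaitUntil(BooleanSupplier condition, Duration timeout, Duration interval)
            throws InterruptedException {
        long deadline = System.nanoTime() + timeout.toNanos();
        while (true) {
            if (condition.getAsBoolean()) {
                return true;
            }
            if (System.nanoTime() - deadline >= 0) {
                return false; // deadline passed without the condition holding
            }
            Thread.sleep(interval.toMillis());
        }
    }

    public static void main(String[] args) throws InterruptedException {
        long readyAt = System.nanoTime() + Duration.ofMillis(200).toNanos();
        // Stand-in for a check such as: cb.getState() == CircuitBreaker.State.HALF_OPEN
        boolean reached = awaitUntil(
            () -> System.nanoTime() - readyAt >= 0,
            Duration.ofSeconds(2),
            Duration.ofMillis(20));
        System.out.println("condition reached: " + reached);
    }
}
```

The test returns as soon as the state changes, so a 500ms wait-duration-in-open-state costs roughly 500ms instead of a fixed sleep sized for the worst case.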

🎯 Practice task

  1. Write the retry scenario test shown above. Configure your service with max-attempts: 3 in application-test.yml. Run the test and verify it passes. Then change max-attempts to 1 and observe the test failing: the first 503 now propagates as an exception, and WireMock records one call instead of three.
  2. Write the timeout test. Verify the elapsed time assertion. Remove the timeout configuration from application-test.yml. Observe the test hanging past 4 seconds — this is what your production service does when dependencies are slow and timeouts aren't set.
  3. Write the full circuit breaker state machine test. Inject the CircuitBreakerRegistry and assert on each state transition. Use a short wait-duration-in-open-state (500ms) in your test config to avoid a long sleep.
  4. Break your circuit breaker test by changing the configured failure-rate threshold from 50% to 90%. Run the test — does it catch the misconfiguration? If not, adjust your test to detect it.
  5. Look up Resilience4j's TimeLimiter module. Write a one-paragraph explanation of how it differs from a RestTemplate connection timeout, and under what circumstances you'd need both in the same service call.

The next lesson covers observability in test environments — how to bring logs, metrics, and traces into your test stack so that when a resilience test fails, you can see exactly what happened.
