How would you test a feature that integrates a third-party service that's often down?

Short answer

Clarify the expected behavior when the third party is unavailable, whether a sandbox or mock exists, and what observability is in place. Then test the happy path, each failure mode (5xx, timeout, malformed response), circuit-breaker behavior, fallback, and recovery.

Detail

Clarify first

  • What is the expected behavior when the third party is unavailable: fail fast, fall back to cached data, or queue and retry?
  • Is there a sandbox or mock endpoint provided by the third party?
  • Does the integration have a circuit breaker or timeout configured, and if so, with what thresholds and durations?
  • What alerting or observability is in place for third-party failures?

Functional (when healthy)

  • End-to-end integration works correctly when the third party responds normally
  • Authentication to the third party works, including token rotation and key refresh
  • Data flows correctly in both directions; field mapping is accurate
  • Webhooks or callbacks from the third party are received and processed correctly
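Token rotation can be tested without the real provider by faking it. The sketch below assumes the client retries exactly once after a 401 using a freshly issued token; FakeThirdParty, TokenProvider and ThirdPartyClient are hypothetical names, not a real SDK, and the "refresh once" policy is an assumption to confirm against the actual integration.

```python
# Hypothetical fake provider and client for illustration; not a real SDK.
from dataclasses import dataclass, field


@dataclass
class FakeThirdParty:
    """In-memory stand-in for the provider: only the current token is accepted."""
    valid_token: str = "token-2"
    calls: list = field(default_factory=list)

    def get(self, path: str, token: str) -> tuple[int, dict]:
        self.calls.append((path, token))
        if token != self.valid_token:
            return 401, {"error": "token expired"}
        return 200, {"path": path, "ok": True}


class TokenProvider:
    """Issues tokens from a fixed sequence to simulate rotation."""

    def __init__(self, tokens: list[str]):
        self._tokens = iter(tokens)
        self.current = next(self._tokens)
        self.refreshes = 0

    def refresh(self) -> str:
        self.current = next(self._tokens)
        self.refreshes += 1
        return self.current


class ThirdPartyClient:
    def __init__(self, api: FakeThirdParty, tokens: TokenProvider):
        self.api = api
        self.tokens = tokens

    def get(self, path: str) -> dict:
        status, body = self.api.get(path, self.tokens.current)
        if status == 401:
            # Assumed policy: refresh once and retry; a second 401 is a real auth failure.
            status, body = self.api.get(path, self.tokens.refresh())
        if status == 401:
            raise PermissionError("rejected even after token refresh")
        if status != 200:
            raise RuntimeError(f"third-party call failed with status {status}")
        return body


def test_refreshes_once_on_expired_token():
    api = FakeThirdParty(valid_token="token-2")
    client = ThirdPartyClient(api, TokenProvider(["token-1", "token-2"]))
    assert client.get("/orders/42") == {"path": "/orders/42", "ok": True}
    assert client.tokens.refreshes == 1
    assert [token for _, token in api.calls] == ["token-1", "token-2"]


def test_gives_up_after_one_refresh():
    api = FakeThirdParty(valid_token="never-issued")
    client = ThirdPartyClient(api, TokenProvider(["token-1", "token-2", "token-3"]))
    try:
        client.get("/orders/42")
        raise AssertionError("expected PermissionError")
    except PermissionError:
        pass
    assert len(api.calls) == 2  # one refresh, no retry storm
```

Run with pytest. The second test matters as much as the first: a client that refreshes on every 401 can turn an auth outage into a retry storm.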

Failure modes

  • Third party returns 5xx → does the application degrade gracefully or crash? Is the user shown a meaningful message?
  • Third party times out → is there a configured timeout, or does the request hang indefinitely?
  • Circuit breaker: after N consecutive failures, does the circuit open and stop hammering the third party?
  • Third party returns an unexpected status code or a malformed response → does the application reject it safely (no crash, no corrupted data saved) and log it?
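These failure modes can be exercised without the real service by injecting a fake transport. A minimal sketch, assuming the app wraps the call in a function that turns every failure into a degraded result with a user-facing message; fetch_rates, DegradedResult, USER_MESSAGE and the exchange-rate payload are all hypothetical.

```python
# Hypothetical wrapper; the transport stands in for the app's real HTTP client.
import json
import socket
from dataclasses import dataclass
from typing import Callable, Optional

USER_MESSAGE = "Exchange rates are temporarily unavailable. Please try again shortly."

# A transport takes the timeout in seconds and returns (status, body).
Transport = Callable[[float], tuple[int, str]]


@dataclass
class DegradedResult:
    ok: bool
    data: Optional[dict]
    user_message: Optional[str]  # shown to the user
    reason: Optional[str]        # for logs and metrics only


def fetch_rates(transport: Transport, timeout_s: float = 2.0) -> DegradedResult:
    try:
        status, body = transport(timeout_s)  # the timeout is always passed down
    except (socket.timeout, TimeoutError):
        return DegradedResult(False, None, USER_MESSAGE, "timeout")
    if status >= 500:
        return DegradedResult(False, None, USER_MESSAGE, f"upstream_{status}")
    if status != 200:
        return DegradedResult(False, None, USER_MESSAGE, f"unexpected_{status}")
    try:
        rates = json.loads(body)["rates"]
    except (json.JSONDecodeError, KeyError, TypeError):
        return DegradedResult(False, None, USER_MESSAGE, "malformed_response")
    return DegradedResult(True, {"rates": rates}, None, None)


def test_every_failure_mode_degrades_gracefully():
    def times_out(_timeout_s):
        raise TimeoutError("read timed out")

    cases = {
        "timeout": times_out,
        "upstream_503": lambda _t: (503, "Service Unavailable"),
        "unexpected_418": lambda _t: (418, ""),
        "malformed_response": lambda _t: (200, "<html>oops</html>"),
    }
    for reason, transport in cases.items():
        result = fetch_rates(transport)
        assert not result.ok and result.reason == reason
        assert result.user_message == USER_MESSAGE  # never a raw error


def test_timeout_is_passed_to_transport():
    seen = []
    fetch_rates(lambda t: seen.append(t) or (200, '{"rates": {}}'), timeout_s=1.5)
    assert seen == [1.5]
```

Keeping the reason separate from the user message lets the same test check both what the user sees and what gets logged.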

Fallback & queue

  • If a fallback exists (cached data, queued retry), does it activate when the third party fails?
  • Is the user given a meaningful message instead of a generic error?
  • Are queued messages processed in the correct order when the third party recovers?
  • No double-processing: if a queued event is retried, is it idempotent?
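The ordering and idempotency checks above can be sketched as follows, assuming each queued event carries a unique id and a sequence number assigned at enqueue time (both hypothetical; real queues differ in what they guarantee). The processed-id set lives in memory only to keep the example self-contained. This dedupes on the consumer side only: if the process crashes between sending and recording the id, the third party can still see a duplicate, so an idempotency key honored by the provider (where one is supported) is the stronger guarantee.

```python
# Hypothetical event shape; a real system keeps processed ids in durable storage.
from dataclasses import dataclass, field
from typing import Callable


@dataclass(frozen=True)
class QueuedEvent:
    event_id: str  # idempotency key assigned when the event was first queued
    sequence: int  # enqueue order
    payload: dict


@dataclass
class IdempotentConsumer:
    processed_ids: set = field(default_factory=set)

    def drain(self, queue: list[QueuedEvent], send: Callable[[dict], None]) -> None:
        """Replay events in enqueue order, skipping any already applied."""
        for event in sorted(queue, key=lambda e: e.sequence):
            if event.event_id in self.processed_ids:
                continue  # duplicate delivery or earlier retry: already applied
            # If send raises, the drain stops here, so later events are never
            # delivered ahead of this one.
            send(event.payload)
            self.processed_ids.add(event.event_id)


def test_replays_in_order_without_duplicates():
    sent = []
    queue = [
        QueuedEvent("evt-b", 2, {"n": 2}),
        QueuedEvent("evt-a", 1, {"n": 1}),
        QueuedEvent("evt-b", 2, {"n": 2}),  # same event enqueued twice by a retry
    ]
    consumer = IdempotentConsumer()
    consumer.drain(queue, sent.append)
    consumer.drain(queue, sent.append)  # e.g. a second drain after a restart
    assert sent == [{"n": 1}, {"n": 2}]


def test_outage_mid_drain_resumes_in_order():
    sent, attempts = [], [0]

    def flaky_send(payload: dict) -> None:
        attempts[0] += 1
        if attempts[0] == 2:  # the third party fails again on the 2nd call
            raise ConnectionError("still down")
        sent.append(payload)

    queue = [QueuedEvent(f"evt-{i}", i, {"n": i}) for i in (1, 2, 3)]
    consumer = IdempotentConsumer()
    try:
        consumer.drain(queue, flaky_send)
    except ConnectionError:
        pass
    assert sent == [{"n": 1}]  # event 3 was not sent ahead of event 2
    consumer.drain(queue, flaky_send)
    assert sent == [{"n": 1}, {"n": 2}, {"n": 3}]
```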

Recovery

  • When the third party becomes available again, does the integration resume automatically?
  • Does the circuit breaker move to half-open after its cool-down, let probe requests through, and close once they succeed?
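The open, half-open and closed transitions are easiest to test with an injectable clock instead of real sleeps. This is a minimal single-threaded sketch under that assumption; CircuitBreaker, failure_threshold and reset_timeout_s are illustrative names, and a production library (for example Resilience4j on the JVM) has its own configuration and semantics that should be checked against its documentation.

```python
# Minimal single-threaded sketch; names and defaults are illustrative.
import time
from typing import Callable


class CircuitOpenError(Exception):
    """Raised instead of calling the third party while the circuit is open."""


class CircuitBreaker:
    def __init__(self, failure_threshold: int = 3, reset_timeout_s: float = 30.0,
                 clock: Callable[[], float] = time.monotonic):
        self.failure_threshold = failure_threshold
        self.reset_timeout_s = reset_timeout_s
        self.clock = clock
        self.state = "closed"
        self.failures = 0
        self.opened_at = 0.0

    def call(self, fn: Callable[[], object]) -> object:
        if self.state == "open":
            if self.clock() - self.opened_at < self.reset_timeout_s:
                raise CircuitOpenError("failing fast; third party not called")
            self.state = "half_open"  # cool-down elapsed: let one probe through
        try:
            result = fn()
        except Exception:
            self._record_failure()
            raise
        self.state, self.failures = "closed", 0  # success closes the circuit
        return result

    def _record_failure(self) -> None:
        self.failures += 1
        # A failed probe re-opens immediately; otherwise open at the threshold.
        if self.state == "half_open" or self.failures >= self.failure_threshold:
            self.state, self.opened_at = "open", self.clock()


class FakeClock:
    def __init__(self) -> None:
        self.now = 0.0

    def __call__(self) -> float:
        return self.now


def raises(exc_type: type, fn: Callable[[], object]) -> bool:
    try:
        fn()
    except exc_type:
        return True
    return False


def test_open_half_open_closed():
    clock = FakeClock()
    breaker = CircuitBreaker(failure_threshold=3, reset_timeout_s=30, clock=clock)
    calls = []

    def down() -> None:
        calls.append(clock.now)
        raise ConnectionError("503 from third party")

    for _ in range(3):
        assert raises(ConnectionError, lambda: breaker.call(down))
    assert breaker.state == "open"
    assert raises(CircuitOpenError, lambda: breaker.call(down))
    assert len(calls) == 3  # the open circuit did not hit the third party

    clock.now = 31.0  # cool-down elapsed; the probe fails and re-opens the circuit
    assert raises(ConnectionError, lambda: breaker.call(down))
    assert breaker.state == "open" and len(calls) == 4

    clock.now = 62.0  # next probe succeeds and closes the circuit
    assert breaker.call(lambda: "ok") == "ok"
    assert breaker.state == "closed" and breaker.failures == 0
```

Injecting the clock keeps the test fast and deterministic; with real sleeps, a 30-second cool-down would make every run slow.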

Observability

  • Are third-party errors distinguishable from first-party errors in logs and APM?
  • Does a third-party outage trigger an alert before users are affected?
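One way to make third-party errors separable is to attach structured fields to every dependency failure and assert on them in a test. The field names error_source and dependency, the logger name, and the provider names below are assumptions; use whatever your log pipeline and APM actually index.

```python
# Field names, logger name and provider names are hypothetical conventions.
import logging

log = logging.getLogger("checkout")


def log_dependency_failure(dependency: str, reason: str) -> None:
    log.error("dependency call failed: %s", reason,
              extra={"error_source": "third_party", "dependency": dependency})


def log_internal_failure(reason: str) -> None:
    log.error("internal failure: %s", reason, extra={"error_source": "first_party"})


class ListHandler(logging.Handler):
    """Keeps records in memory so a test can assert on their structured fields."""

    def __init__(self) -> None:
        super().__init__()
        self.records: list[logging.LogRecord] = []

    def emit(self, record: logging.LogRecord) -> None:
        self.records.append(record)


def test_third_party_errors_are_distinguishable():
    handler = ListHandler()
    log.addHandler(handler)
    try:
        log_dependency_failure("payments-provider", "timeout after 2s")
        log_internal_failure("cart id missing")
    finally:
        log.removeHandler(handler)
    tagged = [(r.error_source, getattr(r, "dependency", None)) for r in handler.records]
    assert tagged == [("third_party", "payments-provider"), ("first_party", None)]
```

In a real setup a JSON formatter or APM integration would emit the same fields so dashboards can filter on them; whether an alert fires before users notice is easier to verify during a simulated outage than in a unit test.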

Close: automate the 5xx, timeout, and malformed-response scenarios with WireMock or a similar mock server. Keep the full observability check manual: during a simulated outage, verify that alerts fire and dashboards show the right signals.
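WireMock itself is a standalone mock HTTP server configured through its Java API or JSON stub mappings. As a dependency-free illustration of the same idea, and explicitly not WireMock's API, the sketch below starts a stdlib stub server with canned 5xx, slow, and malformed responses and points a real HTTP client at it. The stub paths, delays and the fetch() client are hypothetical.

```python
# Stdlib stand-in for a mock server; NOT WireMock's API. Paths are hypothetical.
import json
import socket
import threading
import time
import urllib.error
import urllib.request
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer

# path -> (status, body, delay before responding in seconds)
STUBS = {
    "/rates/ok": (200, b'{"rates": {"EUR": 0.9}}', 0.0),
    "/rates/500": (500, b"Internal Server Error", 0.0),
    "/rates/slow": (200, b'{"rates": {}}', 1.0),
    "/rates/malformed": (200, b"<html>oops</html>", 0.0),
}


class StubHandler(BaseHTTPRequestHandler):
    def do_GET(self) -> None:
        status, body, delay_s = STUBS.get(self.path, (404, b"", 0.0))
        time.sleep(delay_s)
        try:
            self.send_response(status)
            self.send_header("Content-Length", str(len(body)))
            self.end_headers()
            self.wfile.write(body)
        except (BrokenPipeError, ConnectionResetError):
            pass  # the client already gave up (the timeout scenario)

    def log_message(self, *args) -> None:
        pass  # keep test output quiet


def start_stub() -> tuple[ThreadingHTTPServer, str]:
    server = ThreadingHTTPServer(("127.0.0.1", 0), StubHandler)  # port 0: any free port
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server, f"http://127.0.0.1:{server.server_address[1]}"


# Client under test; ProxyHandler({}) ignores any *_proxy environment variables.
_opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))


def fetch(base_url: str, path: str, timeout_s: float = 0.25) -> str:
    """Returns 'ok' or the name of the failure mode the client observed."""
    try:
        with _opener.open(base_url + path, timeout=timeout_s) as resp:
            json.loads(resp.read())
            return "ok"
    except urllib.error.HTTPError as err:
        return f"http_{err.code}"
    except (socket.timeout, TimeoutError):
        return "timeout"
    except urllib.error.URLError as err:
        timed_out = isinstance(err.reason, (socket.timeout, TimeoutError))
        return "timeout" if timed_out else "unreachable"
    except json.JSONDecodeError:
        return "malformed"


if __name__ == "__main__":
    server, base = start_stub()
    try:
        for path in STUBS:
            print(path, "->", fetch(base, path))
    finally:
        server.shutdown()
        server.server_close()
```

The slow stub responds later than the client timeout, so the test also shows that a timeout is actually configured; without one, the call would simply wait out the delay and report "ok".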

// WHAT INTERVIEWERS LOOK FOR

Circuit breaker testing, idempotency in the retry queue, and observability verification. The key practical insight is simulating outages with a mock server such as WireMock rather than waiting for the real third party to go down.

// COMMON PITFALL

Testing only the happy path, then relying on the moments when 'the third party is down right now' to check error handling. The answer should be: simulate failure with a mock; never depend on the real service being down to test error handling.