Q1 of 38 · CI/CD & DevOps

What is a flaky test and how do you handle it in CI?

CI/CD & DevOpsJuniorci-cdflaky-teststest-stabilityprocess

Short answer

Short answer: A flaky test passes and fails on the same code without changes. Quarantine it (don't auto-retry), mark it owned, fix the root cause within a deadline, and delete or rewrite it if you can't.

Detail

A flaky test is one whose outcome isn't determined by the code under test — it sometimes passes and sometimes fails for the same SHA. Common causes: hard-coded timing or sleeps, shared mutable state between tests, network or third-party dependencies, race conditions in async code, time-zone or date assumptions.

The wrong response is to add an auto-retry that masks flakiness — the test still flaps, the bug it might be catching gets ignored, and trust in CI erodes. The right response is quarantine and triage:

  1. Detect: track flake rate per test in CI metadata. A test that fails ≥ 1% of runs on the main branch is flaky by definition.
  2. Quarantine: move flaky tests to a separate job that's allowed to fail without blocking PRs, but is tracked. Don't delete them yet — they might be catching real bugs.
  3. Own: assign every quarantined test to a person or team with a deadline (e.g. 2 weeks).
  4. Fix or delete: if the flake is a test bug, fix it (replace sleeps with explicit waits, isolate state, mock the network). If the flake is a real product bug surfacing intermittently, fix the product. If neither — the test is testing something genuinely unstable — delete it; flake without intent is worse than no test.

Auto-retry is acceptable only as a temporary mitigation while a fix is in flight, never as the permanent answer.

// MODEL ANSWER

A flaky test is one that passes and fails on the same code without any change — its outcome is not determined by what you are testing. Common causes include hard-coded sleeps, shared mutable state between tests, real network or third-party dependencies, async race conditions, and date or timezone assumptions. The wrong response is to add an auto-retry. That hides the flakiness, erodes trust in the pipeline, and means a genuine bug the test was catching could be silently retried into a passing state. The right response is quarantine and triage. First, detect: track failure rates per test in CI metadata — anything failing one percent or more of runs on the main branch is flaky by definition. Second, quarantine: move it to a separate non-blocking job that does not stop PRs but is still tracked and visible. Third, assign ownership with a deadline, typically two weeks. Fourth, fix or delete. If it is a test bug, fix it — replace sleeps with explicit waits, mock the external dependency, isolate state between tests. If it is surfacing a real intermittent product bug, fix the product. If neither is tractable, delete it. A test that cannot be trusted is worse than no test.

// WHAT INTERVIEWERS LOOK FOR

A non-blame-y understanding of why flakes happen, awareness that auto-retry is a smell, and a concrete process (detect → quarantine → fix or delete).

// COMMON PITFALL

Suggesting 'just retry the test 3 times' as the solution. Interviewers worry that a candidate who says this will hide real bugs.