Q32 of 37 · Selenium
How do you reduce flakiness in a 500+ test Selenium suite?
Short answer
Short answer: Measure first (per-test rerun rate), categorise by cause (waits, data, environment, app bugs), then fix in priority: explicit waits everywhere, test-data isolation per test, retry only as a last resort. Quarantine the worst 5% so they don't tank trust while you fix them.
Detail
Flake at 500+ tests is rarely one cause — it's a long tail. The mistake teams make is reaching for retry-on-failure as a fix; that hides flakes rather than removing them, and slowly accumulates until the suite is meaningless.
Step 1 — measure. Wire each test's pass/fail into a database (or use Allure history, or your CI's test result store). For each test, compute: pass rate, median runtime, p95 runtime, rerun-required count. The top 5% of flaky tests usually account for 80% of suite failures.
Step 2 — categorise. For the top-20 flakiest, classify by cause:
- Waiting — implicit waits, missing explicit waits, sleeps, race after click.
- Data — tests share users, tenants, or seed data.
- Environment — CI runner load, network jitter, container memory.
- App bugs — real timing issues that should fail (rare but real).
Step 3 — fix in priority:
Standardise on explicit waits. Remove
Thread.sleepandimplicitlyWait. A reusable wait helper (Waits.untilClickable(by)) makes the pattern uniform.Isolate test data. Each test creates its own user / tenant / records via API calls in
@BeforeMethod. Cleanup via@AfterMethodor background reaper. Shared seed data is the #1 cause of mid-flake at scale.Stable selectors. Replace generated-id locators with
data-testattributes. A coordinated push with the dev team — not a one-time fix.CI-resource sizing. If runners are at 90% memory, browsers thrash. Bump runner specs or reduce parallel count.
Quarantine the worst 5%. Move them to a "flaky" group that doesn't gate the build. Track them as a debt list with weekly reviews. Better than retrying or ignoring.
Step 4 — set a budget. "We will accept a 0.5% per-test flake rate; anything above gets quarantined." Codify this and stick to it. Without a number, flake quietly grows back.
Step 5 — retry as a last resort, scoped narrowly. TestNG IRetryAnalyzer retrying once is acceptable for known-flaky network-dependent tests; retrying 3 times for everything is the failure mode that kills suite trust.
The cultural piece: publish the flake report weekly. Visibility is what keeps it from drifting back up.