A flaky test is a test that produces different results on identical code. It passes on your laptop, fails in CI, passes on rerun, fails on the next push. Flakiness is the single most corrosive problem in any test suite — once developers stop trusting the suite, they start ignoring failures, and at that point the suite is doing harm rather than good. This lesson covers the why (the six causes of flakiness in Playwright tests), the what (Playwright's built-in mitigations like auto-waiting and retries), and the how (a workflow for finding flakes, diagnosing root causes, and fixing them rather than papering over with reruns).
The six causes of flakiness
Every flaky test has one of these underneath:
- Timing: an action runs before the page is ready. Playwright's auto-waiting eliminates most of these, but custom selectors, manual evaluate calls, and race conditions between two parallel actions still bite.
- Animations: an element is moving when you click it. A button transitioning from x=100 to x=0 may receive your click at the wrong x-coordinate, or a modal that's fading in may not have its event handlers attached yet.
- Test data collisions: two parallel tests modify the same record. Test A creates a user test@example.com, Test B (running on a different worker) tries to create the same user, gets "email already exists," and fails.
- External dependencies: a third-party API is slow or temporarily down. Even with mocks, if you forget to mock a single call, you're at the mercy of the real service.
- Non-deterministic data: timestamps, random IDs, A/B test buckets, feature flags that flip per session. The same test, same code, sees a different page each run.
- Network race conditions: request A and request B fire in different orders on different runs. Your test assumes A's response renders before B's, and on a fast network it does, until it doesn't.
The fix for each cause is different. Treating flakes as a single problem ("just retry it") is why teams' suites get worse over time, not better.
Playwright's built-in mitigations
A lot of work has gone into making Playwright resilient by default. Three layers do most of the heavy lifting:
Auto-waiting. Actions such as click, fill, and check don't fire until the element passes the actionability checks relevant to that action. For click, that means the element is visible, stable (not animating), able to receive events, and enabled; fill checks that the element is visible, enabled, and editable. The framework retries these checks internally until the timeout. This eliminates the entire class of "click happened before the button rendered" flakes.
Web-first assertions. expect(locator).toBeVisible(), toHaveText, toHaveCount poll until they pass or hit the timeout. You don't need to add a wait before the assertion — the assertion is the wait.
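To make "the assertion is the wait" concrete, here is a minimal sketch of a polling loop in plain TypeScript. It is not Playwright's implementation: the helper name pollUntil and its parameters are made up for illustration, and the 5-second default only mirrors Playwright's default assertion timeout.

```typescript
// Hypothetical helper, not part of Playwright: re-run `probe` until `accept`
// returns true or the timeout elapses.
async function pollUntil<T>(
  probe: () => T | Promise<T>,
  accept: (value: T) => boolean,
  timeoutMs = 5000,
  intervalMs = 100,
): Promise<T> {
  const deadline = Date.now() + timeoutMs;
  for (;;) {
    const value = await probe();
    if (accept(value)) return value;
    if (Date.now() >= deadline) {
      throw new Error(`Timed out after ${timeoutMs}ms; last value: ${String(value)}`);
    }
    // Sleep briefly, then probe again.
    await new Promise<void>((resolve) => setTimeout(resolve, intervalMs));
  }
}
```

For conditions that have no dedicated matcher, Playwright exposes the same idea directly through expect.poll and expect(...).toPass, so a hand-rolled loop like this is rarely needed in a real suite.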
Retries. A failed test can be retried automatically:
// playwright.config.ts
import { defineConfig } from "@playwright/test";

export default defineConfig({
  retries: process.env.CI ? 2 : 0
});
retries: 2 reruns failed tests up to twice before declaring them failed. The HTML report tags retried-but-passing tests as flaky, distinct from both green tests and red tests. That tag is your queue of "tests I need to fix next."
The strategic position on retries: ship them in CI to absorb truly transient infra hiccups, but treat every flaky-tagged test as a bug to investigate. Retries are a safety net, not a fix.
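The retry-and-tag behavior can be modelled in a few lines. This is a simplified sketch, not Playwright's runner: runWithRetries and the attempt callback are hypothetical, and only the outcome names (expected, flaky, unexpected) follow the vocabulary Playwright uses for test outcomes.

```typescript
type Outcome = "expected" | "flaky" | "unexpected";

interface RunResult {
  outcome: Outcome;
  attempts: number;
}

// Hypothetical model: `attempt(n)` returns true when run number n (0-based) passes.
function runWithRetries(attempt: (n: number) => boolean, retries: number): RunResult {
  for (let n = 0; n <= retries; n++) {
    if (attempt(n)) {
      // A pass on the first run is a normal pass; a pass on a rerun is flaky.
      return { outcome: n === 0 ? "expected" : "flaky", attempts: n + 1 };
    }
  }
  // Every run failed, including all retries.
  return { outcome: "unexpected", attempts: retries + 1 };
}

// With retries: 2 a test gets at most three runs.
console.log(runWithRetries((n) => n >= 1, 2)); // fails once, then passes
console.log(runWithRetries(() => false, 2));   // fails all three runs
```

The model makes the trade-off visible: a retry turns a red build green, but the flaky outcome still records that the first run failed.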
Identifying flakes — --repeat-each
The fastest way to catch a flake is to run the same test many times:
npx playwright test login.spec.ts --repeat-each=20
This runs every test in login.spec.ts 20 times. If the test is flaky, you'll see failures interspersed with successes. Combine with --workers=1 to rule out parallelism as the cause:
npx playwright test login.spec.ts --repeat-each=20 --workers=1
If the test still fails when run serially, the cause is timing or non-determinism inside the test. If it fails only in parallel, the cause is something shared between workers: most often a data collision, sometimes CPU contention exposing a latent timing bug.
For a whole suite, run --repeat-each=3 once a week and watch the flaky-tagged list. Tests that show up repeatedly are your priority.
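How many repeats are enough depends on how often the test fails. Assuming each run fails independently with probability p (an idealization, since real flakes are often correlated with machine load), the chance of seeing at least one failure in n runs is 1 - (1 - p)^n. The helper names below are illustrative.

```typescript
// Probability that at least one of `runs` independent runs fails,
// given a per-run failure probability `p`.
function detectionProbability(p: number, runs: number): number {
  return 1 - Math.pow(1 - p, runs);
}

// Smallest number of runs needed to observe a failure with the given confidence.
function runsNeeded(p: number, confidence: number): number {
  if (p <= 0 || p >= 1 || confidence <= 0 || confidence >= 1) {
    throw new RangeError("p and confidence must be strictly between 0 and 1");
  }
  return Math.ceil(Math.log(1 - confidence) / Math.log(1 - p));
}

// A test that fails about 1 run in 10:
console.log(detectionProbability(0.1, 20).toFixed(2)); // "0.88"
console.log(runsNeeded(0.1, 0.95)); // 29
```

Under this model, --repeat-each=20 catches a 1-in-10 flake most of the time, while a 1-in-100 flake needs roughly 300 runs to surface with the same confidence.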
Fixing the six causes
Timing. Stop using waitForTimeout and start using domain-specific waits:
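await page.waitForResponse(resp => resp.url().includes("/api/orders") && resp.status() === 200);
await expect(page.getByTestId("order-list")).toBeVisible();
Wait for the thing that has to happen before your action, not for an arbitrary time. Register waitForResponse before the step that triggers the request and await it afterwards; a response that arrives before the wait is registered is never seen, which turns the wait itself into a flake.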
Animations. Disable them in test mode. Either via a CSS injection or via the reduced-motion media query the page should respect:
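test.beforeEach(async ({ page }) => {
  // Media emulation is set on the page and persists across navigations.
  await page.emulateMedia({ reducedMotion: "reduce" });
});

// addStyleTag only affects the document that is currently loaded, so call it inside the test after page.goto():
await page.addStyleTag({ content: "*, *::before, *::after { animation-duration: 0s !important; transition-duration: 0s !important; }" });
This kills CSS animations and transitions so elements snap to their final state instantly. Tests get faster and less flaky.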
Test data collisions. Generate unique data per run, and clean up after each test:
import { test, expect } from "@playwright/test";

test("user can register", async ({ page }) => {
  // Timestamp plus random suffix keeps the address unique across runs and workers.
  const email = `test-${Date.now()}-${Math.random().toString(36).slice(2)}@test.com`;
  await page.goto("/register");
  await page.getByLabel("Email").fill(email);
  // ...
});
For data the test must read after creating, store it in a worker-scoped fixture so each worker has its own slice.
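One way to guarantee that two workers never generate the same address is to fold the worker index into it. The helper below is a hypothetical sketch in plain TypeScript; inside a Playwright test the index would come from test.info().workerIndex, and the timestamp and nonce are injectable only so the function can be checked in isolation.

```typescript
// Hypothetical helper: builds an address that is unique per worker and per call.
// `workerIndex` would come from test.info().workerIndex inside a Playwright test.
function uniqueEmail(
  workerIndex: number,
  now: number = Date.now(),
  nonce: string = Math.random().toString(36).slice(2, 8),
): string {
  return `test-w${workerIndex}-${now}-${nonce}@test.com`;
}

console.log(uniqueEmail(0)); // different on every call
console.log(uniqueEmail(3, 1700000000000, "abc123")); // "test-w3-1700000000000-abc123@test.com"
```

Keeping the worker index visible in the address also helps when reading a failing trace, because it shows at a glance which worker created the record.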
External dependencies. Mock at the network layer. Anything you don't mock is a flake risk:
await page.route("**/api/external/**", route => route.fulfill({ status: 200, body: "{}" }));
If a test depends on a real third party (e.g., a payment gateway sandbox), accept the flake risk and tag it @external so it's run separately.
Non-deterministic data. Pin timestamps in the page:
await page.addInitScript(() => {
  // Runs in the page before any of its own scripts.
  Date.now = () => new Date("2026-05-07T10:00:00Z").getTime();
});
Or stub the API responses that contain dynamic data so the test always sees a frozen value.
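One caveat with overriding Date.now alone: code that calls new Date() with no arguments does not go through Date.now and still sees the real clock. The plain TypeScript sketch below demonstrates this outside the browser; pinDateNow is a hypothetical helper, not a Playwright API.

```typescript
// Hypothetical helper: pins Date.now() to a fixed instant and returns a restore function.
function pinDateNow(iso: string): () => void {
  const originalNow = Date.now;
  const fixed = new Date(iso).getTime();
  Date.now = () => fixed;
  return () => {
    Date.now = originalNow;
  };
}

const restore = pinDateNow("2026-05-07T10:00:00Z");
const pinned = Date.now();             // the fixed instant
const unpinned = new Date().getTime(); // still the real wall-clock time
restore();

console.log(new Date(pinned).toISOString()); // "2026-05-07T10:00:00.000Z"
console.log(pinned === unpinned);            // false unless run at exactly that instant
```

If the page under test builds its dates with new Date(), consider the page.clock API that recent Playwright versions provide, or freeze the value at the API boundary instead.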
Network race conditions. Wait for both requests before asserting:
await Promise.all([
  page.waitForResponse("**/api/cart"),
  page.waitForResponse("**/api/recommendations"),
  page.click("text=Add to Cart")
]);
await expect(page.getByText("Added")).toBeVisible();
Promise.all makes the wait deterministic: both responses, then the assertion.
Quarantining flakes
When a test is known-flaky and you don't have time to fix it today, quarantine instead of disabling:
import { test } from "@playwright/test";

test("checkout edge case @flaky", async ({ page }) => {
  // ...
});
Then in CI, run the main suite excluding flakies, and run the flakies separately so they don't block builds:
npx playwright test --grep-invert "@flaky"
npx playwright test --grep "@flaky" --retries=3
Track the count of @flaky-tagged tests over time. If it's growing, you're accumulating debt; if it's shrinking, your team is paying it down.
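The same split can also live in the config instead of on the command line. The fragment below is a sketch that assumes two project names, main and quarantine, chosen here for illustration; grep, grepInvert, and retries are per-project options in Playwright's config.

```typescript
// playwright.config.ts (sketch; project names are illustrative)
import { defineConfig } from "@playwright/test";

export default defineConfig({
  retries: process.env.CI ? 2 : 0,
  projects: [
    {
      // Blocking suite: everything except quarantined tests.
      name: "main",
      grepInvert: /@flaky/,
    },
    {
      // Quarantine: only tests tagged @flaky, with extra retries.
      name: "quarantine",
      grep: /@flaky/,
      retries: 3,
    },
  ],
});
```

CI can then run npx playwright test --project=main as the blocking job and npx playwright test --project=quarantine as a non-blocking one.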
Monitoring flake rate
Two metrics are worth tracking on every CI run:
- Flaky test count — number of tests marked flaky (passed only after retry) in this run.
- Flaky test rate — flaky count / total count, expressed as a percentage.
Anything above 1-2% means the suite is degrading. Set a threshold (e.g., "block merges if flaky rate > 3%") and the team will keep flakes in check. The HTML report exposes the flaky count; pipe it to a dashboard via the JSON reporter or the onTestEnd hook of a custom reporter.
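As a rough sketch of the arithmetic, the function below computes the flaky rate from the kind of summary counts Playwright's JSON report carries in its stats object. The interface is an assumption to verify against your own report, and excluding skipped tests from the denominator is a choice made here, not something the lesson prescribes.

```typescript
// Subset of the `stats` object in Playwright's JSON report.
// Treat the exact field set as an assumption and check it against your report.
interface ReportStats {
  expected: number;   // finished as expected (normally: passed)
  unexpected: number; // failed after all retries
  flaky: number;      // passed only after a retry
  skipped: number;
}

// Flaky rate as a percentage of tests that actually ran (skipped tests excluded).
function flakyRatePercent(stats: ReportStats): number {
  const ran = stats.expected + stats.unexpected + stats.flaky;
  return ran === 0 ? 0 : (stats.flaky * 100) / ran;
}

// Hypothetical merge gate using the 3% threshold from the text.
function exceedsThreshold(stats: ReportStats, maxPercent = 3): boolean {
  return flakyRatePercent(stats) > maxPercent;
}

const stats: ReportStats = { expected: 194, unexpected: 2, flaky: 4, skipped: 10 };
console.log(flakyRatePercent(stats)); // 2
console.log(exceedsThreshold(stats)); // false
```

A CI step could read the JSON report, call exceedsThreshold, and exit non-zero when it returns true.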
A real example, walked through
A "checkout completes" test fails roughly 1-in-10 CI runs. Always passes locally.
- --repeat-each=20 --workers=1 locally → 20 passes. Suspect parallelism.
- --repeat-each=20 --workers=4 locally → 2 failures.
- Open the trace from a failure: the test creates a user, then logs in, then checks out.
- Network tab on the failing run shows /api/users/create returned 409 Conflict: the user already exists.
- Root cause: another worker created the same user. Email was hard-coded test@test.com.
- Fix: set email to a worker-unique template literal, `test-${test.info().workerIndex}-${Date.now()}@test.com`. Now each worker uses its own email.
- Confirm with --repeat-each=50 --workers=4 → 50 passes. Remove the @flaky tag.
Total time: 20 minutes. No retries needed — the test is now actually deterministic.
Coming from Cypress?
Cypress's flakiness story is similar in spirit:
- Cypress retries: retries: { runMode: 2, openMode: 0 } → Playwright's retries: 2.
- Cypress's Cypress._.times (the bundled Lodash helper) for repeating a test → Playwright's --repeat-each.
- Cypress's automatic wait for cy.get (4-second default) → Playwright's auto-waiting (bounded by the 30-second default test timeout, with more comprehensive checks).
- The biggest difference: Cypress's retries-on-flaky reruns the failed test only; Playwright reruns and tags it flaky so you can find it later. The latter is what makes "fix the worst 5 flakes per week" a tractable workflow.
⚠️ Common mistakes
- Treating flakiness as a property of the framework. "Playwright is flaky" is almost never true; the test is flaky because of timing, data, or environment. Frame it as "this test has a race condition," not "the framework is unreliable" — the former is fixable, the latter induces helplessness.
- Bumping retries to make red CI go green. retries: 5 hides the problem and trains the team to ignore signal. Cap at 2 and treat every flaky-tagged test as a bug.
- Mocking only the failing endpoint. A test that depends on three external services and mocks only the one that broke last week will flake the moment a different service blips. Mock the whole boundary, not the symptom.
🎯 Practice task
Find and fix a real flake. 30-40 minutes.
- Write a deliberately-flaky test:
    import { test, expect } from "@playwright/test";

    test("flaky add to cart", async ({ page }) => {
      await page.goto("https://www.saucedemo.com");
      await page.getByPlaceholder("Username").fill("standard_user");
      await page.getByPlaceholder("Password").fill("secret_sauce");
      await page.getByRole("button", { name: "Login" }).click();
      // Race condition: click before the button is fully rendered
      await page.evaluate(() => {
        const btn = document.querySelector("button.btn_inventory") as HTMLElement | null;
        btn?.click();
      });
      await expect(page.locator(".shopping_cart_badge")).toHaveText("1");
    });
- Configure retries and run repeated:
    import { defineConfig } from "@playwright/test";

    export default defineConfig({
      testDir: "./tests",
      retries: 2,
      reporter: [["html", { open: "never" }]],
      use: { trace: "on-first-retry" }
    });

    npx playwright test flaky.spec.ts --repeat-each=20
- Confirm the test fails sometimes. Open a failing trace and identify the cause: the evaluate click fires before the page has the inventory rendered.
- Fix it the right way: replace the evaluate hack with a Playwright locator and assertion:
    await expect(page.getByRole("button", { name: "Add to cart" }).first()).toBeVisible();
    await page.getByRole("button", { name: "Add to cart" }).first().click();
    await expect(page.locator(".shopping_cart_badge")).toHaveText("1");
- Rerun --repeat-each=20. It should now pass 20/20.
- Stretch: add a @flaky tag pattern and a CI workflow with two jobs: one runs the main suite excluding flakies, one runs only flakies with --retries=3.
Flakiness is the line between a suite that helps and a suite that hurts. With auto-waiting, retries, the trace viewer, and the --repeat-each workflow, you have everything you need to keep the line on the helpful side. The next chapter is the capstone — a complete Playwright suite for a banking application that puts every concept from this course into one production-grade project.