Maintaining and Scaling Your Test Suite

A test suite is software. Like any software, it rots if nobody maintains it. Eighteen months into a project the same suite that started fast and trustworthy has slow specs nobody refactored, flaky tests everyone reruns reflexively, dead tests for features that shipped a year ago, and twenty-second login dances no one ever replaced with cy.session. This lesson covers the maintenance habits and scaling strategies that keep a Cypress suite from drifting — flake management, test independence, the speed knobs you've already learned, the tiered-testing model that lets one suite serve three CI cadences, and what to track when the project grows.

Test independence — the rule that prevents most flake

Every test must be able to run alone. No spec depends on the side effects of another spec. This is the single most important rule for a maintainable suite.

// ❌ Anti-pattern — auth-login.cy.ts creates a user; products-list.cy.ts assumes it exists
it("registers a new user", () => {
  cy.visit("/register");
  // ... creates user with email "alice@test.com"
});
 
// In a different spec
it("shows products to logged-in users", () => {
  cy.loginViaApi("alice@test.com", "Sup3rS3cret!"); // depends on previous spec
});

Serial: works. Parallel CI: fails randomly because products-list.cy.ts runs on a different worker, before auth-login.cy.ts has finished. Worse, the failure looks like a login bug and you spend an hour debugging the wrong layer.

The fix: every spec creates its own data via API or app-actions setup. Factories (lesson 3) make this cheap; chapter 4's cy.request patterns make it fast.
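One way to make "every spec creates its own data" concrete is a small builder that never returns the same user twice. This is a minimal sketch, not the course's factory: buildUniqueUser and the TestUser shape are hypothetical names invented for illustration.

```typescript
// Hypothetical helper: builds user data that is unique per call, so no spec
// has to rely on a user that some other spec happened to create.
interface TestUser {
  email: string;
  password: string;
}

let userCounter = 0;

function buildUniqueUser(specName: string, overrides: Partial<TestUser> = {}): TestUser {
  userCounter += 1;
  // Slugify the spec name so the email shows which spec created the user.
  const slug = specName
    .toLowerCase()
    .replace(/[^a-z0-9]+/g, "-")
    .replace(/^-|-$/g, "");
  return {
    email: `${slug}-${Date.now()}-${userCounter}@test.com`,
    password: "Sup3rS3cret!",
    ...overrides,
  };
}

const first = buildUniqueUser("products-list.cy.ts");
const second = buildUniqueUser("products-list.cy.ts");
console.log(first.email !== second.email); // true
```

In a spec, a beforeEach could send the result to your app's user-creation endpoint with cy.request and then log in with the same credentials. The endpoint is app-specific, so it is left out here. Because the spec name is part of the email, two specs running on different workers cannot collide on the same user.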

Dealing with flaky tests

Flake is the slow death of trust in a test suite. The four-step process most teams adopt:

  1. Identify. Cypress Cloud's flake detection (chapter 8) ranks flaky tests by frequency. Without Cloud, track manually — a spreadsheet of "tests that failed then passed on retry" works for small suites.
  2. Quarantine. Move the worst offenders into cypress/e2e/quarantine/ and exclude that folder from the main run via excludeSpecPattern. Quarantined tests still run on a nightly schedule so the team has signal, but they don't block PRs.
  3. Fix root causes. Almost every flake has one of three causes: a fixed wait (cy.wait(2000)), an unstable selector (CSS class), or a race condition (assertion runs before async work completes). Replace with cy.intercept + cy.wait("@alias"), swap to data-testid, add a should assertion before the dependent step.
  4. Prevent. Code review every new test for the three causes above. The flake stops landing in the first place.

The discipline that pays off: flakes are bugs, not noise. Treat them like production incidents — surface, triage, fix.
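If you are tracking flake by hand, the "failed then passed on retry" spreadsheet from step 1 can be a few lines of code. The sketch below assumes a record shape (TestRun) that you would fill from your own CI logs or reporter output; it is not a Cypress or Cypress Cloud API.

```typescript
// Assumed record shape: one entry per test per CI run.
interface TestRun {
  title: string;
  attempts: number; // attempts made in that run (1 = passed or failed first time)
  passed: boolean;  // final outcome after retries
}

interface FlakeRow {
  title: string;
  flakyRuns: number;
  totalRuns: number;
  flakeRate: number;
}

function rankFlakyTests(runs: TestRun[]): FlakeRow[] {
  const byTitle = new Map<string, { flaky: number; total: number }>();
  for (const run of runs) {
    const entry = byTitle.get(run.title) ?? { flaky: 0, total: 0 };
    entry.total += 1;
    // Flaky = failed at least once, then passed on a retry.
    if (run.passed && run.attempts > 1) entry.flaky += 1;
    byTitle.set(run.title, entry);
  }

  const rows: FlakeRow[] = [];
  byTitle.forEach((entry, title) => {
    if (entry.flaky > 0) {
      rows.push({
        title,
        flakyRuns: entry.flaky,
        totalRuns: entry.total,
        flakeRate: entry.flaky / entry.total,
      });
    }
  });
  // Worst offenders first: highest flake rate, then most flaky runs.
  return rows.sort((a, b) => b.flakeRate - a.flakeRate || b.flakyRuns - a.flakyRuns);
}

const history: TestRun[] = [
  { title: "adds item to cart", attempts: 2, passed: true },
  { title: "adds item to cart", attempts: 1, passed: true },
  { title: "adds item to cart", attempts: 3, passed: true },
  { title: "applies coupon", attempts: 2, passed: true },
  { title: "applies coupon", attempts: 1, passed: true },
  { title: "applies coupon", attempts: 1, passed: true },
  { title: "applies coupon", attempts: 1, passed: true },
  { title: "logs in", attempts: 1, passed: true },
];

// "adds item to cart" (2 of 3 runs) ranks above "applies coupon" (1 of 4); "logs in" is absent.
console.log(rankFlakyTests(history).map((row) => row.title));
```

The top of this list is the quarantine shortlist. A test that fails on every attempt is not flaky, it is broken, which is why only retried passes are counted.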

Keeping tests fast

Every speed lever you've learned in earlier chapters compounds. The suite that's fast on day one stays fast through year two only if the rules are applied consistently:

  • API login + cy.session (chapter 6) — saves 4 seconds per test.
  • cy.intercept to stub slow APIs (chapter 4) — turns a 2-second backend round-trip into a 50ms intercept.
  • App actions for state setup (chapter 5) — sub-second client-side state seeding.
  • Spec files in the 1-3 minute range (chapter 8) — anything longer becomes the parallelism bottleneck.
  • Parallel execution in CI (chapter 8) — 4 workers cut wall-clock by 4× until the longest spec sets the floor.
  • Zero cy.wait(<number>) calls (chapter 3) — every fixed wait is dead time.

A suite that does all six runs in a fraction of the time of one that does none. The rules don't enforce themselves — they go in the team's contributing guide and get checked at review time.
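The interaction between spec length and parallelism is easy to check with arithmetic. The sketch below estimates wall-clock time by handing specs to the least-loaded worker, longest first. That scheduling rule is an assumption for illustration, not a description of how Cypress Cloud or cypress-split balance specs.

```typescript
// Estimate wall-clock seconds for a set of spec durations split across workers.
function estimateWallClock(specSeconds: number[], workers: number): number {
  if (workers < 1) throw new Error("need at least one worker");
  const loads: number[] = [];
  for (let i = 0; i < workers; i++) loads.push(0);

  // Longest specs first, each one going to the currently least-loaded worker.
  const sorted = specSeconds.slice().sort((a, b) => b - a);
  for (const seconds of sorted) {
    let least = 0;
    for (let i = 1; i < loads.length; i++) {
      if (loads[i] < loads[least]) least = i;
    }
    loads[least] += seconds;
  }
  return Math.max(...loads);
}

const evenSuite = [60, 60, 60, 60, 60, 60, 60, 60];
console.log(estimateWallClock(evenSuite, 1)); // 480
console.log(estimateWallClock(evenSuite, 4)); // 120, a clean 4x

// Same suite, but one spec has grown to five minutes.
const lopsidedSuite = [300, 60, 60, 60, 60, 60, 60, 60];
console.log(estimateWallClock(lopsidedSuite, 4)); // 300, the long spec sets the floor
```

In the lopsided case the suite has 720 seconds of work, so four workers deliver 2.4x rather than 4x. Splitting the five-minute spec is worth more than adding a fifth worker.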

Code review for tests

Test code is code. Apply the same review process production code gets:

  • A test PR is a PR. It gets a reviewer; the reviewer reads it; comments are addressed before merge.
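  • Lint and format. ESLint plus the official eslint-plugin-cypress rules catch common mistakes: fixed waits (cypress/no-unnecessary-waiting), assigning the return value of a cy command (cypress/no-assigning-return-values), and chaining further commands off an action (cypress/unsafe-to-chain-command).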
  • Coverage by feature, not by line. Tracking "test coverage" as a percentage is mostly meaningless for E2E. Track coverage by user-facing flow — does every page have at least one test? Does every form get its happy path?

A team that reviews test PRs as carefully as production PRs ends up with a suite the team trusts. A team that lets test PRs through without review ends up with the suite they regret.
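To see what a reviewer or lint rule is looking for, here is a rough text scan for fixed waits. It is only an illustration of the pattern, not how eslint-plugin-cypress is implemented, and findFixedWaits is a hypothetical name. In a real project, use the lint rule.

```typescript
// Matches cy.wait(<digits>) but not cy.wait("@alias").
const FIXED_WAIT = /\bcy\.wait\(\s*\d[\d_]*\s*\)/;

// Returns the 1-based line numbers that contain a fixed wait.
function findFixedWaits(source: string): number[] {
  const hits: number[] = [];
  source.split("\n").forEach((line, index) => {
    if (FIXED_WAIT.test(line)) hits.push(index + 1);
  });
  return hits;
}

const specSource = [
  'cy.visit("/cart");',
  "cy.wait(2000);",
  'cy.wait("@getCart");',
  'cy.get("[data-testid=cart-total]").should("be.visible");',
].join("\n");

// Only line 2 is reported; the alias wait on line 3 is fine.
console.log(findFixedWaits(specSource));
```

A regex like this misses waits passed through a variable and also flags commented-out code, which is exactly why a parser-based lint rule is the better long-term safety net.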

Tiered testing — one suite, three cadences

A 200-spec suite isn't viable on every push. The tiered model splits the same suite into three cadences:

  • Smoke (1-3 minutes) — 5-10 critical-path tests: login, primary checkout, signup, the most-used admin flow. Runs on every push.
  • Regression (5-15 minutes) — full feature coverage. Runs on PRs to main and nightly.
  • Full (30-60 minutes) — visual regression, accessibility audits, cross-browser runs, edge-case data sets. Runs nightly or weekly.

# .github/workflows/smoke.yml (every push)
on: [push]
jobs:
  smoke:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: cypress-io/github-action@v6
        with:
          spec: "cypress/e2e/smoke/**"

# .github/workflows/regression.yml (PRs to main and nightly)
on:
  pull_request:
    branches: [main]
  schedule:
    - cron: "0 6 * * *"
jobs:
  regression:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: cypress-io/github-action@v6
        with:
          spec: "cypress/e2e/!(visual|a11y|quarantine)/**"

# .github/workflows/full.yml (nightly only)
on:
  schedule:
    - cron: "0 2 * * *"
jobs:
  full:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: cypress-io/github-action@v6   # all specs, all browsers, all visual + a11y

Three workflows, three cadences. The fast feedback loop is fast; the deep coverage still happens; CI cost stays bounded.
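It helps to be able to answer "which cadence runs this spec?" without reading three YAML files. The sketch below encodes one reading of the folder layout used in the workflows above. The function name is hypothetical, and it assumes quarantined specs run in their own nightly job rather than in any of the three tiers.

```typescript
type Tier = "smoke" | "regression" | "full";

// Assumed layout: cypress/e2e/<folder>/<spec>.cy.ts
function tiersForSpec(specPath: string): Tier[] {
  const match = specPath.match(/^cypress\/e2e\/([^/]+)\//);
  const folder = match ? match[1] : "";

  if (folder === "quarantine") return [];            // runs in its own nightly job
  if (folder === "smoke") return ["smoke", "regression", "full"];
  if (folder === "visual" || folder === "a11y") return ["full"];
  return ["regression", "full"];                     // ordinary feature folders
}

console.log(tiersForSpec("cypress/e2e/smoke/login.cy.ts"));     // smoke, regression, full
console.log(tiersForSpec("cypress/e2e/visual/home.cy.ts"));     // full only
console.log(tiersForSpec("cypress/e2e/checkout/coupon.cy.ts")); // regression, full
```

The useful property is that a spec's tier is decided by where it lives, so promoting a test to smoke is a file move that shows up in review.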

When to delete tests

Tests are technical debt — every one of them costs CI time, maintenance attention, and review-cycle bandwidth. Delete when:

  • The feature is gone. Don't keep tests for a payment provider you migrated off six months ago.
  • The test always passes regardless of the app. Some tests are so over-stubbed they verify their own stubs. If a test would pass on a broken app, it's worse than no test.
  • The test catches the same bug as another, more focused test. Two tests that fail on identical bugs are one test too many.

Resist the urge to disable a flaky test (it.skip) instead of fixing or deleting it. Skipped tests rot: six months later nobody remembers why they're skipped, and the disabled test has become debt that no dashboard shows.

Updating tests when the app changes

When a developer changes the UI, tests break. This is normal — a test that doesn't break when the feature changes wasn't actually testing the feature. The cost of maintenance is what makes the structural choices in chapters 5 and 9 worth doing:

  • Selectors centralised in constants.ts: data-testid rename = single edit.
  • Page objects — UI restructure = single page-object update; specs unchanged.
  • Shared types in types.ts — schema change = compiler flags every affected test.
  • Factories — new required field = single factory edit; defaults flow to every test.

A test suite without these abstractions costs hours per refactor. With them, a major refactor is twenty minutes.
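As a reminder of why the first bullet works, here is a minimal sketch of centralised selectors. TEST_IDS and sel are hypothetical names; your constants.ts may be organised differently.

```typescript
// Hypothetical contents of constants.ts: every data-testid lives here, once.
const TEST_IDS = {
  loginSubmit: "login-submit",
  cartTotal: "cart-total",
} as const;

type TestIdKey = keyof typeof TEST_IDS;

// Builds the attribute selector a spec would pass to cy.get().
function sel(key: TestIdKey): string {
  return `[data-testid="${TEST_IDS[key]}"]`;
}

console.log(sel("cartTotal")); // [data-testid="cart-total"]
```

Specs call cy.get(sel("cartTotal")) and never spell out the attribute themselves. When the app renames the test id, only the value in TEST_IDS changes. A mistyped key such as sel("cartTotl") fails at compile time instead of at run time.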

Metrics worth tracking

A small set of numbers tells you whether the suite is healthy or rotting:

The numbers are illustrative — the shape matters: track these six, plot them weekly, watch the lines. A growing flake count is the early signal of a suite that's losing trust; a growing quarantine folder is the late signal.

A growth playbook

A team's typical Cypress suite trajectory:

  • 0-20 specs. Flat folder; minimal abstractions; no CI parallelism. Optimise for speed of authoring.
  • 20-50 specs. Introduce folder structure (chapter 9, lesson 1) and shared types (lesson 2). CI runs in serial; suite takes 5-10 minutes.
  • 50-100 specs. Add factories (lesson 3), tiered testing (smoke + full), CI parallelism (chapter 8). Cypress Cloud or cypress-split for orchestration.
  • 100-200 specs. Quarantine folder, weekly flake review, dedicated test-engineer role for maintenance. Tiered testing splits into three cadences.
  • 200+ specs. Coverage tracked by feature ownership. Each feature team reviews their own folder's tests. Periodic suite audit to delete dead specs and refactor slow ones.

Each milestone introduces one or two new disciplines. Skipping milestones (200 specs without folder structure, 100 specs without parallelism) is what produces unmanageable suites. Adding the discipline at the right size keeps the work small.

⚠️ Common mistakes

  • Disabling flaky tests with it.skip instead of fixing or deleting them. Skipped tests are debt with no visible cost — six months later nobody remembers why they're off. Either fix the flake (the root cause is almost always one of three known patterns), quarantine it explicitly with a tracked ticket, or delete it.
  • Letting the test suite grow without ever pruning. The dashboard says "200 specs"; nobody mentions that 30 of them test features deprecated last quarter. Quarterly suite audits — a half-day to scan for dead tests — keep the count meaningful.
  • Treating cy.wait(ms) and unstable selectors as someone else's problem. Both are caught by code review when reviewers know to look for them. ESLint rules (cypress/no-unnecessary-waiting, custom rules for class-based selectors) catch the rest before merge. The rules cost an hour to set up and prevent thousands of flaky-test debugging hours.

🎯 Practice task

Audit a real suite and apply two improvements. 35-45 minutes.

  1. Open Cypress Cloud (or run cypress run --reporter mochawesome and inspect the JSON) for a suite of yours. Record three numbers: total tests, pass rate, average duration.
  2. Identify three flaky tests. Either pull from the Cloud's flake list or run the suite five times locally and note tests that pass-then-fail. For each, write down the suspected root cause (cy.wait(ms), unstable selector, race condition).
  3. Fix one flaky test fully. Replace the flake-cause pattern with the right Cypress idiom (cy.intercept + cy.wait("@alias"), data-testid selector, should assertion before dependent step). Run the spec ten times; confirm it passes every time.
  4. Quarantine the other two. Move them to cypress/e2e/quarantine/. Add excludeSpecPattern: "cypress/e2e/quarantine/**" to cypress.config.ts. Confirm npm run cy:run skips them.
  5. Set up tiered workflows. Create .github/workflows/smoke.yml and .github/workflows/regression.yml from the lesson example. Confirm the smoke job runs on every push and is fast (< 3 min); regression runs on PR and nightly.
  6. Delete one dead test. Find a test for a feature that's no longer in the product. Delete the spec; commit with a clear message ("remove tests for deprecated invite-code flow").
  7. Stretch: add ESLint with eslint-plugin-cypress to the project. Configure rules: cypress/no-unnecessary-waiting: error, cypress/no-assigning-return-values: error. Run the linter; fix one or two violations it surfaces. The lint is now your safety net for every future PR.

That ends chapter 9 — and the framework half of the course. You have folder structure, shared utilities, factories, and the maintenance habits that keep all of it healthy as the project grows. The capstone in chapter 10 puts every chapter together: a full e-commerce test suite from a single brief, with the framework you've spent nine chapters building.
