Sharding for Large Test Suites

8 min read

The previous lesson scaled parallelism on one machine. The next ceiling is when one machine isn't enough: a 1,000-test suite at 4 workers still takes about 6 minutes, and adding workers stops helping once you saturate the CPU or hit external rate limits. Sharding is the next layer: split the suite into shards, run each on its own CI machine in parallel, and merge the results. Ignoring per-machine setup (covered below), a 6-minute suite becomes a 1.5-minute one when sharded across 4 machines. This lesson covers the --shard flag, the GitHub Actions matrix that orchestrates it, the blob reporter that survives the split, and the cost trade-off to weigh before scaling shards into double digits.

How sharding works

--shard=N/M tells Playwright "pretend the test list is divided into M equal slices; run the Nth slice." Run on machine 1 with --shard=1/4; run on machine 2 with --shard=2/4; and so on. Each machine runs ~25% of the tests.

# Machine 1
npx playwright test --shard=1/4
 
# Machine 2
npx playwright test --shard=2/4
 
# Machine 3
npx playwright test --shard=3/4
 
# Machine 4
npx playwright test --shard=4/4

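Playwright keeps the test list in a fixed order and cuts it into M contiguous slices of roughly equal test count. The unit it assigns is a group of tests that must run together: by default that is a whole file, so a file never straddles two shards, while with fullyParallel: true individual tests can land on different shards. The assignment is deterministic, so the same file lands on the same shard run after run until tests are added or removed and the slice boundaries move. That stability helps when diagnosing "this shard always fails" patterns.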
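The split is easier to reason about with a model in hand. Below is a simplified TypeScript sketch written for this lesson, not Playwright's source: it assumes the runner sees test groups in a fixed order, cuts the list into equal-count slices, and hands each group (a whole file by default, a single test under fullyParallel) to the shard where that group's first test falls. The file names and test counts are hypothetical.

```typescript
// Simplified model of `--shard=current/total` (a sketch, not Playwright's source).
// Assumption: groups arrive in a fixed order and shards are balanced by test count.

interface TestGroup {
  file: string;
  tests: number; // tests that must stay together on one shard
}

// Without fullyParallel each file is one group; with it, each test is its own group.
function toGroups(files: Record<string, number>, fullyParallel: boolean): TestGroup[] {
  const groups: TestGroup[] = [];
  for (const [file, count] of Object.entries(files)) {
    if (fullyParallel) {
      for (let i = 0; i < count; i++) groups.push({ file, tests: 1 });
    } else {
      groups.push({ file, tests: count });
    }
  }
  return groups;
}

// Returns the groups that shard `current` (1-based) of `total` would run.
function groupsForShard(groups: TestGroup[], current: number, total: number): TestGroup[] {
  const totalTests = groups.reduce((sum, g) => sum + g.tests, 0);
  const base = Math.floor(totalTests / total);
  const extra = totalTests - base * total; // the first `extra` shards get one more test
  const idx = current - 1;
  const from = base * idx + Math.min(extra, idx);
  const to = from + base + (idx < extra ? 1 : 0);

  // A group goes to the shard whose [from, to) range contains its first test.
  const picked: TestGroup[] = [];
  let position = 0;
  for (const group of groups) {
    if (position >= from && position < to) picked.push(group);
    position += group.tests;
  }
  return picked;
}

// Number of tests each shard would run, for shards 1..total.
function testsPerShard(files: Record<string, number>, total: number, fullyParallel: boolean): number[] {
  const groups = toGroups(files, fullyParallel);
  const counts: number[] = [];
  for (let shard = 1; shard <= total; shard++) {
    counts.push(groupsForShard(groups, shard, total).reduce((sum, g) => sum + g.tests, 0));
  }
  return counts;
}

// Hypothetical suite, listed in the order the runner would see the files.
const exampleSuite = { "checkout.spec.ts": 40, "login.spec.ts": 8, "profile.spec.ts": 4, "search.spec.ts": 8 };
console.log(testsPerShard(exampleSuite, 4, false)); // file-level: 40, 0, 8, 12
console.log(testsPerShard(exampleSuite, 4, true));  // test-level: 15, 15, 15, 15
```

For this suite, file-level sharding leaves one shard holding 40 tests and another holding none, while test-level sharding gives every shard 15. Because a whole file can't be split, any file larger than a shard's fair share overloads whichever shard receives it.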

Within a shard, the usual parallelism still applies: that machine's workers run the shard's slice in parallel. Sharding and workers multiply, giving you two dimensions of parallelism: 4 shards × 2 workers per shard = 8-way parallelism across the suite.

CI matrix — GitHub Actions

The canonical multi-machine setup, as a matrix workflow:

name: Playwright Tests
on: [push, pull_request]
 
jobs:
  test:
    timeout-minutes: 30
    runs-on: ubuntu-latest
    strategy:
      fail-fast: false
      matrix:
        shardIndex: [1, 2, 3, 4]
        shardTotal: [4]
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-node@v4
        with: { node-version: lts/* }
 
      - run: npm ci
      - run: npx playwright install --with-deps
 
      - run: npx playwright test --shard=${{ matrix.shardIndex }}/${{ matrix.shardTotal }}
 
      - uses: actions/upload-artifact@v4
        if: always()
        with:
          name: blob-report-${{ matrix.shardIndex }}
          path: blob-report
          retention-days: 1

GitHub spawns 4 jobs in parallel; each runs one shard. fail-fast: false is critical — without it, GitHub kills the others the moment one shard fails, and you lose visibility into whether the failure was localised or systemic.

The same shape works for GitLab CI, CircleCI, Buildkite, Jenkins — every modern runner supports matrix-style parallelism. The flag stays --shard=N/M; only the orchestration syntax differs.

Merging reports — the blob reporter

A run sharded across 4 machines produces 4 separate reports. By default, each shard's HTML report is independent — useful per-shard, but you want a unified view. Playwright ships a blob reporter that produces an intermediate format designed for merging:

// playwright.config.ts
import { defineConfig } from "@playwright/test";
 
export default defineConfig({
  reporter: process.env.CI ? [["blob"]] : [["html"]]
});

In CI, every shard writes a blob-report/ folder. Upload each one as an artefact (the workflow above does this). After all shards finish, a separate job, added under jobs: in the same workflow, downloads them and merges them:

merge-reports:
  needs: test
  if: always()
  runs-on: ubuntu-latest
  steps:
    - uses: actions/checkout@v4
    - uses: actions/setup-node@v4
      with: { node-version: lts/* }
    - run: npm ci
 
    - uses: actions/download-artifact@v4
      with:
        path: all-blob-reports
        pattern: blob-report-*
        merge-multiple: true
 
    - run: npx playwright merge-reports --reporter html ./all-blob-reports
 
    - uses: actions/upload-artifact@v4
      with:
        name: html-report--final
        path: playwright-report
        retention-days: 14

merge-reports reads every blob, recombines them into one HTML report, and produces a unified view — same as if the whole suite had run on one machine. This is what you download and open when triaging CI failures.

The sharding lifecycle

A sharded run starts from a single-machine baseline. Take a suite of 1,000 tests at about 1.5 s each: on 1 machine with 4 workers, the wall-clock is roughly 6 minutes (1,000 × 1.5 s ÷ 4 workers ≈ 375 s).

How many shards is the right number?

The naive answer is "more is faster." The actual answer accounts for fixed overhead — every shard pays:

  • Ubuntu runner spin-up (~30s)
  • npm ci install (~30-60s)
  • playwright install --with-deps (~30-60s, with caching this drops to ~10s)
  • Browser startup per worker

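For a 6-minute suite, 4 shards cut test time to ~1.5 minutes, saving 4.5 minutes. The setup overhead is ~1.5 minutes per shard, but the shards pay it in parallel, so the sharded run takes ~3 minutes of wall-clock. Net savings: roughly 3 minutes, a clear win.

For a 90-second suite, 4 shards cut test time to ~22 seconds, saving only ~70 seconds, while each shard still pays ~1.5 minutes of setup (6 runner-minutes in total, 1.5 minutes of wall-clock). The sharded run takes ~1.9 minutes, slower than the 90 seconds a single machine needs. By the same arithmetic, 4 shards only break even at about 2 minutes of test time and 2 shards at about 3 minutes; below roughly 3 minutes, sharding isn't worth it.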
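The arithmetic above fits in a few lines if you want to rerun it with your own numbers. This is a back-of-envelope sketch, not a benchmark: it assumes tests divide evenly across shards and, like the paragraphs above, treats the ~1.5 minutes of setup as the extra cost each shard pays in parallel. `shardedMinutes` and `breakEvenMinutes` are illustrative names.

```typescript
// Back-of-envelope wall-clock model for sharding (a sketch using this lesson's rough numbers).
// Assumptions: tests divide evenly across shards, and every shard pays the same fixed
// setup cost (runner spin-up, npm ci, browser install) in parallel with the others.

const SETUP_MINUTES = 1.5; // assumed per-shard overhead, from the list above

// Shards run concurrently, so wall-clock is one shard's setup plus its slice of the tests.
function shardedMinutes(suiteMinutes: number, shards: number, setup = SETUP_MINUTES): number {
  return setup + suiteMinutes / shards;
}

// Solves T = setup + T / shards for T (shards must be > 1).
// Suites shorter than this finish sooner without sharding.
function breakEvenMinutes(shards: number, setup = SETUP_MINUTES): number {
  return setup / (1 - 1 / shards);
}

for (const suiteMinutes of [6, 1.5]) {
  const sharded = shardedMinutes(suiteMinutes, 4);
  console.log(`${suiteMinutes} min of tests: ${sharded.toFixed(2)} min on 4 shards`);
}
console.log(`break-even: ${breakEvenMinutes(2)} min (2 shards), ${breakEvenMinutes(4)} min (4 shards)`);
```

With the default setup cost, the 6-minute suite comes out at 3 minutes on 4 shards and the 90-second suite at just under 1.9 minutes, slower than one machine; 4 shards break even at 2 minutes of test time and 2 shards at 3. Measure your real setup time first: caching browser binaries shrinks it and pulls the break-even down.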

A practical rule:

  • Suite under 3 minutes → don't shard. One machine, lots of workers.
  • 3-15 minutes → 2-4 shards is the sweet spot.
  • Over 15 minutes → 4-8 shards. Beyond that, the cost (CI minutes × runners) outpaces the wall-clock win for most teams.
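Encoded as a function, the rule looks like this. It is only a sketch: `suggestedShards` is a hypothetical helper, and the thresholds are the heuristics above, not anything Playwright enforces, so adjust them once you have your own timings.

```typescript
// The rule of thumb above as a lookup (thresholds are this lesson's heuristics,
// not Playwright defaults; tune them against your own CI timings).

interface ShardRange {
  min: number;
  max: number;
}

function suggestedShards(suiteMinutes: number): ShardRange {
  if (suiteMinutes < 3) return { min: 1, max: 1 };   // don't shard: one machine, more workers
  if (suiteMinutes <= 15) return { min: 2, max: 4 }; // the sweet spot for mid-sized suites
  return { min: 4, max: 8 };                         // past 8, cost usually outpaces the win
}

console.log(suggestedShards(2));  // { min: 1, max: 1 }
console.log(suggestedShards(6));  // { min: 2, max: 4 }
console.log(suggestedShards(40)); // { min: 4, max: 8 }
```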

Cost matters

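Each shard is a CI runner, billed as its own job. GitHub Actions is free for public repositories; for private repositories, minutes beyond your plan's included allowance have cost around $0.008 per minute on a standard Linux runner (check current pricing). A 4-shard setup running on every PR bills roughly 4 × the per-shard runtime, plus the merge job, and GitHub rounds each job up to a whole minute. Multiply by PR volume (say 200 PRs/month, each averaging 2 CI runs) and the bill adds up.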

For most teams, the trade-off favours speed (a developer waiting 6 minutes on PR feedback is more expensive than 4 minutes of CI compute). For high-PR-volume orgs, look at the actual numbers before scaling shards into double digits.
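To look at the actual numbers, a rough estimator helps. This sketch assumes the ~$0.008 per-minute rate mentioned above (check current GitHub pricing and your plan's included minutes), that each job is rounded up to a whole minute, and that the merge job bills too; the per-shard and merge-job durations are illustrative inputs, not measurements, and `monthlyCost` is a hypothetical helper.

```typescript
// Rough monthly CI minutes and cost for a sharded workflow (a sketch, not a quote).
// Assumptions: a per-minute rate like the ~$0.008 above, each job rounded up to a
// whole minute, and the merge-reports job billed alongside the shards.

interface CostInputs {
  shards: number;
  minutesPerShard: number; // setup plus that shard's slice of tests
  mergeJobMinutes: number; // the merge-reports job runs once per workflow run
  runsPerMonth: number;    // PRs per month × CI runs per PR
  pricePerMinute: number;  // check current pricing and included minutes for your plan
}

function monthlyCost(c: CostInputs): { minutes: number; dollars: number } {
  // Every job is rounded up to a whole minute before billing.
  const perRun = c.shards * Math.ceil(c.minutesPerShard) + Math.ceil(c.mergeJobMinutes);
  const minutes = perRun * c.runsPerMonth;
  return { minutes, dollars: minutes * c.pricePerMinute };
}

// Illustrative inputs: the 6-minute suite on 4 shards, at 200 PRs × 2 runs a month.
const estimate = monthlyCost({
  shards: 4,
  minutesPerShard: 3, // ~1.5 min setup + ~1.5 min of tests
  mergeJobMinutes: 1,
  runsPerMonth: 200 * 2,
  pricePerMinute: 0.008,
});
console.log(`${estimate.minutes} CI minutes/month, about $${estimate.dollars.toFixed(2)}`);
```

With these inputs, each run bills 13 minutes (4 × 3 for the shards plus 1 for the merge), so 400 runs a month come to 5,200 CI minutes, about $42. Swap in your own runtimes and PR volume before deciding whether double-digit shard counts pay off.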

Workers + sharding combined

Sharding doesn't replace workers — they multiply. A 4-shard CI setup with 2 workers per shard gives 8-way parallelism overall:

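- run: npx playwright test --shard=${{ matrix.shardIndex }}/${{ matrix.shardTotal }}
  env:
    PW_WORKERS: 2  # our own variable, read by playwright.config.ts below

// playwright.config.ts
import { defineConfig } from "@playwright/test";
export default defineConfig({
  // On CI, use PW_WORKERS (default 1); locally, let Playwright pick.
  workers: process.env.CI ? parseInt(process.env.PW_WORKERS || "1", 10) : undefined
});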

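Tune the worker count per shard to the runner size. Standard GitHub-hosted Linux runners for private repositories have 2 vCPUs and 7 GB of RAM (public repositories get 4 vCPUs and 16 GB), so 2 workers per shard is a reasonable baseline. Larger or self-hosted runners can run more. Passing --workers=2 on the command line is an alternative to the environment variable.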
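The one-liner above hands NaN to workers if PW_WORKERS is set to something that isn't a number. A slightly more defensive sketch follows; `resolveWorkers` and `totalParallelism` are hypothetical helpers, and PW_WORKERS remains this lesson's own variable name, not a Playwright setting.

```typescript
// A defensive version of the PW_WORKERS parsing (a sketch; PW_WORKERS is our own name).

// Returns a positive integer worker count, or `fallback` for missing/invalid input.
function resolveWorkers(raw: string | undefined, fallback: number): number {
  const parsed = Number.parseInt(raw ?? "", 10);
  // Reject missing, non-numeric, zero, or negative values instead of passing NaN on.
  return Number.isInteger(parsed) && parsed > 0 ? parsed : fallback;
}

// Sharding and workers multiply: total parallelism is shards × workers per shard.
function totalParallelism(shards: number, workersPerShard: number): number {
  return shards * workersPerShard;
}

console.log(resolveWorkers("2", 1));                      // 2
console.log(resolveWorkers(undefined, 1));                // 1
console.log(resolveWorkers("two", 1));                    // 1
console.log(totalParallelism(4, resolveWorkers("2", 1))); // 8
```

In playwright.config.ts you would call it as `workers: process.env.CI ? resolveWorkers(process.env.PW_WORKERS, 1) : undefined`. With 4 shards and PW_WORKERS=2, that gives the 8-way parallelism described above.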

When sharding doesn't work — flaky distribution

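Without fullyParallel, Playwright assigns whole files to shards. It balances the slices by test count, but it cannot split a file, so the result is stable but not always balanced. If one file has 200 tests and the rest have 5, the shard that receives the big file runs much longer than the others, and the total wall-clock is bottlenecked by that shard.

If the tests in a file are independent, the framework-level fix is fullyParallel: true, which lets Playwright spread individual tests across shards. Beyond that, it comes down to discipline: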

  • Keep test files small (under ~20 tests each). One feature → one file → small enough to be a balanced shard candidate.
  • For files with naturally many cases (parameterised data-driven tests), split them into multiple files by category.
  • If one file is legitimately huge and can't be split, either accept that shard's runtime as the bottleneck or, if the file's tests are independent, add test.describe.configure({ mode: 'parallel' }) so its tests can be spread across shards as well as workers.
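To find the files worth splitting, you can check per-file test counts against a shard's fair share. This is a sketch under stated assumptions: you already have the counts (for example, tallied from the output of npx playwright test --list), the ~20-tests-per-file target comes from the list above, and `splitCandidates` is a hypothetical helper.

```typescript
// Flag test files that are too large for balanced file-level sharding (a sketch).
// Assumptions: per-file test counts are known, and ~20 tests per file is the target.

interface SplitCandidate {
  file: string;
  tests: number;
  overloadsShard: boolean; // bigger than one shard's fair share on its own
  suggestedParts: number;
}

function splitCandidates(files: Record<string, number>, shards: number, maxPerFile = 20): SplitCandidate[] {
  const total = Object.values(files).reduce((sum, n) => sum + n, 0);
  const fairShare = total / shards; // average tests per shard
  return Object.entries(files)
    .filter(([, tests]) => tests > maxPerFile)
    .map(([file, tests]) => ({
      file,
      tests,
      overloadsShard: tests > fairShare,
      suggestedParts: Math.ceil(tests / maxPerFile),
    }));
}

// Hypothetical suite: one 200-test file and three small ones.
const lopsidedSuite = { "admin.spec.ts": 200, "cart.spec.ts": 5, "login.spec.ts": 5, "search.spec.ts": 5 };
for (const c of splitCandidates(lopsidedSuite, 4)) {
  console.log(`${c.file}: ${c.tests} tests, overloads a shard: ${c.overloadsShard}, split into ~${c.suggestedParts}`);
}
```

Any file flagged as overloading a shard will make its shard run long under file-level sharding no matter how the other files are arranged; the suggested part count is only a starting point.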

Coming from Cypress?

The mappings:

  • Cypress's parallelisation runs through Cypress Cloud (a hosted service, paid beyond its free tier) → Playwright sharding is built-in and free.
  • cypress run --record --parallel --key X (with Cloud) → npx playwright test --shard=N/M (with no service).
  • Cypress's load balancing relies on the Cloud orchestrator, which orders specs by their recorded durations → Playwright splits the test list by test count, simpler but static.

The cost difference is significant for larger teams. A 50-developer team running Cypress Cloud at parallel-record tier can spend $5,000+/year on the orchestrator alone; Playwright sharding is free, sits in your existing GitHub Actions config, and merges via a built-in tool.

⚠️ Common mistakes

  • Forgetting fail-fast: false. With it omitted (or set to true), GitHub cancels the sibling shards the moment one fails, and you lose visibility into whether the failure is one bad test or a systemic issue. Always set fail-fast: false for matrix sharding.
  • Skipping the merge step and reading 4 separate reports. The blob reporter exists specifically to make merging possible. Without the merge, you have 4 disconnected HTML reports, and debugging a shard-spanning regression means clicking through 4 tabs. Wire up the merge job from day one.
  • Sharding a 30-second suite. The fixed overhead per shard (~60-90s of setup) dwarfs the test time. You end up slower than running serially on one machine. Profile your actual runtime before adding shards.

🎯 Practice task

Build a 4-shard GitHub Actions workflow with merge. 30-40 minutes.

  1. Update playwright.config.ts to use the blob reporter on CI:

    import { defineConfig } from "@playwright/test";
     
    export default defineConfig({
      testDir: "./tests",
      fullyParallel: true,
      workers: process.env.CI ? 2 : undefined,
      reporter: process.env.CI ? [["blob"]] : [["html"]]
    });
  2. Create .github/workflows/playwright.yml:

    name: Playwright Tests
    on: [push, pull_request]
     
    jobs:
      test:
        timeout-minutes: 30
        runs-on: ubuntu-latest
        strategy:
          fail-fast: false
          matrix:
            shardIndex: [1, 2, 3, 4]
            shardTotal: [4]
        steps:
          - uses: actions/checkout@v4
          - uses: actions/setup-node@v4
            with: { node-version: lts/* }
          - run: npm ci
          - run: npx playwright install --with-deps
          - run: npx playwright test --shard=${{ matrix.shardIndex }}/${{ matrix.shardTotal }}
          - uses: actions/upload-artifact@v4
            if: always()
            with:
              name: blob-report-${{ matrix.shardIndex }}
              path: blob-report
              retention-days: 1
     
      merge-reports:
        needs: test
        if: always()
        runs-on: ubuntu-latest
        steps:
          - uses: actions/checkout@v4
          - uses: actions/setup-node@v4
            with: { node-version: lts/* }
          - run: npm ci
          - uses: actions/download-artifact@v4
            with:
              path: all-blob-reports
              pattern: blob-report-*
              merge-multiple: true
          - run: npx playwright merge-reports --reporter html ./all-blob-reports
          - uses: actions/upload-artifact@v4
            with:
              name: html-report--final
              path: playwright-report
              retention-days: 14
  3. Commit and push. Watch GitHub Actions: 4 shard jobs run in parallel; once all 4 finish, the merge-reports job runs and produces the unified HTML.

  4. Download the html-report--final artefact. Unzip and open. The report looks identical to a single-machine run but covers the whole suite.

  5. Try it locally. npx playwright test --shard=1/4 runs only the first shard's tests. --shard=2/4 runs the second. Confirm the test counts add up to the total when you run all four.

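  6. Force a shard imbalance. Temporarily set fullyParallel: false (with it on, Playwright spreads individual tests across shards and the imbalance disappears). Create a single test file with 50 tests and one file with 1 test, then run sharded: the shard that gets the 50-test file dominates the runtime while the other shards get little or nothing. The fix is splitting the big file; this is the muscle for "shard time is bottlenecked by the slowest shard." Turn fullyParallel back on afterwards.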

  7. Stretch: add a Slack notification step that posts the merged report URL to a channel on failure. This is the pattern for "CI tells the team about real failures without needing them to open GitHub."

You can now scale beyond a single machine without paying for orchestration services. The next lesson goes one step deeper into the CI side — the full GitHub Actions setup, including the patterns for caching browser binaries, running against deployed environments, and starting a local server inside CI.
