The single most-reported "weird CI bug" with browser tests: it works locally, it fails on CI, and the diff is one pixel of font anti-aliasing or a missing locale package. The cause is environment drift: your laptop runs macOS with one set of fonts, and the CI runner is Ubuntu with a different set. Docker is how serious teams eliminate the variable: bake the OS, the fonts, the browser binaries, and the Node version into one image, and run tests inside that image both locally and in CI. Identical bytes everywhere. This lesson covers the official Playwright Docker image, the patterns for using it, and the broader CI best-practices checklist that turns "tests are flaky on CI" into "tests just work."
The official Playwright Docker image
Microsoft publishes Docker images that match each Playwright release exactly:
mcr.microsoft.com/playwright:v1.44.0-jammy # Ubuntu 22.04 (Jammy)
mcr.microsoft.com/playwright:v1.44.0-noble # Ubuntu 24.04 (Noble)
Each image contains:
- A pinned Ubuntu LTS
- A pinned Node LTS
- All three browser binaries (Chromium, Firefox, WebKit)
- All system dependencies the browsers need (fonts, codecs, GTK, fontconfig)
- Browser builds for the matching Playwright version, already downloaded (the @playwright/test package itself still comes from your project's npm ci)
You don't pull latest — you pull a specific version that matches the @playwright/test in your package.json. When you upgrade Playwright, you upgrade the image tag in lockstep. Same version local and remote = same renderer = same screenshots = same test results.
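The lockstep rule can be sketched as a small helper that builds the image reference from the exact Playwright version. The function name playwrightImage and the jammy default are assumptions made for illustration; they are not part of Playwright.

```typescript
// Hypothetical helper: derive the Docker image reference from an exact
// Playwright version. Not part of Playwright's API.
type Distro = "jammy" | "noble";

function playwrightImage(version: string, distro: Distro = "jammy"): string {
  // Accept only an exact x.y.z version. A range such as ^1.44.0 can resolve
  // to a newer release than the image tag, which defeats the pinning.
  if (!/^\d+\.\d+\.\d+$/.test(version)) {
    throw new Error(`Expected an exact version like 1.44.0, got "${version}"`);
  }
  return `mcr.microsoft.com/playwright:v${version}-${distro}`;
}

console.log(playwrightImage("1.44.0"));
// mcr.microsoft.com/playwright:v1.44.0-jammy
```

Rejecting ranges is a deliberate choice in this sketch: if package.json says ^1.44.0, the version recorded in package-lock.json is the one the tag should match.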
Running tests inside Docker locally
docker run --rm -it \
  -v "$(pwd)":/work \
  -w /work \
  mcr.microsoft.com/playwright:v1.44.0-jammy \
  npx playwright test
Three flags:
- -v "$(pwd)":/work mounts your project into the container at /work (the quotes keep paths with spaces intact).
- -w /work sets the working directory.
- --rm removes the container after the test run finishes.
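The first run pulls the image (~2 GB; one-time cost). Subsequent runs reuse the locally cached image. Inside the container, npx playwright test runs in the pinned environment, the same as CI. The image ships the browsers but not your project's packages, so if node_modules is missing, run npm ci inside the container first.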
Run a specific test:
docker run --rm -it -v "$(pwd)":/work -w /work \
  mcr.microsoft.com/playwright:v1.44.0-jammy \
  npx playwright test login.spec.ts
Updating visual baselines inside Docker
The single biggest reason to use Docker locally is regenerating snapshots:
docker run --rm -it -v "$(pwd)":/work -w /work \
  mcr.microsoft.com/playwright:v1.44.0-jammy \
  npx playwright test --update-snapshots
The new baselines render in the Docker environment, so they are exactly the bytes CI will produce. Commit them. CI now passes byte-for-byte because the renderer is identical. No more "passed locally, failed in CI" on visual tests.
A custom Dockerfile for your project
For more control (specific Node version, custom deps, CI-ready image), build your own based on Playwright's:
FROM mcr.microsoft.com/playwright:v1.44.0-jammy
WORKDIR /app
COPY package*.json ./
RUN npm ci
COPY . .
CMD ["npx", "playwright", "test"]
Build and run:
docker build -t my-playwright-tests .
docker run --rm -it my-playwright-tests
This pre-installs your project's dependencies into the image. Pushing the image to a registry (GitHub Container Registry, Docker Hub) means CI can docker pull it instead of running npm ci on every run, which is another speedup.
Docker in GitHub Actions
The cleanest way to run inside Docker on Actions is the container: field:
jobs:
  test:
    runs-on: ubuntu-latest
    container:
      image: mcr.microsoft.com/playwright:v1.44.0-jammy
    steps:
      - uses: actions/checkout@v4
      - run: npm ci
      - run: npx playwright test
The whole job runs inside the Playwright Docker image. No npx playwright install --with-deps needed (already in the image). No setup-node (already in the image). Just install your deps and run.
Caveat: container: runs on a Linux host only, so this exact pattern doesn't work for Windows or macOS runners. For pure Linux pipelines (the common case for browser tests), it's the cleanest setup.
Why pin the renderer
A practical example. Without Docker:
- Local dev: macOS, Apple's font rendering, San Francisco UI font.
- CI: Ubuntu Jammy, Liberation fonts, no LCD anti-aliasing.
The same <h1>Welcome</h1> renders with slightly different glyph widths and edge smoothing, which adds up to a few pixels of difference across the heading. Visual snapshot tests pass on your laptop and fail on CI. You spend a week chasing a non-bug.
With Docker:
- Local dev: Playwright Docker image (Ubuntu Jammy, Liberation fonts, identical anti-aliasing).
- CI: Same image.
Same rendering. Same screenshots. Same tests. The variable that bit you is gone.
This isn't only about visual tests. Text-overflow detection (getBoundingClientRect), CSS feature support, and even some JavaScript timing depend on the OS. Docker pins all of it.
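To make the failure mode concrete, here is a deliberately simplified pixel comparison. It is not Playwright's actual comparator, which works on full RGBA screenshots with its own thresholds; the function name and the sample grayscale values are invented for illustration.

```typescript
// Simplified illustration only: count pixels whose grayscale value differs
// by more than a tolerance. All names and sample values are hypothetical.
function countDifferentPixels(
  a: number[],
  b: number[],
  tolerance: number = 0,
): number {
  if (a.length !== b.length) {
    throw new Error("Images must have the same dimensions");
  }
  let different = 0;
  for (let i = 0; i < a.length; i++) {
    if (Math.abs(a[i] - b[i]) > tolerance) different++;
  }
  return different;
}

// One row across a glyph edge: 0 = black text, 255 = white background.
const macEdge = [0, 0, 96, 255, 255]; // one style of edge smoothing
const linuxEdge = [0, 0, 160, 255, 255]; // same glyph, different smoothing

console.log(countDifferentPixels(macEdge, linuxEdge)); // 1
console.log(countDifferentPixels(macEdge, linuxEdge, 80)); // 0
```

A strict comparison flags the single smoothed pixel, while a looser tolerance hides it. Raising the tolerance is one workaround, but it also hides real regressions, which is why pinning the renderer is the more reliable fix.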
With Docker vs without
Local-vs-CI environment parity, with and without Docker
Without Docker:
- Local: your OS, your fonts, your Node version
- CI: Ubuntu LTS, default fonts, GitHub-managed Node
- Different anti-aliasing, different font fallbacks, different timing
- Visual tests pass locally, fail on CI on a 3-pixel diff
With Playwright Docker image:
- Local: mcr.microsoft.com/playwright:vX.Y.Z
- CI: same image, same tag
- Identical fonts, identical browsers, identical Node
- Visual tests render byte-for-byte the same, so the same baselines work everywhere
CI best-practices checklist
Beyond Docker, a battle-tested checklist for production-grade Playwright in CI:
- Pin every version. Playwright Docker tag (v1.44.0, not latest). Node version (lts/* is acceptable; an exact major is better). System deps (Docker handles this).
- Cache aggressively. Playwright browsers between runs. npm ci cache via setup-node. Ideally a custom Docker image with npm ci pre-baked.
- Use --with-deps once. First-time setup, browsers + system libs. After that, the image has them.
- Set reasonable per-job timeouts. timeout-minutes: 30 on most jobs. Long enough for healthy runs; tight enough to catch infinite loops.
- Always upload reports on failure. if: ${{ !cancelled() }} on the upload step. Failed runs are when you need the report most.
- Shard suites over 5 minutes. Below that, sharding's overhead exceeds its win. Above, 4 shards is the sweet spot for most teams.
- Run on PRs, not just merges. Catch regressions before review. Smoke suite on PR (~90s); full suite on merge (~5min).
- Tag and split smoke vs full. @smoke tagged tests run on every commit. Full suite runs on main, scheduled nightly, or manually.
- Configure retries explicitly. retries: process.env.CI ? 2 : 0. Two retries on CI mask transient infra flake; 0 locally so you see real failures fast.
- Trace on first retry. trace: 'on-first-retry' saves the trace.zip when a flaky test retries and doesn't bloat artefacts on healthy runs.
- Send a notification on main branch failures. Slack, email, GitHub status checks. Failed main builds need someone's attention; PRs the developer is already watching.
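The retry, trace, and reporter items from the checklist could be gathered into one function, roughly as below. The helper name ciSettings is hypothetical, and the choice of the blob reporter on CI is an assumption that suits sharded runs whose reports are merged later; retries, reporter, and use.trace are ordinary Playwright config keys.

```typescript
// Sketch of CI-dependent settings from the checklist. `ciSettings` is a
// hypothetical helper, not part of Playwright.
interface CiSettings {
  retries: number;
  reporter: "blob" | "html";
  use: { trace: "on-first-retry" };
}

function ciSettings(isCi: boolean): CiSettings {
  return {
    // Two retries on CI absorb transient infrastructure flake; none locally.
    retries: isCi ? 2 : 0,
    // Assumption: sharded CI jobs emit blob reports for a later merge step.
    reporter: isCi ? "blob" : "html",
    // Record a trace only when a test is retried, keeping artefacts small.
    use: { trace: "on-first-retry" },
  };
}

console.log(ciSettings(true).retries); // 2
console.log(ciSettings(false).retries); // 0
```

In playwright.config.ts the result could be spread into the exported config, for example defineConfig({ ...ciSettings(!!process.env.CI) }), with project-specific options added alongside.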
A complete Docker-based workflow
Combining everything from this chapter — Docker, sharding, caching, reporting:
name: Playwright Tests
on: [push, pull_request]
jobs:
  test:
    timeout-minutes: 30
    runs-on: ubuntu-latest
    container:
      image: mcr.microsoft.com/playwright:v1.44.0-jammy
      options: --user 1001
    strategy:
      fail-fast: false
      matrix:
        shardIndex: [1, 2, 3, 4]
        shardTotal: [4]
    steps:
      - uses: actions/checkout@v4
      - name: Cache npm
        uses: actions/cache@v4
        with:
          path: ~/.npm
          key: npm-${{ runner.os }}-${{ hashFiles('**/package-lock.json') }}
      - run: npm ci
      - run: npx playwright test --shard=${{ matrix.shardIndex }}/${{ matrix.shardTotal }}
        env:
          BASE_URL: ${{ secrets.STAGING_URL }}
          CI: true
      # Assumes the blob reporter is enabled on CI in playwright.config.ts.
      - uses: actions/upload-artifact@v4
        if: ${{ !cancelled() }}
        with:
          name: blob-report-${{ matrix.shardIndex }}
          path: blob-report
          retention-days: 1
  merge-reports:
    needs: test
    if: always()
    runs-on: ubuntu-latest
    container:
      image: mcr.microsoft.com/playwright:v1.44.0-jammy
    steps:
      - uses: actions/checkout@v4
      - run: npm ci
      - uses: actions/download-artifact@v4
        with:
          path: all-blob-reports
          pattern: blob-report-*
          merge-multiple: true
      - run: npx playwright merge-reports --reporter html ./all-blob-reports
      - uses: actions/upload-artifact@v4
        with:
          name: html-report--final
          path: playwright-report
          retention-days: 14
Notice the container: field. Every job runs inside Playwright's image. No browser install step; no system-deps step. The setup is invisible because it's pre-baked.
GitLab CI — the same pattern
For GitLab CI users, the same shape:
playwright:
  image: mcr.microsoft.com/playwright:v1.44.0-jammy
  stage: test
  parallel:
    matrix:
      - SHARD: ["1/4", "2/4", "3/4", "4/4"]
  script:
    - npm ci
    - npx playwright test --shard=$SHARD
  artifacts:
    when: always
    paths:
      - blob-report/
    expire_in: 1 day
Almost identical to Actions. The framework-level pattern (Docker image + sharding + report artefacts) carries cleanly across CI providers.
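The SHARD strings in the matrix follow a simple index/total pattern. If the shard count changes often, a small generator like the one below could produce the list; the function name shardValues is hypothetical.

```typescript
// Hypothetical helper: list the --shard values for a given shard count,
// in the same "index/total" form used by the SHARD matrix.
function shardValues(total: number): string[] {
  if (!Number.isInteger(total) || total < 1) {
    throw new Error("total must be a positive integer");
  }
  // Shard indices are 1-based: 1/4, 2/4, 3/4, 4/4.
  return Array.from({ length: total }, (_, i) => `${i + 1}/${total}`);
}

console.log(shardValues(4).join(" ")); // 1/4 2/4 3/4 4/4
```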
Coming from Cypress?
The mappings:
- cypress/included Docker image → mcr.microsoft.com/playwright Docker image.
- Cypress's parallel-record orchestrator (paid) → Playwright's --shard=N/M (free).
- Cypress's font-rendering inconsistencies (a known long-standing issue) → Docker-pinned renderer eliminates the class.
Teams migrating from Cypress to Playwright often cite "visual tests finally pass on CI" as a top-three benefit. The combination of --update-snapshots, Docker, and GitHub Actions container jobs supplies the pinned renderer that Cypress visual-testing setups most often lack.
⚠️ Common mistakes
- Pulling :latest. A surprise upgrade breaks every visual test on the day it lands. Pin to a specific tag (v1.44.0-jammy) and bump deliberately when you upgrade Playwright.
- Using Docker only on CI but not locally. Now your local snapshots come from macOS rendering and CI runs Linux. Visual tests fail with a 3-pixel diff. Always update snapshots inside the same image you run CI in.
- Mixing runs-on: ubuntu-latest with actions/setup-node and container: mcr.microsoft.com/playwright. The container already has Node; setting up another wastes time and can produce version drift. With container:, drop setup-node entirely.
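Two of these mistakes are visible in the workflow file itself, so a rough check is possible. The sketch below uses plain substring and regex matching rather than a YAML parser, and it looks at the whole file rather than job by job, so a multi-job workflow could produce false positives. The function name and the messages are hypothetical.

```typescript
// Hypothetical lint helper: scan workflow text for an unpinned Playwright
// image and for setup-node used alongside the Playwright container.
function findWorkflowMistakes(workflow: string): string[] {
  const problems: string[] = [];

  // Capture the tag after the image name, if there is one.
  const image = /mcr\.microsoft\.com\/playwright(?::([\w.-]+))?/.exec(workflow);

  // A missing tag or :latest means the renderer can change without notice.
  if (image && (image[1] === undefined || image[1] === "latest")) {
    problems.push("Playwright image is not pinned to a specific tag");
  }

  // setup-node is redundant when the job already runs in the container.
  if (image && workflow.includes("actions/setup-node")) {
    problems.push("actions/setup-node used together with the Playwright container");
  }

  return problems;
}

const suspectWorkflow = `
container:
  image: mcr.microsoft.com/playwright:latest
steps:
  - uses: actions/setup-node@v4
`;
console.log(findWorkflowMistakes(suspectWorkflow).length); // 2
```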
🎯 Practice task
Run your suite inside Docker, both locally and in CI. 30-40 minutes.
- Find your Playwright version in package.json (e.g., "@playwright/test": "1.44.0"). Match it to the Docker tag: mcr.microsoft.com/playwright:v1.44.0-jammy.
- Run locally inside Docker. From your project root:
  docker run --rm -it \
    -v "$(pwd)":/work -w /work \
    mcr.microsoft.com/playwright:v1.44.0-jammy \
    bash -c "npm ci && npx playwright test"
  First run downloads the image (~2 GB; takes a few minutes). Subsequent runs reuse the cached layers (~10 seconds startup).
- Update visual snapshots inside Docker. If you have any visual tests:
  docker run --rm -it \
    -v "$(pwd)":/work -w /work \
    mcr.microsoft.com/playwright:v1.44.0-jammy \
    bash -c "npm ci && npx playwright test --update-snapshots"
  The new baselines come from the Docker renderer. Commit them. CI now produces identical bytes, so visual tests pass.
- Update your GitHub Actions workflow to use the container:
  jobs:
    test:
      runs-on: ubuntu-latest
      container:
        image: mcr.microsoft.com/playwright:v1.44.0-jammy
      steps:
        - uses: actions/checkout@v4
        - run: npm ci
        - run: npx playwright test
  Push and confirm the run still works. It is usually faster, because the browser and system-dependency installs are already baked into the image.
- Force a non-Docker visual diff. Temporarily revert the workflow to runs-on: ubuntu-latest without container:, adding an npx playwright install --with-deps step so the browsers are available. Run a visual test whose baselines you generated inside Docker. The CI run will most likely fail on a small font-rendering diff of a pixel or two. Restore the container: setting and it passes again. The exercise builds the intuition that visual tests need a pinned renderer to be reliable.
- Stretch: build a custom Dockerfile that pre-installs your npm ci dependencies. Push the resulting image to GitHub Container Registry. Update the workflow to use your image instead of the upstream one. Now CI starts with npm ci already done, which saves another 30 seconds per run.
That closes Chapter 8 — parallel execution and CI/CD. You now have:
- Local parallelism via workers (8x speedup on most laptops)
- Across-machine parallelism via sharding (another 4x in typical setups)
- GitHub Actions with caching, artefacts, and merge reporting
- Docker for environment parity that eliminates "works on my machine"
Combined, these turn a 10-minute serial suite into a sub-2-minute CI run that's identical to what runs on every developer's machine. The next chapter — reporting and debugging — covers what to do when those tests do fail: the HTML reporter in depth, custom reporters, the trace viewer, and the patterns for handling flaky tests without giving up parallelism.
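As a back-of-the-envelope check on the sub-2-minute figure, the sketch below divides the serial time evenly across workers and shards and adds a fixed per-job overhead. The even split is an idealisation, and the sample inputs (2 workers per shard, 30 seconds of overhead) are assumptions chosen for illustration, not measurements.

```typescript
// Rough estimate only; every input is an assumption supplied by the caller.
function estimateCiSeconds(
  serialSeconds: number,
  workersPerShard: number,
  shards: number,
  overheadSeconds: number,
): number {
  // Ideal split: each worker on each shard gets an equal slice of the suite.
  const testSeconds = serialSeconds / (workersPerShard * shards);
  // Fixed per-job cost (checkout, npm ci, container start) is not parallelised.
  return testSeconds + overheadSeconds;
}

// 10-minute serial suite, 2 workers per shard, 4 shards, 30 s job overhead.
console.log(estimateCiSeconds(600, 2, 4, 30)); // 105
```

Under those assumptions the wall-clock time is 105 seconds. Real suites split less evenly, so treat the result as a lower bound rather than a prediction.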