Accessibility Testing with @axe-core/playwright

A page with bad colour contrast looks fine to a sighted designer and is unreadable to a person with low vision. A button with no accessible name passes every functional test and is invisible to a screen reader. An image with no alt attribute is announced to assistive tech as just "image", which tells the user nothing. Accessibility (a11y) bugs are functional bugs for users with disabilities, often estimated at around 15% of people, and they're invisible to the kind of UI test that just checks "the click works." axe-core is the open-source rule engine that catches them; @axe-core/playwright integrates it into your test suite. This lesson covers the setup, the patterns for scoping and filtering, and the trade-offs around what to gate CI on.

What axe-core does

axe-core is a JavaScript library that scans a page against a configurable set of accessibility rules. The rules cover the automatically checkable parts of WCAG 2.x at levels A and AA, plus best-practice extensions. It runs inside the page, inspects the DOM (including ARIA attributes and computed styles), and reports violations with:

The rule that failed (color-contrast, image-alt, aria-required-attr, etc.)
The impact level (minor, moderate, serious, critical)
The specific elements that violated it
A link to remediation guidance

axe-core is the same engine that powers Deque's commercial axe products, the accessibility audits in Google Lighthouse, and integrations across React, Vue, Storybook, Jest, Cypress, and now Playwright. If you've used axe in any of those, the rules and output will look familiar.
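To make the four reported fields concrete, here is a minimal sketch of what one violation looks like and how it could be turned into readable text. The `Violation` interface is a simplified local mirror of axe's result type (the real types ship with axe-core), and the sample data and `describeViolation` helper are hypothetical, written only for illustration.

```typescript
// Simplified mirror of the fields axe reports for each violation.
// Assumption: the real type has more fields; this copy keeps the sketch self-contained.
type Impact = "minor" | "moderate" | "serious" | "critical";

interface ViolationNode {
  target: string[]; // selector path to the offending element
  html: string;     // snippet of the element's markup
}

interface Violation {
  id: string;             // rule that failed, e.g. "image-alt"
  impact: Impact | null;  // severity assigned by axe
  help: string;           // short human-readable summary
  helpUrl: string;        // link to remediation guidance
  nodes: ViolationNode[]; // the specific elements that violated the rule
}

// Turn one violation into a few readable lines for a log or failure message.
function describeViolation(v: Violation): string {
  const header = `[${v.impact ?? "unknown"}] ${v.id}: ${v.help}`;
  const elements = v.nodes.map(n => `    ${n.target.join(" ")}`);
  return [header, ...elements, `    see ${v.helpUrl}`].join("\n");
}

// Hand-written sample data, not real scan output.
const sample: Violation = {
  id: "image-alt",
  impact: "critical",
  help: "Images must have alternative text",
  helpUrl: "https://example.com/rules/image-alt",
  nodes: [{ target: ["img.hero"], html: '<img class="hero" src="hero.png">' }],
};

console.log(describeViolation(sample));
```

Each entry in `results.violations` carries these fields, so a helper like this can be mapped over the array when a test fails.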

The Manual Software Testing course's accessibility chapter covers the why, the WCAG levels, and what the rules mean; that is the conceptual foundation. This lesson is about automating that scanning inside Playwright.

Installation

npm install --save-dev @axe-core/playwright

That's it. The package ships TypeScript types; no separate @types/... needed.

import { test, expect } from "@playwright/test";
import AxeBuilder from "@axe-core/playwright";
 
test("homepage has no a11y violations", async ({ page }) => {
  await page.goto("/");
 
  const results = await new AxeBuilder({ page }).analyze();
 
  expect(results.violations).toEqual([]);
});

new AxeBuilder({ page }) creates a scanner bound to the current page. .analyze() injects axe-core, runs the scan, returns a structured result. results.violations is an array — empty means a clean scan, populated means there are issues to fix. The toEqual([]) assertion fails the test if any violations exist.

On an app that has never been scanned, this test almost always fails on the first run, often with 5 to 20 violations. That's expected: most apps have accumulated a11y debt. The point is to start measuring and stop the bleeding from new violations.
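One way to "stop the bleeding" without fixing everything at once is to compare each scan against a recorded baseline and fail only on findings that are new. The sketch below is an assumption about how a team might do that, not something @axe-core/playwright provides; the `Finding` shape and both helper names are hypothetical.

```typescript
// Minimal shape for this sketch: a rule id plus selectors of the elements that failed it.
interface Finding {
  id: string;
  targets: string[];
}

// One key per (rule, element) pair, so a new element failing an old rule still counts as new.
function keysOf(findings: Finding[]): string[] {
  const keys: string[] = [];
  for (const f of findings) {
    for (const t of f.targets) {
      keys.push(`${f.id}|${t}`);
    }
  }
  return keys;
}

// Return only the keys that are not already recorded in the baseline.
function newSinceBaseline(current: Finding[], baseline: Finding[]): string[] {
  const known = new Set(keysOf(baseline));
  return keysOf(current).filter(key => !known.has(key));
}

// Hand-written sample data, not real scan output.
const baseline: Finding[] = [{ id: "color-contrast", targets: ["footer a", ".badge"] }];
const current: Finding[] = [
  { id: "color-contrast", targets: ["footer a", ".badge"] },
  { id: "button-name", targets: ["#save"] },
];

console.log(newSinceBaseline(current, baseline)); // only the button-name finding is new
```

In a real spec the `Finding` list would be derived from `results.violations` (rule `id` plus each node's `target`), and the baseline would live in a checked-in JSON file that shrinks as debt is paid down.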

Scoping — `.include()` and `.exclude()`

Real apps have third-party widgets you don't control (chat bubbles, marketing pixels, payment iframes). A11y violations inside them aren't actionable for your team. Scope the scan:

// Only scan the main content area
const mainOnly = await new AxeBuilder({ page })
  .include("[data-testid='main-content']")
  .analyze();

// Scan everything except the third-party chat widget and cookie banner
const withoutWidgets = await new AxeBuilder({ page })
  .exclude("#intercom-container")
  .exclude(".cookieyes-banner")
  .analyze();

// Multiple scopes: include two regions, exclude a widget inside them
const combined = await new AxeBuilder({ page })
  .include("main")
  .include("nav")
  .exclude(".third-party-ad")
  .analyze();

Use .include() to scan a specific area; use .exclude() to remove noise. The two combine — include the page, exclude the widgets you can't fix.
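When the list of widgets to ignore grows, it can help to keep the selectors in one array and apply them in a loop. The sketch below assumes only that the builder has a chainable `exclude(selector)` method, which is how the calls above are used; `excludeAll`, `THIRD_PARTY_SELECTORS`, and the `RecordingBuilder` stand-in are hypothetical names introduced for illustration.

```typescript
// Any builder with a chainable exclude() fits this shape.
// Assumption: AxeBuilder matches it, as the chained calls in this lesson suggest.
interface Excludable<B> {
  exclude(selector: string): B;
}

// Hypothetical list: replace with the widgets your own pages embed.
const THIRD_PARTY_SELECTORS: readonly string[] = [
  "#intercom-container",
  ".cookieyes-banner",
  ".third-party-ad",
];

// Apply every selector in order and hand back the configured builder.
function excludeAll<B extends Excludable<B>>(builder: B, selectors: readonly string[]): B {
  let current = builder;
  for (const selector of selectors) {
    current = current.exclude(selector);
  }
  return current;
}

// Stand-in builder used only to demonstrate the call without a browser.
class RecordingBuilder implements Excludable<RecordingBuilder> {
  readonly excluded: string[] = [];
  exclude(selector: string): RecordingBuilder {
    this.excluded.push(selector);
    return this;
  }
}

const demo = excludeAll(new RecordingBuilder(), THIRD_PARTY_SELECTORS);
console.log(demo.excluded);
```

In a spec, the first argument would be `new AxeBuilder({ page })` and the result would be followed by `.analyze()`.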

Filtering by impact

Not every a11y rule is equally critical. axe assigns each violation one of four impact levels (typical examples in parentheses; the rule documentation is the authority for any given rule):

Critical: blocks some users entirely (no accessible name on a button, an image with no alt text, a form field with no label).
Serious: significant barrier (poor colour contrast, no lang attribute on <html>).
Moderate: noticeable issue (skipped heading levels, page content outside any landmark).
Minor: small annoyance or best-practice deviation (an empty heading, alt text that repeats adjacent text).

A reasonable starter gate: fail the build on critical and serious; report moderate and minor without failing.

test("a11y — critical and serious only", async ({ page }) => {
  await page.goto("/");
  const results = await new AxeBuilder({ page }).analyze();
 
  const critical = results.violations.filter(
    v => v.impact === "critical" || v.impact === "serious"
  );
 
  expect(critical).toEqual([]);
});

As the team's a11y maturity improves, ratchet up — start including moderate, then minor. The same test pattern survives the bar moving.
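The ratchet can be reduced to a single constant if the filter is written against an ordered list of impacts. This is a minimal sketch under that assumption; `atOrAbove`, `IMPACT_ORDER`, and `GATE` are hypothetical names, and the sample violations are hand-written.

```typescript
type Impact = "minor" | "moderate" | "serious" | "critical";

// Lowest to highest severity; the index is used for comparison.
const IMPACT_ORDER: Impact[] = ["minor", "moderate", "serious", "critical"];

interface HasImpact {
  id: string;
  impact?: Impact | null;
}

// Keep violations whose impact is at or above the given minimum.
// Violations with no impact value are dropped rather than guessed at.
function atOrAbove<T extends HasImpact>(violations: T[], minimum: Impact): T[] {
  const floor = IMPACT_ORDER.indexOf(minimum);
  return violations.filter(
    v => v.impact != null && IMPACT_ORDER.indexOf(v.impact) >= floor
  );
}

// Ratcheting up means changing this one value: "serious" -> "moderate" -> "minor".
const GATE: Impact = "serious";

// Hand-written sample data, not real scan output.
const found: HasImpact[] = [
  { id: "button-name", impact: "critical" },
  { id: "color-contrast", impact: "serious" },
  { id: "heading-order", impact: "moderate" },
];

console.log(atOrAbove(found, GATE).map(v => v.id));
```

In a spec the assertion would become `expect(atOrAbove(results.violations, GATE)).toEqual([])`, so moving the bar never touches the test bodies.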

Selecting WCAG levels

axe ships rules tagged by WCAG version and conformance level. Restrict the scan to the standard your project commits to:

const results = await new AxeBuilder({ page })
  .withTags(["wcag2a", "wcag2aa", "wcag21a", "wcag21aa"])
  .analyze();

That's the canonical "WCAG 2.1 AA" tag combination: A and AA from both WCAG 2.0 and 2.1. Add 'wcag22aa' if your project targets WCAG 2.2 (recent axe-core releases tag their 2.2 rules this way); add 'best-practice' for the broader set.
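Because each standard is cumulative, the tag list can be derived from the target the project commits to rather than retyped in every spec. The lookup below is a small sketch of that idea; the `Standard` names and the `tagsFor` helper are hypothetical, while the tag strings are the ones discussed above.

```typescript
// Hypothetical labels for the conformance target a project commits to.
type Standard = "wcag20aa" | "wcag21aa" | "wcag22aa";

// Each later standard includes the tags of the earlier ones.
const TAGS: Record<Standard, string[]> = {
  wcag20aa: ["wcag2a", "wcag2aa"],
  wcag21aa: ["wcag2a", "wcag2aa", "wcag21a", "wcag21aa"],
  wcag22aa: ["wcag2a", "wcag2aa", "wcag21a", "wcag21aa", "wcag22aa"],
};

// Return a fresh array so callers can modify it without touching the table.
function tagsFor(standard: Standard, includeBestPractice = false): string[] {
  const tags = TAGS[standard].slice();
  if (includeBestPractice) {
    tags.push("best-practice");
  }
  return tags;
}

console.log(tagsFor("wcag21aa"));
console.log(tagsFor("wcag22aa", true));
```

The result is passed straight to `.withTags(...)`, for example `.withTags(tagsFor("wcag21aa"))`.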

You can also go the other way and switch off specific rules for known-issue exceptions:

const results = await new AxeBuilder({ page })
  .disableRules(["color-contrast"]) // exception while design system is migrating
  .analyze();

The right discipline: track every disabled rule with a ticket number and an owner. Disabled rules without a ticket rot into permanent "we don't test this anymore."
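That discipline is easier to keep when the ticket and owner live next to the rule id in code. The following is a sketch of one possible registry, assuming a review date per entry; the `DisabledRule` shape, the ticket id, and the owner name are all hypothetical.

```typescript
interface DisabledRule {
  rule: string;     // axe rule id
  ticket: string;   // tracking ticket (hypothetical id below)
  owner: string;    // team or person responsible for re-enabling it
  reviewBy: string; // ISO date, YYYY-MM-DD
}

// Hypothetical entry for illustration only.
const DISABLED_RULES: DisabledRule[] = [
  { rule: "color-contrast", ticket: "A11Y-123", owner: "design-systems", reviewBy: "2030-01-31" },
];

// The list of rule ids to hand to the scanner.
function disabledRuleIds(entries: DisabledRule[]): string[] {
  return entries.map(e => e.rule);
}

// Entries whose review date has passed. ISO dates compare correctly as plain strings.
function overdue(entries: DisabledRule[], today: string): DisabledRule[] {
  return entries.filter(e => e.reviewBy < today);
}

console.log(disabledRuleIds(DISABLED_RULES));
console.log(overdue(DISABLED_RULES, "2030-02-01").map(e => `${e.rule} (${e.ticket}, ${e.owner})`));
```

The scan would then use `.disableRules(disabledRuleIds(DISABLED_RULES))`, and a separate check could fail the build when `overdue(...)` is non-empty.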

A reusable AxeBuilder fixture

Most a11y tests scan the same way — same tags, same exclusions. Wrap it in a fixture:

// fixtures/index.ts
import { test as base } from "@playwright/test";
import AxeBuilder from "@axe-core/playwright";
 
type AxeBuilderFixture = () => AxeBuilder;
 
export const test = base.extend<{ makeAxeBuilder: AxeBuilderFixture }>({
  makeAxeBuilder: async ({ page }, use) => {
    const make = () =>
      new AxeBuilder({ page })
        .withTags(["wcag2a", "wcag2aa", "wcag21a", "wcag21aa"])
        .exclude("#intercom-container")
        .exclude(".cookieyes-banner");
    await use(make);
  }
});
 
export { expect } from "@playwright/test";

Now every spec scans with consistent settings:

import { test, expect } from "../fixtures";
 
test("checkout has no critical/serious violations", async ({ page, makeAxeBuilder }) => {
  await page.goto("/checkout");
  const results = await makeAxeBuilder().analyze();
 
  const critical = results.violations.filter(
    v => v.impact === "critical" || v.impact === "serious"
  );
  expect(critical).toEqual([]);
});

The fixture removes the repeated tag and exclusion setup from every test. Adding a new exclusion is a one-line change in the fixture; every spec inherits it.

Impact distribution — what real teams see

Typical axe violation distribution on a mid-stage SaaS app (violation count by impact):

Critical (blocks screen-reader users): 3
Serious (significant barrier): 12
Moderate (noticeable issue): 18
Minor (best-practice): 9

The numbers are illustrative. The shape isn't — most apps have far more moderate and minor issues than critical. The right strategy is usually: gate CI on critical + serious; report (don't fail) on moderate + minor with a tracked backlog.
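To see the distribution for your own app, the violations from a scan can be tallied by impact. This is a minimal sketch; `countByImpact` is a hypothetical helper and the sample input is hand-written.

```typescript
type Impact = "minor" | "moderate" | "serious" | "critical";

// Tally violations per impact level; entries without an impact are skipped.
function countByImpact(violations: { impact?: Impact | null }[]): Record<Impact, number> {
  const counts: Record<Impact, number> = { critical: 0, serious: 0, moderate: 0, minor: 0 };
  for (const v of violations) {
    if (v.impact) {
      counts[v.impact] += 1;
    }
  }
  return counts;
}

// Hand-written sample data, not real scan output.
const scanned: { impact?: Impact | null }[] = [
  { impact: "serious" },
  { impact: "moderate" },
  { impact: "moderate" },
  { impact: null },
];

console.log(countByImpact(scanned));
```

Logging `countByImpact(results.violations)` for the non-blocking levels gives the tracked backlog a number to watch without failing the build.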

A typed spec that scans five key pages, with shared fixture and graduated gates:

import { test, expect } from "../fixtures";
 
test.describe("Accessibility audit", () => {
  const pages = [
    { path: "/", name: "homepage" },
    { path: "/login", name: "login" },
    { path: "/products", name: "product-listing" },
    { path: "/checkout", name: "checkout" },
    { path: "/account/settings", name: "account-settings" }
  ];
 
  for (const { path, name } of pages) {
    test(`${name} has no critical or serious violations`, async ({ page, makeAxeBuilder }) => {
      await page.goto(path);
      const results = await makeAxeBuilder().analyze();
 
      const blocking = results.violations.filter(
        v => v.impact === "critical" || v.impact === "serious"
      );
 
      // Helpful failure message — print the violations if there are any
      if (blocking.length > 0) {
        console.error(
          `${blocking.length} blocking a11y violations on ${name}:\n`,
          blocking.map(v => `  - [${v.impact}] ${v.id}: ${v.description}`).join("\n")
        );
      }
 
      expect(blocking).toEqual([]);
    });
  }
 
  test("logged-in dashboard has no critical violations", async ({ page, makeAxeBuilder }) => {
    // (assumes storage state is configured; chapter 6's auth lesson)
    await page.goto("/dashboard");
    const results = await makeAxeBuilder()
      .include("[data-testid='dashboard-content']")
      .analyze();
 
    const critical = results.violations.filter(v => v.impact === "critical");
    expect(critical).toEqual([]);
  });
});

Five public-page scans + one authenticated-page scan. Each one fails fast with a console-printed list of violations so the developer fixing it knows exactly what's broken without opening the HTML report.

Coming from Cypress?

The mappings:

cypress-axe's cy.injectAxe() + cy.checkA11y() → @axe-core/playwright's new AxeBuilder({ page }).analyze().
cy.checkA11y(null, { includedImpacts: ['serious', 'critical'] }) → filter results.violations by impact after .analyze().

The Playwright API is more verbose for simple cases (you write the filter yourself instead of passing options) but more flexible for real-world cases (you have full programmatic access to the result, can attach it to reports, can compose it with other test logic). The migration is mechanical; the gain is everything you can do with the result after the scan.
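If the team is used to the `includedImpacts` option, a thin helper can reproduce that call shape on the Playwright side. The sketch below is an assumption about how one might write it; `selectViolations` and `CheckOptions` are hypothetical names and are not part of @axe-core/playwright or cypress-axe.

```typescript
type Impact = "minor" | "moderate" | "serious" | "critical";

// Mirrors the option name cypress-axe users already know.
interface CheckOptions {
  includedImpacts?: Impact[];
}

// With no option set, every violation is returned unchanged.
function selectViolations<T extends { impact?: Impact | null }>(
  violations: T[],
  options: CheckOptions = {}
): T[] {
  const wanted = options.includedImpacts;
  if (!wanted) {
    return violations;
  }
  return violations.filter(v => v.impact != null && wanted.indexOf(v.impact) !== -1);
}

// Hand-written sample data, not real scan output.
const migrated = [
  { id: "image-alt", impact: "critical" as Impact },
  { id: "region", impact: "moderate" as Impact },
];

console.log(selectViolations(migrated, { includedImpacts: ["serious", "critical"] }).map(v => v.id));
```

A migrated assertion would then read `expect(selectViolations(results.violations, { includedImpacts: ["serious", "critical"] })).toEqual([])`.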

⚠️ Common mistakes

Asserting expect(violations).toEqual([]) on a brand-new app and being surprised by 30 failures. A11y debt is real. Either scan a smaller scope (include('[data-testid="main"]')) for the first pass, or filter by impact (critical only) until the team catches up. Then ratchet up over time.
Disabling color-contrast because the design system is "almost done." Six months later, the rule is still disabled and the design system shipped without colour-contrast checks. Disable rules only with a tracking ticket and an explicit owner; review the disabled-rules list every quarter.
Running the scan once at the test start and missing dynamically-loaded content. axe scans the DOM at the moment you call .analyze(). If your app lazy-loads the main content after first paint, a scan immediately after goto finds an empty page. Always interact (or wait) until the page reaches the state under test, then scan.

🎯 Practice task

Add a11y testing to the Sauce Demo suite. 25-30 minutes.

Install: npm install --save-dev @axe-core/playwright.

Create tests/a11y.spec.ts:

import { test, expect } from "@playwright/test";
import AxeBuilder from "@axe-core/playwright";
 
test.describe("Accessibility — Sauce Demo", () => {
  test("login page", async ({ page }) => {
    await page.goto("https://www.saucedemo.com");
 
    const results = await new AxeBuilder({ page })
      .withTags(["wcag2a", "wcag2aa", "wcag21a", "wcag21aa"])
      .analyze();
 
    const blocking = results.violations.filter(
      v => v.impact === "critical" || v.impact === "serious"
    );
 
    if (blocking.length > 0) {
      console.error(
        `${blocking.length} blocking violations:`,
        blocking.map(v => `[${v.impact}] ${v.id}`).join(", ")
      );
    }
 
    expect(blocking).toEqual([]);
  });
 
  test("inventory page after login", async ({ page }) => {
    await page.goto("https://www.saucedemo.com");
    await page.getByPlaceholder("Username").fill("standard_user");
    await page.getByPlaceholder("Password").fill("secret_sauce");
    await page.getByRole("button", { name: "Login" }).click();
    await expect(page).toHaveURL(/inventory/);
 
    const results = await new AxeBuilder({ page })
      .withTags(["wcag2a", "wcag2aa"])
      .analyze();
 
    expect(results.violations.filter(v => v.impact === "critical")).toEqual([]);
  });
 
  test("inventory scoped to main content only", async ({ page }) => {
    await page.goto("https://www.saucedemo.com");
    await page.getByPlaceholder("Username").fill("standard_user");
    await page.getByPlaceholder("Password").fill("secret_sauce");
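    await page.getByRole("button", { name: "Login" }).click();
    // Wait for the inventory page before scanning; otherwise the scan can race the navigation
    await expect(page).toHaveURL(/inventory/);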
 
    const results = await new AxeBuilder({ page })
      .include("#inventory_container")
      .analyze();
 
    console.log(`Total violations in main content: ${results.violations.length}`);
    results.violations.forEach(v => {
      console.log(`  [${v.impact}] ${v.id}: ${v.help} (${v.nodes.length} elements)`);
    });
  });
});

Run the spec: npx playwright test a11y.spec.ts. The third test logs all violations to the console; the first two assert only on blocking impacts (critical and serious for the login page, critical for the inventory page).
Inspect a real violation. Sauce Demo has known a11y issues, and the problem_user account specifically renders broken images. Add a fourth test that logs in as problem_user and scans #inventory_container. Compare the violation count to standard_user; problem_user is expected to show more, typically image-alt violations.
Build a reusable fixture. Move the AxeBuilder setup into a makeAxeBuilder fixture (per the lesson). Refactor the existing tests to use it. The boilerplate disappears and the test bodies shrink to a few lines each.
Stretch: add a fifth test that scans the Sauce Demo cart page after adding two items. Filter to serious-impact violations and print each one. Open the axe rules documentation and look up two of the rule IDs you saw: read what the rule checks, why it matters, and how to fix it. This is the habit that makes a11y testing genuinely useful: the test fails, you read the rule, you understand the bug, you fix the page.

You now have automated a11y testing wired into the same test runner as your functional and visual tests. The next lesson takes the raw axe results and shapes them into reports your dev team will actually read.