The Snapshot Mode — Accessibility Tree Navigation

Snapshot mode is the default in Playwright MCP. Instead of sending Claude pixels, the server sends a structured accessibility tree of the page — a compact, role-and-name view of every element the user (or a screen reader) could interact with. This is the mode you'll spend nearly all your time in. This lesson explains what the tree contains, why it's a better fit than screenshots for most tasks, how the [ref=eN] references actually work, and where the mode runs out of road.

The big mental model: the assistant sees the page the way an accessibility engine sees it. If your app is well built for screen readers, snapshot mode is borderline magical. If it's not, the gaps in the snapshot will tell you exactly what's missing, and those gaps matter for real users too.

What a snapshot looks like

A trimmed example from a typical landing page:

- banner:
    - link "Home" [ref=e1]
    - link "Products" [ref=e2]
    - button "Sign in" [ref=e3]
- main:
    - heading "Welcome to MyApp" [level=1]
    - paragraph: "Get started by creating an account."
    - button "Create account" [ref=e4]
- contentinfo:
    - link "Privacy" [ref=e5]
    - link "Terms" [ref=e6]

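Each entry has a role (link, button, heading, paragraph), an accessible name (usually the visible label, otherwise whatever the markup supplies, such as an aria-label), optional attributes such as [level=1], and, for interactive elements, a unique ref the assistant uses when calling tools.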
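
To make the role / name / ref structure concrete, here is a minimal Python sketch that parses entries shaped like the example above. The line format is inferred only from that trimmed example, and `parse_snapshot`, `Node`, and `SAMPLE` are hypothetical names for illustration; real snapshots may carry more attributes and a different indent width.

```python
import re
from typing import NamedTuple, Optional

# Assumed entry shape, inferred only from the example above:
#   - role "accessible name" [attr=value] [ref=eN]
ENTRY = re.compile(r'^(?P<indent>\s*)-\s+(?P<role>[\w-]+)(?:\s+"(?P<name>[^"]*)")?(?P<rest>.*)$')
ATTR = re.compile(r'\[(\w+)=([^\]]+)\]')


class Node(NamedTuple):
    indent: int            # leading whitespace, a proxy for nesting depth
    role: str
    name: Optional[str]    # None when the entry has no quoted name after the role
    ref: Optional[str]     # None for entries without a [ref=...] marker


def parse_snapshot(text: str) -> list[Node]:
    nodes = []
    for line in text.splitlines():
        m = ENTRY.match(line)
        if m is None:
            continue  # skip blank or unrecognised lines
        # Text after a colon (as in the paragraph entry) stays in `rest` and is ignored here.
        attrs = dict(ATTR.findall(m.group("rest")))
        nodes.append(Node(len(m.group("indent")), m.group("role"),
                          m.group("name"), attrs.get("ref")))
    return nodes


SAMPLE = """\
- banner:
    - link "Home" [ref=e1]
    - button "Sign in" [ref=e3]
- main:
    - heading "Welcome to MyApp" [level=1]
"""

# List only the entries the assistant could target with a tool call.
for node in parse_snapshot(SAMPLE):
    if node.ref:
        print(node.ref, node.role, node.name)
```

Only entries that carry a ref are printed, which mirrors the point above: the ref is what makes an element addressable in a tool call.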

Why this beats screenshots for most flows

Compact. A snapshot is text — typically a few KB. A full-page screenshot can be hundreds of KB and consumes far more model context (image tokens are expensive).
Structured. The model parses roles and hierarchies directly. No OCR, no coordinate guessing, no "is that a button or a div?" ambiguity.
Deterministic. Same DOM produces the same snapshot. Identical runs see identical input — far more reliable than pixel-level reasoning, which is sensitive to anti-aliasing and font rendering.
Aligned with assistive tech. The tree is built from the same roles and accessible names that screen readers rely on. If your app supports keyboard and screen-reader users, the snapshot is rich. If it doesn't, you have an accessibility bug and a bad AI experience, and the same fix solves both.

How `[ref=eN]` actually works

The ref is a server-side handle, not a CSS selector. When the snapshot is generated, Playwright MCP assigns each interactive element a short id (e1, e2, …) that is valid for that snapshot, and remembers the underlying DOM node. The model passes that id back when it wants to act:

Claude calls: browser_click({ "ref": "e3" })
Server resolves: e3 → the "Sign in" button → real Playwright click
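
The lookup can be pictured with a toy model. This is a sketch for intuition only, not Playwright MCP's implementation: `RefRegistry` and `StaleRefError` are hypothetical names, elements are plain strings instead of DOM nodes, and the choice to keep counting ids across snapshots (so an old id can never point at a new element) is an assumption of the sketch.

```python
class StaleRefError(Exception):
    """Raised when a ref does not belong to the latest snapshot."""


class RefRegistry:
    """Toy stand-in for a server-side ref table (hypothetical)."""

    def __init__(self):
        self._next_id = 1
        self._table = {}

    def snapshot(self, elements):
        # A new snapshot replaces the whole table, so earlier refs stop resolving.
        self._table = {}
        for element in elements:
            self._table[f"e{self._next_id}"] = element
            self._next_id += 1
        return dict(self._table)

    def resolve(self, ref):
        # Acting on a ref means looking it up in the latest snapshot only.
        if ref not in self._table:
            raise StaleRefError(f"{ref} is not part of the latest snapshot")
        return self._table[ref]


registry = RefRegistry()
registry.snapshot(['link "Home"', 'link "Products"', 'button "Sign in"'])
print(registry.resolve("e3"))  # the "Sign in" button

# Clicking opens a dialog; the page is snapshotted again and e3 no longer resolves.
registry.snapshot(['dialog "Sign in"', 'textbox "Email"'])
try:
    registry.resolve("e3")
except StaleRefError as err:
    print("refused:", err)
```

The second `resolve` call is refused rather than silently clicking something else, which is the behaviour the properties that follow rely on.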

A few properties worth internalising:

Refs are scoped to the latest snapshot. A new snapshot — after a navigation, a click that opens a modal, an async re-render — invalidates the old refs. The assistant's job is to re-snapshot before targeting elements; the server's job is to refuse a stale ref.
The ref doesn't leak DOM details. The model doesn't see classes, ids, or framework-generated attributes. It sees role, name, and a few semantic attributes such as heading level. That's a feature: it forces reasoning at the level of what the user sees, not what the markup happens to be today.
Refs survive intra-page actions. Hovering, focusing, scrolling — none of these invalidate refs. Only state changes that re-issue a snapshot do.

This is why generated Playwright code from MCP sessions tends to use getByRole / getByLabel / getByText — those locators are the closest deterministic equivalent to how the model already sees the page.
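
As a rough illustration of that correspondence, the sketch below turns a snapshot entry's role and name into the source text of a role-based locator. It emits the Python binding's spelling, `get_by_role`, which corresponds to `getByRole`; `suggest_locator` is a hypothetical helper and is not how Playwright MCP generates code.

```python
from typing import Optional


def suggest_locator(role: str, name: Optional[str] = None) -> str:
    """Return source text for a role-based Playwright locator (Python binding)."""
    if name is None:
        # No accessible name in the snapshot: fall back to the role alone,
        # which may match several elements on a real page.
        return f"page.get_by_role({role!r})"
    return f"page.get_by_role({role!r}, name={name!r})"


# Entries taken from the landing-page example earlier in this lesson.
print(suggest_locator("button", "Sign in"))
print(suggest_locator("link", "Privacy"))
print(suggest_locator("main"))
```

Nothing in the generated text depends on classes or ids, so the locator keeps working when the markup changes but the role and label do not.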

What the tree includes and excludes

Includes:

Interactive controls — links, buttons, inputs, selects, checkboxes, tabs, dialogs.
Headings and landmarks — banner, main, navigation, contentinfo. Useful for the assistant to orient itself.
Form fields with their associated labels (when the markup wires them up correctly).
Live regions, so dynamic updates (toasts, validation messages) surface naturally.

Excludes:

Visual styling — colours, sizes, spacing. "Is the CTA above the fold?" is unanswerable from a snapshot.
Image content beyond alt text. A page full of un-described illustrations is largely invisible.
Hidden elements — display: none, visibility: hidden, or aria-hidden="true". That's usually the right call, but watch for things hidden behind animations.
Custom canvas / WebGL content. Charts, design tools, maps — these need vision mode (next lesson).

Snapshot mode at a glance

Includes:

– Roles: link, button, heading, etc.
– Accessible names (visible labels)
– Hierarchy and landmarks
– Live region updates

Excludes:

– Visual styling and layout
– Image pixels (only alt text)
– Hidden / aria-hidden elements
– Canvas and WebGL content

How the assistant uses it:

– Reads role + name to find targets
– Calls tools with [ref=eN]
– Re-snapshots after every state change

Good fit for:

– Form filling and navigation
– Reading text content
– Accessibility audits

Snapshot mode as a free accessibility audit

A side benefit worth flagging: a snapshot is close to what an assistive-tech user gets. If the assistant struggles to identify a button ("I see an unnamed button at ref=e7"), the same button is unidentifiable to a screen reader. If a form input has no associated label, neither the model nor a screen-reader user can confidently tell what it's for. Running a Playwright MCP session therefore doubles as one of the cheapest accessibility smoke tests you can run.

This is one of the genuinely new capabilities AI-augmented testing brings: not because the AI is doing accessibility analysis on purpose, but because its mode of perception forces accessibility issues to the surface.
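
The "unnamed button" check can also be run mechanically on a saved snapshot. The sketch below is a first-pass filter under stated assumptions: the entry format is the one shown earlier in this lesson, the `INTERACTIVE` role list is a hand-picked subset, and the sample snapshot is invented for illustration.

```python
import re

# Hand-picked subset of interactive roles; extend it for your own app.
INTERACTIVE = {"link", "button", "textbox", "checkbox", "combobox", "radio", "tab"}

ENTRY = re.compile(r'^\s*-\s+(?P<role>[\w-]+)(?:\s+"(?P<name>[^"]*)")?')
REF = re.compile(r'\[ref=(\w+)\]')


def unnamed_interactive(snapshot: str) -> list[str]:
    """List interactive entries whose accessible name is missing or blank."""
    findings = []
    for line in snapshot.splitlines():
        m = ENTRY.match(line)
        if m is None or m.group("role") not in INTERACTIVE:
            continue
        name = m.group("name")
        if name is None or not name.strip():
            ref = REF.search(line)
            findings.append(f'{m.group("role")} at ref={ref.group(1) if ref else "?"}')
    return findings


# Invented snapshot: two controls have no name and one has an empty name.
SNAPSHOT = """\
- button "Create account" [ref=e4]
- button [ref=e7]
- textbox [ref=e8]
- link "" [ref=e9]
- heading "Pricing" [level=2]
"""

for finding in unnamed_interactive(SNAPSHOT):
    print(finding)
```

Each finding is a candidate bug to confirm by hand; a missing name in the snapshot usually means a missing label or aria-label in the markup.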

When snapshot mode struggles

Visual layout questions. "Is the CTA centred?" / "Does the hero image fit above the fold?" — neither is answerable from a tree. Switch to vision mode (next lesson).
Image-driven content. A product gallery without alt text is an opaque blob. Either improve the alt text (the right fix) or use vision mode for this specific session.
Custom canvases. Anything rendered into <canvas> — charts, maps, design surfaces — is invisible. Vision mode is mandatory.
Heavily animated UIs. Elements appearing mid-animation can be missing from one snapshot and present in the next. Add an explicit wait before the assistant re-snapshots.

For the first three cases the workaround is the same: take a screenshot for that specific step, then return to snapshot mode for the rest of the flow. Animated UIs usually only need the explicit wait. Hybrid usage is normal and expected.

⚠️ Common mistakes

Reusing a stale ref after a state change. The model occasionally tries to click a ref from a snapshot taken before a modal opened. The server will refuse and the assistant will re-snapshot and recover, but you can save the round trip by prompting "after each navigation, re-snapshot before clicking" in long flows. The same hygiene applies to generated test code: never reuse an element handle across re-renders. Playwright locators avoid the problem because they re-resolve the element on every action.
Treating a missing element as an AI failure. "It can't find the button" is sometimes the model; far more often it's a real accessibility gap. Open the snapshot panel and check whether the element appears at all. If it doesn't, the bug is in the page, not the prompt.
Forcing snapshot mode where vision is needed. Asking "is the hero image visually correct?" in pure snapshot mode produces confident-sounding but unsupported answers. The model can only assert what's in the tree. Switch modes deliberately rather than letting it improvise.

🎯 Practice task

Inspect a real snapshot end to end. 20 minutes.

In a fresh Claude Desktop session, prompt: "Navigate to https://demo.playwright.dev/todomvc and show me the page's accessibility snapshot." The assistant calls browser_snapshot and pastes the result.
Read the snapshot carefully. Identify the input field, the toggle-all checkbox, and the footer filters by their roles and names. Note the ref ids.
Compare the snapshot against your own browser's DevTools — open Chrome's Accessibility panel on the same page and walk the tree. Notice that what Playwright MCP produces is essentially the same tree, condensed.
Pick one of your team's real apps and run "Take an accessibility snapshot of [URL] and list any interactive elements that are missing a clear accessible name." The output is a free first-pass accessibility audit. File any genuine findings as bugs.
Stretch: prompt "Compare the snapshots before and after I click 'Add'." Watch the model re-snapshot and diff. This is the workflow you'll lean on for state-change verification across the rest of the course.
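
If you save the two snapshots from the stretch task as text, you can reproduce the diff yourself. The sketch below uses Python's standard `difflib` and strips `[ref=...]` markers first, since refs change between snapshots and would otherwise show up as noise. The `BEFORE` and `AFTER` snapshots are invented for illustration and are not real output from the demo app.

```python
import difflib
import re

REF = re.compile(r'\s*\[ref=\w+\]')


def diff_snapshots(before: str, after: str) -> list[str]:
    """Return added (+) and removed (-) snapshot lines, ignoring ref ids."""
    def normalise(text: str) -> list[str]:
        return [REF.sub("", line) for line in text.splitlines()]

    lines = list(difflib.unified_diff(normalise(before), normalise(after),
                                      lineterm="", n=0))
    # The first two lines are the '---' / '+++' file headers; '@@' lines are hunk markers.
    return [line for line in lines[2:] if line.startswith(("+", "-"))]


# Invented snapshots of a page before and after a sign-in dialog opens.
BEFORE = """\
- button "Sign in" [ref=e3]
- button "Create account" [ref=e4]
"""

AFTER = """\
- button "Sign in" [ref=e7]
- button "Create account" [ref=e8]
- dialog "Sign in" [ref=e9]
"""

for change in diff_snapshots(BEFORE, AFTER):
    print(change)
```

Because the refs are stripped, the two unchanged buttons drop out and only the new dialog entry is reported.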

Snapshot mode is the workhorse — every later lesson assumes you understand it. The next lesson covers vision mode, which is the right tool for the small set of cases where the tree falls short.