Snapshot mode is the default in Playwright MCP. Instead of sending Claude pixels, the server sends a structured accessibility tree of the page: a compact, role-and-name view of the elements a user (or a screen reader) can perceive and interact with. This is the mode you'll spend nearly all your time in. This lesson explains what the tree contains, why it's a better fit than screenshots for most tasks, how the [ref=eN] references actually work, and where the mode runs out of road.
The big mental model: the assistant sees the page the way an accessibility engine sees it. If your app is well-built for screen readers, snapshot mode is borderline magical. If it's not, the gaps in the snapshot will tell you exactly what's missing, and those gaps matter for real users too.
What a snapshot looks like
A trimmed example from a typical landing page:
```
- banner:
  - link "Home" [ref=e1]
  - link "Products" [ref=e2]
  - button "Sign in" [ref=e3]
- main:
  - heading "Welcome to MyApp" [level=1]
  - paragraph: "Get started by creating an account."
  - button "Create account" [ref=e4]
- contentinfo:
  - link "Privacy" [ref=e5]
  - link "Terms" [ref=e6]
```

Each entry has a role (link, button, heading, paragraph), an accessible name (usually the user-visible label), and, for interactive elements, a unique ref the assistant uses when calling tools.
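To make the entry structure concrete, here is a minimal TypeScript sketch that parses the example above into role, name, and ref fields. It assumes the simplified one-entry-per-line shape shown in this lesson (two spaces of indentation per level); parseSnapshot is a hypothetical helper, and real Playwright MCP output carries more attributes than it handles.

```typescript
// Hypothetical parser for the simplified snapshot format used in this lesson.
interface SnapshotEntry {
  depth: number;       // nesting level, from leading indentation (2 spaces per level)
  role: string;        // e.g. "button", "link", "heading"
  name: string | null; // quoted text on the line: the accessible name (or paragraph text)
  ref: string | null;  // e.g. "e3"; only interactive elements carry one
}

function parseSnapshot(text: string): SnapshotEntry[] {
  const entries: SnapshotEntry[] = [];
  for (const line of text.split("\n")) {
    // Leading spaces, then "- ", then the role word (stops at ":" or a space).
    const m = /^(\s*)- ([a-z]+)/.exec(line);
    if (!m) continue;
    const nameMatch = /"([^"]*)"/.exec(line);
    const refMatch = /\[ref=(e\d+)\]/.exec(line);
    entries.push({
      depth: m[1].length / 2,
      role: m[2],
      name: nameMatch ? nameMatch[1] : null,
      ref: refMatch ? refMatch[1] : null,
    });
  }
  return entries;
}

// The trimmed landing-page example from above.
const snapshot = [
  "- banner:",
  '  - link "Home" [ref=e1]',
  '  - link "Products" [ref=e2]',
  '  - button "Sign in" [ref=e3]',
  "- main:",
  '  - heading "Welcome to MyApp" [level=1]',
  '  - paragraph: "Get started by creating an account."',
  '  - button "Create account" [ref=e4]',
  "- contentinfo:",
  '  - link "Privacy" [ref=e5]',
  '  - link "Terms" [ref=e6]',
].join("\n");

// List only the entries the assistant can act on.
for (const entry of parseSnapshot(snapshot)) {
  if (entry.ref) console.log(`${entry.ref} → ${entry.role} "${entry.name}"`);
}
```

Landmarks and headings come back with a null ref, which mirrors the point above: only interactive elements get a handle.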
Why this beats screenshots for most flows
- Compact. A snapshot is text — typically a few KB. A full-page screenshot can be hundreds of KB and consumes far more model context (image tokens are expensive).
- Structured. The model parses roles and hierarchies directly. No OCR, no coordinate guessing, no "is that a button or a div?" ambiguity.
- Deterministic. Same DOM produces the same snapshot. Identical runs see identical input — far more reliable than pixel-level reasoning, which is sensitive to anti-aliasing and font rendering.
- Aligned with assistive tech. The tree is generated from the same accessibility APIs screen readers use. If your app supports keyboard and screen-reader users, the snapshot is rich. If it doesn't, you have an accessibility bug and a bad AI experience — the same fix solves both.
How [ref=eN] actually works
The ref is a server-side handle, not a CSS selector. When the snapshot is generated, Playwright MCP assigns each interactive element a short id (e1, e2, …) that stays valid until the next snapshot, and remembers the underlying DOM node. The model passes that id back when it wants to act:
```
Claude calls:     browser_click({ "ref": "e3" })
Server resolves:  e3 → the "Sign in" button → real Playwright click
```

A few properties worth internalising:
- Refs are scoped to the latest snapshot. A new snapshot — after a navigation, a click that opens a modal, an async re-render — invalidates the old refs. The assistant's job is to re-snapshot before targeting elements; the server's job is to refuse a stale ref.
- The ref doesn't leak DOM details. The model doesn't see classes, ids, or framework-generated attributes. It only sees role and name. That's a feature: it forces reasoning at the level of what the user sees, not what the markup happens to be today.
- Refs survive intra-page actions. Hovering, focusing, scrolling — none of these invalidate refs. Only state changes that re-issue a snapshot do.
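The scoping rules above can be modelled with a small registry. This is a hypothetical sketch, not Playwright MCP's implementation: RefRegistry, takeSnapshot, and resolve are illustrative names, and the choice to keep numbering upward across snapshots (so a stale e2 can never silently alias a new element) is an assumption made for clarity.

```typescript
// Hypothetical model of snapshot-scoped refs (not Playwright MCP's real code).
class RefRegistry<T> {
  private nextId = 1;
  private current = new Map<string, T>();

  // A new snapshot drops every earlier ref and registers fresh ids.
  // Ids keep counting up, so an old "e2" cannot point at a different element.
  takeSnapshot(nodes: T[]): string[] {
    this.current = new Map<string, T>();
    return nodes.map((node) => {
      const ref = `e${this.nextId++}`;
      this.current.set(ref, node);
      return ref;
    });
  }

  // Resolves a ref from the latest snapshot, or refuses it as stale.
  // Hover, focus, or scroll would not call takeSnapshot, so refs survive them.
  resolve(ref: string): T {
    const node = this.current.get(ref);
    if (node === undefined) {
      throw new Error(`Ref ${ref} is not in the current snapshot; re-snapshot first`);
    }
    return node;
  }
}

const registry = new RefRegistry<string>();
const first = registry.takeSnapshot(['link "Home"', 'button "Sign in"']); // ["e1", "e2"]
registry.resolve("e2"); // 'button "Sign in"'

// A click opens a modal, so the assistant re-snapshots.
const second = registry.takeSnapshot(['button "Close"']); // ["e3"]

try {
  registry.resolve("e2"); // stale: taken before the modal opened
} catch (err) {
  console.log((err as Error).message);
}
```

Refusing loudly is the useful design choice here: a stale ref that quietly resolved to whatever now sits at the same position would click the wrong thing.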
Because the model reasons in roles and names, Playwright code generated from MCP sessions tends to use getByRole / getByLabel / getByText: those locators are the closest deterministic equivalent to how the model already sees the page.
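As a rough illustration of that correspondence, the hypothetical helper below turns a snapshot entry's role and name into the text of a Playwright locator. page.getByRole(role, { name }) is real Playwright API; the mapping function is an assumption for illustration, not how MCP actually emits code.

```typescript
// Hypothetical: render a (role, name) pair as Playwright locator source text.
interface Target {
  role: string;
  name: string;
}

function toLocatorCode(target: Target): string {
  // JSON.stringify yields a correctly quoted and escaped string literal.
  return `page.getByRole(${JSON.stringify(target.role)}, { name: ${JSON.stringify(target.name)} })`;
}

console.log(toLocatorCode({ role: "button", name: "Sign in" }));
// page.getByRole("button", { name: "Sign in" })
```

By default Playwright matches name as a case-insensitive substring; adding exact: true to the options makes a generated test stricter.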
What the tree includes and excludes
Includes:
- Interactive controls — links, buttons, inputs, selects, checkboxes, tabs, dialogs.
- Headings and landmarks: banner, main, navigation, contentinfo. Useful for the assistant to orient itself.
- Form fields with their associated labels (when the markup wires them up correctly).
- Live regions, so dynamic updates (toasts, validation messages) surface naturally.
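The "when the markup wires them up correctly" caveat comes down to how a field's accessible name is computed. The sketch below is a deliberately simplified version of the W3C accessible-name rules, covering only the first three sources in precedence order; the FieldMarkup shape and accessibleName function are hypothetical, and the real algorithm has further fallbacks such as the title and placeholder attributes.

```typescript
// Hypothetical, simplified model of a form field's naming sources.
interface FieldMarkup {
  ariaLabelledbyText?: string; // text of the element(s) referenced by aria-labelledby
  ariaLabel?: string;          // the aria-label attribute
  labelText?: string;          // text of a <label for="..."> or wrapping <label>
}

function accessibleName(field: FieldMarkup): string {
  // Precedence: aria-labelledby, then aria-label, then the associated <label>.
  // Empty or whitespace-only values are skipped.
  for (const candidate of [field.ariaLabelledbyText, field.ariaLabel, field.labelText]) {
    if (candidate !== undefined && candidate.trim() !== "") return candidate.trim();
  }
  return ""; // no name: the field reaches the snapshot unlabeled
}

console.log(accessibleName({ labelText: "Email" }));                  // "Email"
console.log(accessibleName({ ariaLabel: "Search", labelText: "Q" })); // "Search"
console.log(accessibleName({}));                                      // ""
```

A field that falls through to the empty string is exactly the case where neither the model nor a screen-reader user can tell what the input is for.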
Excludes:
- Visual styling — colours, sizes, spacing. "Is the CTA above the fold?" is unanswerable from a snapshot.
- Image content beyond alt text. A page full of undescribed illustrations is largely invisible.
- Hidden elements: display: none, visibility: hidden, or aria-hidden="true". That's usually the right call, but watch for elements that are only temporarily hidden while an animation runs.
- Custom canvas / WebGL content. Charts, design tools, maps: these need vision mode (next lesson).
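To show how hidden content drops out, here is a toy tree walk over a hypothetical element model (the El type and its display, visibility, and ariaHidden fields are assumptions standing in for computed styles and attributes). It prunes any node that is display: none, visibility: hidden, or aria-hidden="true", along with its descendants.

```typescript
// Hypothetical element model; real engines read computed styles from the DOM.
interface El {
  role: string;
  name?: string;
  display?: string;     // computed CSS display
  visibility?: string;  // computed CSS visibility
  ariaHidden?: boolean; // aria-hidden="true"
  children?: El[];
}

// Returns the visible subtree as indented, snapshot-style lines.
function visibleTree(el: El, depth = 0): string[] {
  // display: none and aria-hidden="true" remove the node and all descendants.
  // Simplification: visibility: hidden is pruned the same way here, although in a
  // real browser a descendant can opt back in with visibility: visible.
  if (el.display === "none" || el.visibility === "hidden" || el.ariaHidden === true) {
    return [];
  }
  const label = el.name ? ` "${el.name}"` : "";
  const lines = [`${"  ".repeat(depth)}- ${el.role}${label}`];
  for (const child of el.children ?? []) {
    lines.push(...visibleTree(child, depth + 1));
  }
  return lines;
}

const page: El = {
  role: "main",
  children: [
    { role: "button", name: "Save" },
    { role: "dialog", name: "Cookie banner", display: "none", children: [{ role: "button", name: "Accept" }] },
    { role: "img", name: "decorative swirl", ariaHidden: true },
  ],
};

console.log(visibleTree(page).join("\n"));
// - main
//   - button "Save"
```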
Snapshot mode at a glance
Sees:
- Roles: link, button, heading, etc.
- Accessible names (visible labels)
- Hierarchy and landmarks
- Live region updates
Misses:
- Visual styling and layout
- Image pixels (only alt text)
- Hidden / aria-hidden elements
- Canvas and WebGL content
How the assistant acts:
- Reads role + name to find targets
- Calls tools with [ref=eN]
- Re-snapshots after every state change
Best for:
- Form filling and navigation
- Reading text content
- Accessibility audits
Snapshot mode as a free accessibility audit
A side benefit worth flagging: a snapshot is essentially what an assistive-tech user sees. If the assistant struggles to identify a button — "I see an unnamed button at ref=e7" — the same button is unidentifiable to a screen reader. If a form input has no associated label, neither the model nor a real user can confidently tell what it's for. Running a Playwright MCP session is, incidentally, one of the cheapest accessibility smoke tests on the market.
This is one of the genuinely new capabilities AI-augmented testing brings: not because the AI is doing accessibility analysis on purpose, but because its mode of perception forces accessibility issues to the surface.
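A first-pass version of that audit can even be scripted over saved snapshot text. The sketch below flags interactive roles that have no quoted accessible name; the role list, the line format, and findUnnamedControls are assumptions based on the simplified snapshot shape used in this lesson, so treat its output as leads to check, not verdicts.

```typescript
// Hypothetical audit helper over the simplified snapshot format.
const INTERACTIVE_ROLES = new Set([
  "button", "link", "textbox", "checkbox", "combobox", "radio", "tab", "menuitem",
]);

interface Finding {
  role: string;
  ref: string | null;
  line: number; // 1-based line in the snapshot text
}

function findUnnamedControls(snapshot: string): Finding[] {
  const findings: Finding[] = [];
  snapshot.split("\n").forEach((text, index) => {
    const m = /^\s*- ([a-z]+)(.*)$/.exec(text);
    if (!m || !INTERACTIVE_ROLES.has(m[1])) return;
    // A name counts only if the quotes contain at least one non-space character.
    const hasName = /"[^"]*[^"\s][^"]*"/.test(m[2]);
    if (hasName) return;
    const ref = /\[ref=(e\d+)\]/.exec(m[2]);
    findings.push({ role: m[1], ref: ref ? ref[1] : null, line: index + 1 });
  });
  return findings;
}

const sample = [
  "- main:",
  '  - button "Create account" [ref=e4]',
  "  - button [ref=e7]",
  '  - textbox "" [ref=e8]',
].join("\n");

for (const f of findUnnamedControls(sample)) {
  console.log(`line ${f.line}: ${f.role} ${f.ref ?? "(no ref)"} has no accessible name`);
}
```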
When snapshot mode struggles
- Visual layout questions. "Is the CTA centred?" / "Does the hero image fit above the fold?" — neither is answerable from a tree. Switch to vision mode (next lesson).
- Image-driven content. A product gallery without alt text is an opaque blob. Either improve the alt text (the right fix) or use vision mode for this specific session.
- Custom canvases. Anything rendered into <canvas> (charts, maps, design surfaces) is invisible. Vision mode is mandatory.
- Heavily animated UIs. Elements appearing mid-animation can be missing from one snapshot and present in the next. Add an explicit wait before the assistant re-snapshots.
Apart from the animation case, which needs a wait rather than pixels, the workaround is the same: take a screenshot for that specific step, then return to snapshot mode for the rest of the flow. Hybrid usage is normal and expected.
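For the animation case, one way to implement an explicit wait in a scripted harness is to re-read the snapshot until two consecutive reads match. The helper below is hypothetical (it is not a Playwright MCP tool) and is kept synchronous, with an injected pause callback, so the sketch runs anywhere; a real harness would await a timer there instead.

```typescript
// Hypothetical: poll a snapshot source until two consecutive reads are identical.
function waitForStableSnapshot(
  takeSnapshot: () => string,
  pause: () => void,
  maxReads = 10,
): string {
  let previous = takeSnapshot();
  for (let read = 2; read <= maxReads; read++) {
    pause(); // in a real harness: wait briefly for the animation to advance
    const current = takeSnapshot();
    if (current === previous) return current; // two identical reads in a row: settled
    previous = current;
  }
  throw new Error(`Snapshot still changing after ${maxReads} reads`);
}

// Simulated animation: the button appears in the second frame,
// and the third frame confirms nothing else changed.
const frames = ["- dialog", '- dialog\n  - button "OK"', '- dialog\n  - button "OK"'];
let frame = 0;
const settled = waitForStableSnapshot(
  () => frames[Math.min(frame++, frames.length - 1)],
  () => {},
);
console.log(settled);
```

Comparing consecutive reads, rather than waiting a fixed time, avoids guessing how long a particular animation runs; the maxReads cap keeps a UI that never settles from hanging the flow.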
⚠️ Common mistakes
- Reusing a stale ref after a state change. The model occasionally tries to click a ref from a snapshot taken before a modal opened. The server will refuse, the assistant will re-snapshot and recover, but you can save the round trip by prompting "after any navigation or state change, re-snapshot before clicking" in long flows. The same hygiene applies to generated test code: don't hold on to an ElementHandle across re-renders; use Locators, which Playwright re-resolves every time they are used.
- Treating a missing element as an AI failure. "It can't find the button" is sometimes the model; far more often it's a real accessibility gap. Expand the raw browser_snapshot output and check whether the element appears at all. If it doesn't, the bug is in the page, not the prompt.
- Forcing snapshot mode where vision is needed. Asking "is the hero image visually correct?" in pure snapshot mode produces confident-sounding but unsupported answers. The model can only assert what's in the tree. Switch modes deliberately rather than letting it improvise.
🎯 Practice task
Inspect a real snapshot end to end. 20 minutes.
- In a fresh Claude Desktop session, prompt: "Navigate to https://demo.playwright.dev/todomvc and show me the page's accessibility snapshot." The assistant calls browser_snapshot and pastes the result.
- Read the snapshot carefully. Identify the input field, the toggle-all checkbox, and the footer filters by their roles and names, and note their ref ids. TodoMVC only renders the toggle-all checkbox and the filters once the list has at least one item, so if they're missing, ask the assistant to add a todo and re-snapshot.
- Compare the snapshot against your own browser's DevTools: open the Accessibility pane in Chrome DevTools on the same page and walk the tree. Notice that what Playwright MCP produces is essentially the same tree, condensed.
- Pick one of your team's real apps and run "Take an accessibility snapshot of [URL] and list any interactive elements that are missing a clear accessible name." The output is a free first-pass accessibility audit. File any genuine findings as bugs.
- Stretch: prompt "Compare the snapshots before and after I click 'Add'." Watch the model re-snapshot and diff. This is the workflow you'll lean on for state-change verification across the rest of the course.
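The before/after comparison can be approximated with a line-level diff. In this hypothetical sketch, refs are stripped before comparing, since they are scoped to a single snapshot and may differ even for elements that didn't change; diffSnapshots is an illustrative name and the sample snapshots are made up, not real TodoMVC output.

```typescript
// Hypothetical helpers for an order-insensitive, line-level snapshot diff.
function normalize(snapshot: string): string[] {
  return snapshot
    .split("\n")
    .map((line) => line.replace(/\s*\[ref=e\d+\]/, "").replace(/\s+$/, "")) // drop refs, trailing space
    .filter((line) => line.trim() !== "");
}

function countLines(lines: string[]): Map<string, number> {
  const counts = new Map<string, number>();
  for (const line of lines) counts.set(line, (counts.get(line) ?? 0) + 1);
  return counts;
}

// Lines in `after` but not `before` are "added"; leftovers from `before` are "removed".
// Pure reordering at the same indentation is not reported.
function diffSnapshots(before: string, after: string): { added: string[]; removed: string[] } {
  const remaining = countLines(normalize(before));
  const added: string[] = [];
  for (const line of normalize(after)) {
    const n = remaining.get(line) ?? 0;
    if (n > 0) remaining.set(line, n - 1);
    else added.push(line);
  }
  const removed: string[] = [];
  remaining.forEach((n, line) => {
    for (let i = 0; i < n; i++) removed.push(line);
  });
  return { added, removed };
}

// Illustrative snapshots: the textbox's ref changed, but the element did not.
const before = '- textbox "New item" [ref=e1]';
const after = '- textbox "New item" [ref=e7]\n- listitem "Buy milk"';
console.log(diffSnapshots(before, after));
```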
Snapshot mode is the workhorse — every later lesson assumes you understand it. The next lesson covers vision mode, which is the right tool for the small set of cases where the tree falls short.