You have Playwright MCP installed and the host can list its tools. This lesson is about driving your first real session and understanding what happens behind every prompt. By the end you'll have run two non-trivial flows, watched the AI orchestrate Playwright tools step by step, and built an honest mental model of where this tool fits in your QA day. The point isn't to be impressed; it's to see exactly what is happening so you can debug it later when something goes wrong.
Keep the browser visible for these first sessions (skip --headless). Watching the assistant click, type, and re-snapshot in real time is the fastest way to internalise the loop.
A first prompt — and what it really does
Open a fresh chat in Claude Desktop (or a Claude Code session) and paste:
Navigate to https://playwright.dev/, then click "Get started" and tell me what page you land on.
A real browser window opens, the homepage loads, the click happens, and the assistant replies with the destination page name. Trivial output, but the round trip behind it is the whole protocol in miniature:
- Claude reads your message and notices it needs browser tools.
- Claude calls browser_navigate with { "url": "https://playwright.dev/" }. The Playwright MCP server launches Chromium and navigates.
- Claude calls browser_snapshot. The server returns the page's accessibility tree: a structured list of headings, links, and buttons, each with a unique ref.
- Claude scans the tree and finds an element with role link and accessible name Get started. It calls browser_click with that element's ref.
- The link follows. Claude calls browser_snapshot again, reads the new page's title and primary heading, and writes a sentence summarising it.
In Claude Desktop you can expand each tool call inline to see exactly what was sent and what came back. Get into the habit of doing this — it is the single best debugging surface you have. When a session does something unexpected, the answer is almost always in the tool-call payloads, not in the chat narrative.
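To give a rough idea of what those payloads contain, here is a minimal Python sketch of the four calls above wrapped as MCP tools/call requests, plus a helper that looks up a ref by role and accessible name. The snapshot excerpt, the ref values (e7, e15), the element description argument, and the helper names find_ref and tool_call are all assumptions made for illustration. Real snapshots are longer and their layout depends on the server version, so treat the tool-call panel as the source of truth.

```python
import json
import re
from typing import Optional

# Hypothetical, heavily simplified snapshot excerpt (assumption: lines look
# roughly like `- role "accessible name" [ref=...]`).
SNAPSHOT = """\
- banner:
  - link "Docs" [ref=e7]
- heading "Example heading" [level=1] [ref=e12]
- link "Get started" [ref=e15]
"""

# Captures role, accessible name, and ref from one snapshot line.
LINE = re.compile(r'-\s+(?P<role>\w+)\s+"(?P<name>[^"]*)".*?\[ref=(?P<ref>\w+)\]')


def find_ref(snapshot: str, role: str, name: str) -> Optional[str]:
    """Return the ref of the first element with this role and accessible name."""
    for line in snapshot.splitlines():
        m = LINE.search(line)
        if m and m["role"] == role and m["name"] == name:
            return m["ref"]
    return None


def tool_call(call_id: int, name: str, arguments: dict) -> dict:
    """Wrap a tool invocation in a JSON-RPC 'tools/call' request."""
    return {
        "jsonrpc": "2.0",
        "id": call_id,
        "method": "tools/call",
        "params": {"name": name, "arguments": arguments},
    }


ref = find_ref(SNAPSHOT, "link", "Get started")
calls = [
    tool_call(1, "browser_navigate", {"url": "https://playwright.dev/"}),
    tool_call(2, "browser_snapshot", {}),
    # The ref comes from the snapshot; "element" is a human-readable label (assumed).
    tool_call(3, "browser_click", {"element": "Get started link", "ref": ref}),
    tool_call(4, "browser_snapshot", {}),
]
print(json.dumps(calls[2], indent=2))
```

The part worth remembering is the dependency: the click can only be built after a snapshot has supplied the ref, which is why every action in the panel is bracketed by snapshots.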
A more interesting prompt
The single-click flow is fine for a smoke test but doesn't show the model thinking. Try:
Go to https://demo.playwright.dev/todomvc, add three todos: "Buy groceries", "Pay rent", "Call mom".
Mark "Pay rent" as complete. Then tell me how many active todos remain.
What you'll see in the tool-call panel:
- browser_navigate to the TodoMVC URL.
- browser_snapshot: the assistant locates the input field by role.
- Three rounds of browser_type (each followed by a browser_press_key for Enter to commit the todo) and a browser_snapshot between rounds so the assistant sees the new list state.
- browser_click on the checkbox next to Pay rent, identified by its accessible name in the latest snapshot.
- Final browser_snapshot to read the "2 items left" footer.
- A natural-language reply: "Two active todos remain: Buy groceries and Call mom."
No code was written. No selectors were authored. The accessibility tree carried enough structure for the assistant to reason its way through.
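The sequence the assistant worked out can be written down as plain data, which is a handy reference when comparing it against the tool-call panel. The Python sketch below assumes one snapshot after each typed todo. The function name plan_todo_flow and the element descriptions are hypothetical, and refs are left out because they only exist once a snapshot has been taken. An actual run may order or batch calls differently.

```python
from collections import Counter


def plan_todo_flow(url, todos, complete):
    """List the tool calls for: add each todo, mark one complete, read the footer."""
    calls = [
        ("browser_navigate", {"url": url}),
        ("browser_snapshot", {}),  # locate the input field by role
    ]
    for text in todos:
        calls.append(("browser_type", {"element": "new todo input", "text": text}))
        calls.append(("browser_press_key", {"key": "Enter"}))  # commit the todo
        calls.append(("browser_snapshot", {}))  # see the updated list
    calls.append(("browser_click", {"element": f"checkbox for {complete}"}))
    calls.append(("browser_snapshot", {}))  # read the items-left footer
    return calls


flow = plan_todo_flow(
    "https://demo.playwright.dev/todomvc",
    ["Buy groceries", "Pay rent", "Call mom"],
    "Pay rent",
)
counts = Counter(name for name, _ in flow)
print(len(flow), dict(counts))
```

Under these assumptions the flow comes to 13 tool calls, 5 of them snapshots. If the panel shows noticeably more, the assistant probably retried a step or re-snapshotted after a failed action.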
What the loop actually looks like
Step 1 of 6: You write a prompt
Plain English describing the goal (start URL, what to do, what to report). The more precise the goal and the success criteria, the better the run.
A reality check before you fall in love
The first session feels magical. Three guardrails to anchor expectations before you start designing workflows around it:
- It is slow. Each tool call adds roughly one to two seconds. A 12-step flow takes 20–30 seconds at minimum, before any model thinking time. Fine for one-off exploration; not fine for a 1,200-test CI suite.
- It costs tokens. Snapshots, screenshots, and tool results all flow back into the model's context. A long session can run several thousand tokens per turn. Budget accordingly, especially with vision mode (chapter 2).
- It is non-deterministic. Same prompt, same app, slightly different paths. Usually the answer is identical; occasionally the model takes an alternate route. Acceptable for exploration, not acceptable for regression tests. The fix is to convert successful sessions into deterministic Playwright code (chapter 3) — which is exactly what this tool is designed to feed into.
Internalise these three constraints and the "where does it belong?" question answers itself. Use it for the work where flexibility outweighs determinism, and use generated code for everything else.
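The latency constraint is plain arithmetic, and a few lines of Python make the scale obvious. This sketch takes the rough figures from the list above (one to two seconds per tool call, 20 to 30 seconds per flow, a 1,200-test suite) and assumes the tests run serially with no model thinking time. The function names are made up for illustration, and the 13-call input is the TodoMVC flow counted with one snapshot after each todo.

```python
def tool_latency_s(n_calls, per_call_s=(1.0, 2.0)):
    """Lower and upper bound, in seconds, spent in tool calls alone."""
    low, high = per_call_s
    return (n_calls * low, n_calls * high)


def serial_suite_hours(n_tests, per_test_s=(20.0, 30.0)):
    """Lower and upper bound, in hours, for running every test one after another."""
    low, high = per_test_s
    return (n_tests * low / 3600, n_tests * high / 3600)


# TodoMVC flow: 13 tool calls under the assumption stated above.
print(tool_latency_s(13))
# A 1,200-test suite where each test is a 20 to 30 second flow.
print(serial_suite_hours(1200))
```

The suite estimate lands between roughly 6.7 and 10 hours before any model thinking time, which is why the same flows belong in generated, deterministic code once they stabilise.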
Reading the tool-call panel
Two payloads are worth opening on every session, especially while learning:
- The first browser_snapshot after a navigation. You see exactly how the assistant sees the page: every interactive element with its role, name, and ref. If a later click fails, this is where you check whether the target was even visible.
- Any browser_click or browser_type. The arguments include the ref (which element) and, for typing, the text. Comparing the ref against the snapshot tells you whether the assistant aimed at the right element. "Wrong target" failures are far more common than "target moved" failures.
You'll lean on this view constantly through the rest of the course.
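The ref comparison described above is easy to do by eye, but a small Python sketch shows the logic. Everything concrete here is an assumption: the snapshot excerpt is invented and simplified, real TodoMVC accessible names may differ, and describe_ref and check_click are hypothetical helpers, not part of Playwright MCP.

```python
import re
from typing import Optional, Tuple

# Hypothetical snapshot excerpt after the three todos were added.
SNAPSHOT = """\
- textbox "What needs to be done?" [ref=e8]
- checkbox "Buy groceries" [ref=e21]
- checkbox "Pay rent" [ref=e27]
- checkbox "Call mom" [ref=e33]
"""

# Captures role, accessible name, and ref from one snapshot line.
LINE = re.compile(r'-\s+(?P<role>\w+)\s+"(?P<name>[^"]*)".*?\[ref=(?P<ref>\w+)\]')


def describe_ref(snapshot: str, ref: str) -> Optional[Tuple[str, str]]:
    """Resolve a ref back to (role, accessible name), or None if absent."""
    for line in snapshot.splitlines():
        m = LINE.search(line)
        if m and m["ref"] == ref:
            return (m["role"], m["name"])
    return None


def check_click(snapshot: str, click_args: dict, role: str, name: str) -> str:
    """Compare the element a click aimed at with the element you expected."""
    target = describe_ref(snapshot, click_args["ref"])
    if target is None:
        return "ref not in snapshot (stale snapshot or element gone)"
    if target != (role, name):
        return f"wrong target: {target[0]} {target[1]!r}"
    return "ok"


print(check_click(SNAPSHOT, {"ref": "e27"}, "checkbox", "Pay rent"))
print(check_click(SNAPSHOT, {"ref": "e21"}, "checkbox", "Pay rent"))
```

The two failure messages map onto the two failure classes: a ref that resolves to a different element is a wrong-target failure, while a ref missing from the snapshot points to a page that changed underneath the assistant.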
⚠️ Common mistakes
- Vague prompts that omit success criteria. "Test the checkout" gives the assistant nowhere to stop and no way to report meaningfully. "Go to /checkout, add a Standard t-shirt to the cart, complete checkout with the test card 4242…, and report whether the order confirmation page appears within 10 seconds" gives a clear goal, an oracle, and a deadline. Specificity is the dial that controls quality.
- Assuming first-run success means production-ready. A flow that worked once in your hands can fail tomorrow on a different account, with different inventory, on a slower network. Treat the first successful session as a seed — promote it to a deterministic Playwright test (or at least a written charter) before relying on it.
- Ignoring the tool-call panel. The narrative reply is just the result; the tool calls are the evidence. When something feels off — "did it really verify that?" — the answer is in the payloads, not in the prose. New users skip this step and end up trusting summaries that overstate what happened.
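One way to make the first point habitual is to assemble prompts from named parts, so a missing oracle is caught before the session starts. The Python sketch below is a hypothetical helper, not something the tool provides. It reuses the checkout example from the list with the card number left out, and its field names and wording are assumptions.

```python
def build_prompt(start, steps, oracle, deadline_s):
    """Assemble a prompt with a start point, actions, a success oracle, and a deadline."""
    if not steps:
        raise ValueError("a prompt needs at least one action")
    if not oracle.strip():
        raise ValueError("a prompt needs a success oracle to report against")
    actions = "; ".join(steps)
    return (
        f"Go to {start}. Then: {actions}. "
        f"Report whether {oracle} within {deadline_s} seconds."
    )


prompt = build_prompt(
    start="/checkout",
    steps=[
        "add a Standard t-shirt to the cart",
        "complete checkout with the test card",
    ],
    oracle="the order confirmation page appears",
    deadline_s=10,
)
print(prompt)
```

Writing the oracle as its own argument forces the question "how will I know it worked?" before the browser ever opens.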
🎯 Practice task
Run two real sessions and read the evidence trail. 25 minutes.
- Run the TodoMVC prompt above. Verify the response. Then expand at least three tool calls in the chat and read the actual JSON arguments and results. Note the structure of one browser_snapshot reply: that accessibility tree is the assistant's sole view of the page in default mode.
- Pick one of the prompts you wrote in the previous lesson's practice task (your real bug or your real feature). Paste it into a fresh chat against your staging environment with disposable credentials. Watch the session run end to end with the browser visible.
- Note one thing the assistant did well and one thing it missed or got wrong. Refine the prompt to address the miss — usually by adding a success oracle, a concrete URL, or a specific assertion you want back. Re-run.
- Stretch: ask the assistant at the end of the session to "emit the equivalent Playwright test code for this flow." Save the output to a file. You won't run it yet — chapter 3 covers turning AI sessions into committed tests — but having a real example to compare against makes that lesson land harder.
Save the chat transcripts. The next chapter unpacks how the snapshot mode you just watched actually works, and when to switch to vision mode instead.