Your First AI-Driven Browser Session

You have Playwright MCP installed and the host can list its tools. This lesson is about driving your first real session and understanding what happens behind every prompt. By the end you'll have run two non-trivial flows, watched the AI orchestrate Playwright tools step by step, and built an honest mental model of where this tool fits in your QA day. The point isn't to be impressed; it's to see exactly what is happening so you can debug a session later when something goes wrong.

Keep the browser visible for these first sessions (skip --headless). Watching the assistant click, type, and re-snapshot in real time is the fastest way to internalise the loop.

A first prompt — and what it really does

Open a fresh chat in Claude Desktop (or a Claude Code session) and paste:

Navigate to https://playwright.dev/, then click "Get started" and tell me what page you land on.

A real browser window opens, the homepage loads, the click happens, and the assistant replies with the destination page name. Trivial output, but the round trip behind it is the whole protocol in miniature:

Claude reads your message and notices it needs browser tools.
Claude calls browser_navigate with { "url": "https://playwright.dev/" }. The Playwright MCP server launches Chromium and navigates.
Claude calls browser_snapshot. The server returns the page's accessibility tree — a structured list of headings, links, buttons, each with a unique ref.
Claude scans the tree and finds an element with role link and accessible name Get started. It calls browser_click with that element's ref.
The browser follows the link. Claude calls browser_snapshot again, reads the new page's title and primary heading, and writes a sentence summarising it.
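The round trip above can be sketched as a toy, in-memory simulation. This is only an illustration of the loop, not the real server: the tool names come from the steps above, while the FakeBrowserServer class, the page data, and the ref values (e1, e2) are invented for the sketch, and real payloads carry more fields than shown here.

```python
# Toy stand-in for the navigate -> snapshot -> click -> snapshot loop.
# Page contents and refs are invented for illustration, not scraped.
PAGES = {
    "https://playwright.dev/": {
        "title": "Home",
        "elements": [
            {"role": "heading", "name": "Playwright", "ref": "e1"},
            {"role": "link", "name": "Get started", "ref": "e2",
             "href": "https://playwright.dev/docs/intro"},
        ],
    },
    "https://playwright.dev/docs/intro": {
        "title": "Installation",
        "elements": [{"role": "heading", "name": "Installation", "ref": "e1"}],
    },
}


class FakeBrowserServer:
    """Hypothetical server that answers three tool names from a dict of pages."""

    def __init__(self, pages):
        self.pages = pages
        self.url = None
        self.log = []  # every tool call, in order, like the tool-call panel

    def call(self, tool, args=None):
        args = args or {}
        self.log.append((tool, args))
        if tool == "browser_navigate":
            self.url = args["url"]
            return {"ok": True}
        if tool == "browser_snapshot":
            page = self.pages[self.url]
            return {"title": page["title"], "elements": page["elements"]}
        if tool == "browser_click":
            for el in self.pages[self.url]["elements"]:
                if el["ref"] == args["ref"]:
                    # Links change the current URL; other elements do not.
                    self.url = el.get("href", self.url)
                    return {"ok": True}
            raise ValueError(f"no element with ref {args['ref']!r}")
        raise ValueError(f"unknown tool {tool!r}")


def find_ref(snapshot, role, name):
    """Return the ref of the first element matching role and accessible name."""
    for el in snapshot["elements"]:
        if el["role"] == role and el["name"] == name:
            return el["ref"]
    raise LookupError(f"no {role} named {name!r} in snapshot")


def run_first_prompt(server):
    """Replay the four tool calls the first prompt is described as producing."""
    server.call("browser_navigate", {"url": "https://playwright.dev/"})
    snapshot = server.call("browser_snapshot")
    ref = find_ref(snapshot, "link", "Get started")
    server.call("browser_click", {"ref": ref})
    landed = server.call("browser_snapshot")
    return landed["title"]


server = FakeBrowserServer(PAGES)
landing_title = run_first_prompt(server)
```

The key design point is that the click is addressed by ref, and the ref only exists because a snapshot was taken first. That ordering is what you are checking when you expand the tool calls in the chat.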

In Claude Desktop you can expand each tool call inline to see exactly what was sent and what came back. Get into the habit of doing this — it is the single best debugging surface you have. When a session does something unexpected, the answer is almost always in the tool-call payloads, not in the chat narrative.

A more interesting prompt

The single-click flow is fine for a smoke test but doesn't show the model thinking. Try:

Go to https://demo.playwright.dev/todomvc, add three todos: "Buy groceries", "Pay rent", "Call mom".
Mark "Pay rent" as complete. Then tell me how many active todos remain.

What you'll see in the tool-call panel:

browser_navigate to the TodoMVC URL.
browser_snapshot — the assistant locates the input field by role.
Three rounds of browser_type (each followed by a browser_press_key for Enter to commit the todo) and a browser_snapshot between rounds so the assistant sees the new list state.
browser_click on the checkbox next to Pay rent, identified by its accessible name in the latest snapshot.
Final browser_snapshot to read the "2 items left" footer.
A natural-language reply: "Two active todos remain — Buy groceries and Call mom."

No code was written. No selectors were authored. The accessibility tree carried enough structure for the assistant to reason its way through.
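To make the shape of that session concrete, the expected tool-call trace can be written out as data. This is a sketch under assumptions: it takes "a snapshot between rounds" to mean one snapshot after each todo, and the argument dictionaries are simplified placeholders rather than the server's actual schema.

```python
from collections import Counter

TODOMVC_URL = "https://demo.playwright.dev/todomvc"


def todomvc_trace(todos, to_complete):
    """Build the sequence of tool calls the walk-through above describes."""
    trace = [
        ("browser_navigate", {"url": TODOMVC_URL}),
        ("browser_snapshot", {}),
    ]
    for text in todos:
        trace.append(("browser_type", {"text": text}))
        trace.append(("browser_press_key", {"key": "Enter"}))
        # Assumed: one snapshot after each todo so the new list is visible.
        trace.append(("browser_snapshot", {}))
    # Placeholder argument; a real call would carry a ref from the snapshot.
    trace.append(("browser_click", {"target": f"checkbox for {to_complete}"}))
    trace.append(("browser_snapshot", {}))
    return trace


trace = todomvc_trace(["Buy groceries", "Pay rent", "Call mom"], "Pay rent")
counts = Counter(tool for tool, _ in trace)
```

Under these assumptions the session comes to 13 tool calls, 5 of which are snapshots. Comparing a list like this against what you actually see in the panel is a quick way to spot skipped or repeated steps.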

What the loop actually looks like

Step 1 of 6: You write a prompt

Plain English describing the goal: start URL, what to do, what to report. The more precise the goal and the success criteria, the better the run.

A reality check before you fall in love

The first session feels magical. Three guardrails to anchor expectations before you start designing workflows around it:

It is slow. Each tool call adds roughly one to two seconds, so a flow of 12 tool calls takes 12 to 24 seconds at minimum, before any model thinking time. Fine for one-off exploration; not fine for a 1,200-test CI suite.
It costs tokens. Snapshots, screenshots, and tool results all flow back into the model's context. A long session can run several thousand tokens per turn. Budget accordingly, especially with vision mode (chapter 2).
It is non-deterministic. Same prompt, same app, slightly different paths. Usually the answer is identical; occasionally the model takes an alternate route. Acceptable for exploration, not acceptable for regression tests. The fix is to convert successful sessions into deterministic Playwright code (chapter 3) — which is exactly what this tool is designed to feed into.
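The speed constraint is easy to turn into back-of-envelope arithmetic. The sketch below takes the one-to-two-seconds-per-call figure at face value and ignores model thinking time; the 13-calls-per-session figure is an assumption (roughly what the TodoMVC walk-through adds up to), and both function names are made up for this example.

```python
def latency_bounds(tool_calls, low_s=1.0, high_s=2.0):
    """Lower and upper bound in seconds for pure tool-call overhead."""
    return tool_calls * low_s, tool_calls * high_s


def suite_hours(tests, calls_per_test, seconds_per_call=1.0):
    """Hours of tool-call overhead if every test ran serially as a session."""
    return tests * calls_per_test * seconds_per_call / 3600


session_low, session_high = latency_bounds(13)
ci_hours = suite_hours(tests=1200, calls_per_test=13)
```

With these inputs a single session costs 13 to 26 seconds of overhead, and a 1,200-test suite run serially this way would spend more than four hours on tool calls alone, even at the optimistic one second per call.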

Internalise these three constraints and the "where does it belong?" question answers itself. Use AI-driven sessions for the work where flexibility outweighs determinism, and use generated code for everything else.

Reading the tool-call panel

Two payloads are worth opening on every session, especially while learning:

The first browser_snapshot after a navigation. You see exactly how the assistant sees the page: every interactive element with its role, name, and ref. If a later click fails, this is where you check whether the target was even present in the snapshot.
Any browser_click or browser_type. The arguments include the ref (which element) and, for typing, the text. Comparing the ref against the snapshot tells you whether the assistant aimed at the right element. "Wrong target" failures are far more common than "target moved" failures.
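The ref-versus-snapshot comparison described above is mechanical enough to sketch in a few lines. The snapshot here is a simplified list of dictionaries with invented refs and names; a real snapshot is a larger accessibility tree, so treat this as a model of the check rather than a parser for actual payloads.

```python
# Simplified, invented snapshot: one entry per element with role, name, ref.
SNAPSHOT = [
    {"role": "textbox", "name": "New todo", "ref": "e5"},
    {"role": "checkbox", "name": "Buy groceries", "ref": "e11"},
    {"role": "checkbox", "name": "Pay rent", "ref": "e12"},
    {"role": "checkbox", "name": "Call mom", "ref": "e13"},
]


def element_for_ref(snapshot, ref):
    """Return the snapshot entry with this ref, or None if it is absent."""
    for el in snapshot:
        if el["ref"] == ref:
            return el
    return None


def audit_click(snapshot, click_args, expected_role, expected_name):
    """Classify a click: correct target, wrong target, or ref not in snapshot."""
    target = element_for_ref(snapshot, click_args["ref"])
    if target is None:
        return "ref not in snapshot"
    if (target["role"], target["name"]) != (expected_role, expected_name):
        return f"wrong target: {target['role']} {target['name']!r}"
    return "ok"


good = audit_click(SNAPSHOT, {"ref": "e12"}, "checkbox", "Pay rent")
wrong = audit_click(SNAPSHOT, {"ref": "e11"}, "checkbox", "Pay rent")
missing = audit_click(SNAPSHOT, {"ref": "e99"}, "checkbox", "Pay rent")
```

When reading the panel by hand you are doing the same three-way classification: the ref resolves to what you expected, it resolves to something else, or it does not appear in the snapshot at all.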

You'll lean on this view constantly through the rest of the course.

⚠️ Common mistakes

Vague prompts that omit success criteria. "Test the checkout" gives the assistant nowhere to stop and no way to report meaningfully. "Go to /checkout, add a Standard t-shirt to the cart, complete checkout with the test card 4242…, and report whether the order confirmation page appears within 10 seconds" gives a clear goal, an oracle, and a deadline. Specificity is the dial that controls quality.
Assuming first-run success means production-ready. A flow that worked once in your hands can fail tomorrow on a different account, with different inventory, on a slower network. Treat the first successful session as a seed — promote it to a deterministic Playwright test (or at least a written charter) before relying on it.
Ignoring the tool-call panel. The narrative reply is just the result; the tool calls are the evidence. When something feels off — "did it really verify that?" — the answer is in the payloads, not in the prose. New users skip this step and end up trusting summaries that overstate what happened.
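One way to guard against the first mistake is to write prompts from a fixed template that cannot be filled in without an oracle. The sketch below is a hypothetical helper, not part of any tool; the Charter class and its field names are made up, and the example reuses the TodoMVC flow from earlier in the lesson.

```python
from dataclasses import dataclass


@dataclass
class Charter:
    """Hypothetical prompt template: every field is required."""
    start_url: str
    steps: list
    oracle: str   # the observable condition that counts as success
    report: str   # what the assistant should tell you at the end

    def to_prompt(self):
        lines = [f"Go to {self.start_url}."]
        lines += [f"{i}. {step}" for i, step in enumerate(self.steps, 1)]
        lines.append(f"Success means: {self.oracle}")
        lines.append(f"Report back: {self.report}")
        return "\n".join(lines)


charter = Charter(
    start_url="https://demo.playwright.dev/todomvc",
    steps=[
        'Add three todos: "Buy groceries", "Pay rent", "Call mom".',
        'Mark "Pay rent" as complete.',
    ],
    oracle='the footer reads "2 items left".',
    report="how many active todos remain and which ones.",
)
prompt = charter.to_prompt()
```

Because the dataclass has no defaults, leaving out the oracle raises a TypeError before the prompt is ever sent, which is the point of the exercise.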

🎯 Practice task

Run two real sessions and read the evidence trail. 25 minutes.

Run the TodoMVC prompt above. Verify the response. Then expand at least three tool calls in the chat and read the actual JSON arguments and results. Note the structure of one browser_snapshot reply — that accessibility tree is the assistant's sole view of the page in default mode.
Pick one of the prompts you wrote in the previous lesson's practice task (your real bug or your real feature). Paste it into a fresh chat against your staging environment with disposable credentials. Watch the session run end to end with the browser visible.
Note one thing the assistant did well and one thing it missed or got wrong. Refine the prompt to address the miss — usually by adding a success oracle, a concrete URL, or a specific assertion you want back. Re-run.
Stretch: ask the assistant at the end of the session to "emit the equivalent Playwright test code for this flow." Save the output to a file. You won't run it yet — chapter 3 covers turning AI sessions into committed tests — but having a real example to compare against makes that lesson land harder.
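For orientation, the output of the stretch request for the TodoMVC flow might resemble the sketch below, written against Playwright's Python sync API. What your assistant emits will differ, possibly in language as well as in locators. The placeholder text and the todo-item and todo-count test ids are assumptions about the demo app's markup, so check them against a snapshot before trusting them.

```python
TODOS = ["Buy groceries", "Pay rent", "Call mom"]


def check_todomvc_flow(headless=False):
    """Deterministic version of the TodoMVC session; raises if the count is off."""
    # Imported inside the function so this file can be loaded and read
    # even on a machine where Playwright is not installed yet.
    from playwright.sync_api import sync_playwright, expect

    with sync_playwright() as p:
        browser = p.chromium.launch(headless=headless)
        page = browser.new_page()
        page.goto("https://demo.playwright.dev/todomvc")

        # Assumed placeholder text of the new-todo input.
        new_todo = page.get_by_placeholder("What needs to be done?")
        for text in TODOS:
            new_todo.fill(text)
            new_todo.press("Enter")

        # Assumed test ids; narrow to the "Pay rent" row, then tick its checkbox.
        row = page.get_by_test_id("todo-item").filter(has_text="Pay rent")
        row.get_by_role("checkbox").check()

        # The oracle from the prompt: two active todos remain.
        expect(page.get_by_test_id("todo-count")).to_have_text("2 items left")
        browser.close()
```

Calling check_todomvc_flow() runs the flow in a visible browser. Unlike the chat session, it takes the same path every time, which is the property the non-determinism guardrail asks for.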

Save the chat transcripts. The next chapter unpacks how the snapshot mode you just watched actually works, and when to switch to vision mode instead.