Why Playwright MCP — Use Cases for QA

Playwright MCP is the official Model Context Protocol (MCP) server from the Playwright team at Microsoft. It gives any MCP-aware AI assistant (Claude Desktop, Claude Code, IDE plugins) the ability to drive a real browser. The premise is simple: instead of writing Playwright code by hand, you describe what you want in plain English, and the assistant performs the steps. This lesson is about where that premise pays off and where it doesn't, so the rest of the course lands on solid ground.

The headline shift: AI doesn't replace your existing Playwright suite. It augments the workflow around it. The deterministic regression suite still runs in CI. Playwright MCP picks up the long tail of work that is hard to script ahead of time — exploration, reproduction, scaffolding, ad-hoc smoke checks. Keeping that line clear is how teams adopt this without breaking what already works.

Six QA scenarios where it earns its keep

Exploratory testing. A charter-style prompt — "visit the site, sign up, browse three products, add one to cart, attempt checkout, report anything weird" — turns into an autonomous session where the AI drives, observes, and writes a structured report. You get human-style charter testing at machine speed, with an audit trail of every click.
Bug reproduction. Given a written ticket, the assistant walks the steps in a real browser, captures the failing snapshot and network traffic, and emits a minimal failing Playwright test ready for the developer. Triage time collapses from "I'll repro it later" to "here is the failing test, attached."
Test scaffolding. "Generate a Playwright test for the login flow with valid and invalid credentials" produces a first draft tied to the real DOM rather than imagined selectors. You still review and refactor the code, but the typing-out-the-skeleton step disappears.
Visual reviews. "Compare the staging homepage to production and list visual differences" uses screenshots plus reasoning to flag layout regressions and copy changes. Cheaper than standing up a dedicated visual-diff platform for one-off reviews.
Self-healing selectors. When a button moves or is renamed and an existing test breaks, the assistant can re-snapshot the page, locate the changed element by role and accessible name, and propose the corrected locator, with a human approving the diff.
Documentation. "Walk through the checkout flow and write a markdown guide with screenshots" turns the live app, which is the real source of truth, into written docs in one pass. Especially useful for onboarding pages and runbooks that drift fast.

Each of these has the same shape: the work is conversational, hard to specify ahead of time, and benefits from a human in the loop reviewing the output. That is the AI-augmented QA sweet spot.

Where it does not belong

High-volume regression suites. Per-step tool calls take seconds and cost tokens. A 1,200-test suite running on every commit would be slow and expensive — and worse, non-deterministic. Use AI to create fast deterministic tests, not to be the fast deterministic tests.
Performance testing. AI sessions add their own latency on top of the system under test, so the numbers are meaningless for SLA work. Reach for k6 or JMeter (covered in their own qa.codes courses).
Anything safety-critical without review. AI output is a draft, not a verdict. Code reviews, healed selectors, and exploratory findings all need a human checkpoint before they affect production or main.

Saying no to those use cases is what makes the yes list above defensible.
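
To put rough numbers on the regression-suite point, here is a back-of-the-envelope sketch in TypeScript. Every input (10 steps per test, 3 seconds and 2,000 tokens per AI-driven step, 0.3 seconds per scripted step) is an assumption chosen for illustration, not a measurement. Substitute figures from your own sessions.

```typescript
// All numbers in this sketch are illustrative assumptions, not measurements.
interface SuiteProfile {
  tests: number;          // tests in the suite
  stepsPerTest: number;   // browser actions per test
  secondsPerStep: number; // wall-clock seconds per action
  tokensPerStep: number;  // model tokens per action (0 for scripted runs)
}

interface RunEstimate {
  hours: number;  // serial browser time, before any parallelism
  tokens: number; // total model tokens for one full run
}

function estimateRun(profile: SuiteProfile): RunEstimate {
  const steps = profile.tests * profile.stepsPerTest;
  return {
    hours: (steps * profile.secondsPerStep) / 3600,
    tokens: steps * profile.tokensPerStep,
  };
}

// The 1,200-test suite from the text, driven step by step through an assistant.
const aiDriven = estimateRun({
  tests: 1200,
  stepsPerTest: 10,
  secondsPerStep: 3,
  tokensPerStep: 2000,
});

// The same suite as ordinary scripted Playwright tests.
const scripted = estimateRun({
  tests: 1200,
  stepsPerTest: 10,
  secondsPerStep: 0.3,
  tokensPerStep: 0,
});

console.log(`AI-driven: ${aiDriven.hours.toFixed(1)} h serial, ${aiDriven.tokens} tokens`);
console.log(`Scripted:  ${scripted.hours.toFixed(1)} h serial, ${scripted.tokens} tokens`);
```

Under these assumed inputs, the AI-driven run needs about 10 hours of serial browser time and 24 million tokens per commit, against about 1 hour and no tokens for the scripted suite. Parallel workers shrink both wall-clock figures, but they do nothing for the token bill.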

Codegen vs Playwright MCP

Playwright already ships a recorder — npx playwright codegen — which watches you click and emits Playwright code. It is fast and surprisingly good. Why bring AI into the picture at all?

Manual coding vs Codegen vs Playwright MCP

Hand-written Playwright tests

Slow to author the first draft
Maximum precision and control
Deterministic, fast in CI
The right home for regression suites

Codegen recorder

Fast to capture a flow
Generated code is brittle if selectors change
No understanding of intent — pure replay
Good seed for a draft, not a finished test

Playwright MCP

Describe in English, AI drives and writes
Reasons about role, name, accessibility tree
Costs tokens and seconds per step
Best for exploration, repro, scaffolding

The honest difference: codegen replays the clicks it recorded; Playwright MCP resolves each instruction against the live page. "Click the submit button" still works after a developer changes the underlying element's id or class, because the assistant resolves the target from the live accessibility tree at the time of execution, by role and accessible name. The trade-off is cost, latency, and non-determinism, all of which matter in CI but are perfectly acceptable in interactive sessions.
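
Resolving a target from the accessibility tree can be illustrated with a deliberately simplified sketch. This is not how Playwright MCP is implemented internally. The A11yNode shape and the resolveTarget and proposeLocator helpers are hypothetical, and real snapshots are nested and carry far more detail.

```typescript
// Hypothetical, flattened stand-in for one node of an accessibility snapshot.
interface A11yNode {
  role: string;
  name: string;
}

// Find the single node with the requested role whose accessible name contains
// the hint (case-insensitive). Returns null when the match is missing or
// ambiguous, so that a human decides instead of the tool guessing.
function resolveTarget(
  snapshot: A11yNode[],
  role: string,
  nameHint: string,
): A11yNode | null {
  const hint = nameHint.toLowerCase();
  const matches = snapshot.filter(
    (node) => node.role === role && node.name.toLowerCase().includes(hint),
  );
  const [only, ...rest] = matches;
  return only !== undefined && rest.length === 0 ? only : null;
}

// Turn a resolved node into locator source text for a human to review.
function proposeLocator(node: A11yNode): string {
  return `page.getByRole(${JSON.stringify(node.role)}, { name: ${JSON.stringify(node.name)} })`;
}

// Example snapshot taken after a developer changed the button's id and class.
// Role and accessible name are untouched, so the lookup still succeeds.
const snapshot: A11yNode[] = [
  { role: "textbox", name: "Email" },
  { role: "textbox", name: "Password" },
  { role: "button", name: "Submit" },
  { role: "link", name: "Forgot password?" },
];

const target = resolveTarget(snapshot, "button", "submit");
console.log(target ? proposeLocator(target) : "No unambiguous match; ask a human");
```

The sketch prints a getByRole locator for the Submit button. Returning null on an ambiguous match mirrors the human-approval step described above: a proposed locator is only useful when it points at exactly one element.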

How this fits with your existing Playwright knowledge

Everything in the Playwright with TypeScript course still applies. The locators you already use (getByRole, getByLabel, getByTestId) are the same locators the AI will produce in its generated code. The configuration in playwright.config.ts — projects, base URL, fixtures — still drives execution when those generated tests run in CI. Playwright MCP doesn't introduce a parallel framework; it sits one layer up.
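
As a reminder of what that configuration layer looks like, here is a minimal playwright.config.ts sketch. The test directory, the staging URL, the BASE_URL environment variable, and the retry count are assumptions for illustration; your project's values will differ.

```typescript
// playwright.config.ts (sketch; every value here is an assumption to adapt)
import { defineConfig, devices } from "@playwright/test";

export default defineConfig({
  testDir: "./tests",
  // Retry only in CI so that local failures surface immediately.
  retries: process.env.CI ? 2 : 0,
  use: {
    // Hypothetical staging URL; lets tests call page.goto("/login").
    baseURL: process.env.BASE_URL ?? "https://staging.example.com",
    trace: "on-first-retry",
  },
  projects: [
    { name: "chromium", use: { ...devices["Desktop Chrome"] } },
    { name: "firefox", use: { ...devices["Desktop Firefox"] } },
  ],
});
```

A test drafted during an AI session and saved under the test directory runs through this same file, with the same projects and base URL as any hand-written test.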

If you have not yet written a Playwright test by hand, build that fluency first. AI-generated code is much easier to review when you can spot anti-patterns at a glance.

⚠️ Common mistakes

Treating Playwright MCP as a CI replacement. The pitch "let the AI just run our tests" lands badly: per-step latency, model cost, and non-determinism make it a poor fit for the hot path. Keep the deterministic suite for CI; use AI for the work where flexibility matters more than speed.
Skipping the human review step on generated code. Output that looks like a clean Playwright test can hard-code timing, miss assertions on side effects, or codify a UI bug as expected behaviour. Review the diff with the same rigour you would a human PR — more, if anything, because the model has no stake in correctness.
Adopting it without telling the team. AI sessions touch real browsers under real accounts. Without a shared norm — which environments, which credentials, who owns the costs — surprises follow. Pick a target environment (staging), a shared test account, and a place to track usage before the first session.

🎯 Practice task

Pick your first two Playwright MCP scenarios. 15 minutes, paper exercise.

Open your team's bug tracker and your test backlog. Pick one bug that would take 20+ minutes to manually reproduce, and one feature that lacks Playwright coverage today.
For each, write the natural-language prompt you would hand to Playwright MCP — the same way you'd brief a junior tester. Be specific: starting URL, credentials to use, what counts as success or failure, what artefacts you want back.
List the deterministic tests you'd want to derive from each AI session once it succeeds. Those are the artefacts that go into the regression suite; the AI session is the throwaway scaffold that produces them.
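
If it helps to keep briefs consistent across the team, the checklist in step 2 can be captured as a small template. This is only a sketch: the SessionBrief shape, the renderPrompt helper, and the example URL and account name are all hypothetical, and nothing here is part of Playwright MCP itself.

```typescript
// Hypothetical brief structure; field names are illustrative only.
interface SessionBrief {
  goal: string;
  startUrl: string;
  account: string; // name of a shared test account, never a real password
  successCriteria: string[];
  artefacts: string[];
}

// Render the brief as plain text to paste into the assistant.
function renderPrompt(brief: SessionBrief): string {
  const bullets = (items: string[]): string =>
    items.map((item) => `- ${item}`).join("\n");
  return [
    `Goal: ${brief.goal}`,
    `Start at: ${brief.startUrl}`,
    `Log in as: ${brief.account}`,
    "Success criteria:",
    bullets(brief.successCriteria),
    "Return these artefacts:",
    bullets(brief.artefacts),
  ].join("\n");
}

// Example brief for a bug reproduction (all values are made up).
const reproBrief: SessionBrief = {
  goal: "Reproduce the checkout bug described in the ticket",
  startUrl: "https://staging.example.com/cart",
  account: "shared QA test account",
  successCriteria: [
    "The failure described in the ticket is observed",
    "The exact steps that trigger it are listed",
  ],
  artefacts: ["Failing Playwright test", "Screenshot of the failing state"],
};

console.log(renderPrompt(reproBrief));
```

Writing the brief as data makes gaps obvious: an empty successCriteria array is a prompt that cannot be judged as passed or failed.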

Keep both prompts. You'll run them in lesson 4 of this chapter, after the install lesson. Prompts grounded in the current state of your own codebase are exactly the kind of work this tool is for.