Combining Playwright MCP with Existing Test Suites

The mistake most teams make when adopting Playwright MCP is treating it as a replacement for their existing suite. It isn't. The deterministic Playwright tests you already have remain the regression backbone — fast, cheap, gateable in CI. Playwright MCP slots in as a workflow layer on top: exploration, authoring, debugging, triage. This lesson is about how the two cohabit, where MCP fits in the QA day, and the conventions that keep AI-generated artefacts from polluting an otherwise clean suite.

The bumper-sticker version: Playwright Test for execution, Playwright MCP for the work around it. Once that line is clear in your head, the integration questions answer themselves.

The healthy split

Production tests: committed code, deterministic, run in CI on every push. Hand-written or AI-generated, but reviewed and hardened. These are the regression net.
Exploratory and authoring sessions: interactive, AI-driven, never on the per-commit CI path. These produce findings (bugs, repro steps, generated tests) that flow into the production suite as committed artefacts.

The boundary is the commit. AI sessions are scaffolding; what survives review and lands in tests/ is the real test. If you ever find yourself wanting to run MCP sessions in CI on every commit, you're holding it wrong: that's where deterministic tests belong.

A representative QA day

A team that's a few weeks into adoption typically uses MCP four or five times a day, across four distinct slots:

Morning smoke: point the agent at last night's deploy with a 10-minute exploratory charter. Triage any findings before standup.
Mid-morning triage: pick a vague bug from the backlog. Hand it to the agent and get a reliable repro plus a draft regression test in 15 minutes.
Afternoon authoring: a new page just shipped. Generate a Page Object scaffold, harden it against the team's POM conventions, write the first three tests.
Late afternoon CI debug: a flaky test failed twice in CI. Hand the failure to the agent for a real-failure-or-flake verdict before opening the trace yourself.

None of those tasks ran in CI; all of them produced PRs. That's the integration shape that compounds — every session converts a one-off task into a committed artefact the suite will own from then on.

Conventions for AI-generated tests

When you ask the assistant to emit a test, give it the conventions verbatim. The single most effective prompt addition is "match the style of this existing file" with one example pasted in:

Generate this test following our team's conventions:
 
- Page objects in src/pages/, tests in tests/, fixtures in tests/fixtures/
- Use the existing authenticatedPage fixture from tests/fixtures/auth.ts
- Visual regression via await expect(page).toHaveScreenshot()
- Test data only via process.env or the fixtures directory — never hard-coded
- Locators: getByRole, getByLabel, getByText (in that order). No CSS or XPath.
- One assertion per test where reasonable; otherwise group by behaviour.
 
Match the style of this existing test file:
 
[paste tests/checkout.spec.ts here]

The convention list anchors structure; the pasted file anchors taste. Together they produce code that looks like the team wrote it, which is the only way reviewers will trust the diff.
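
One way to keep the template reusable is to assemble it in code, so the convention list and the example file are filled in the same way every time. The sketch below is a minimal illustration and not part of Playwright or MCP: GenerationTemplate and buildGenerationPrompt are hypothetical names, and the example file source is a placeholder you would replace with the real file contents.

```typescript
// Hypothetical shape for a team's generation template; the field names are
// illustrative assumptions, not part of Playwright or MCP.
interface GenerationTemplate {
  conventions: string[];     // one rule per entry, as in the list above
  exampleFilePath: string;   // e.g. "tests/checkout.spec.ts"
  exampleFileSource: string; // full text of that file, pasted verbatim
}

// Assemble the prompt: task first, then conventions (structure),
// then the example file (taste).
function buildGenerationPrompt(task: string, template: GenerationTemplate): string {
  const rules = template.conventions.map((rule) => "- " + rule).join("\n");
  return [
    task.trim(),
    "",
    "Generate this test following our team's conventions:",
    "",
    rules,
    "",
    "Match the style of this existing test file (" + template.exampleFilePath + "):",
    "",
    template.exampleFileSource.trim(),
  ].join("\n");
}

// Usage: the task wording and the placeholder source are assumptions.
const checkoutPrompt = buildGenerationPrompt(
  "Write a test for applying a discount code at checkout.",
  {
    conventions: [
      "Page objects in src/pages/, tests in tests/, fixtures in tests/fixtures/",
      "Use the existing authenticatedPage fixture from tests/fixtures/auth.ts",
      "Locators: getByRole, getByLabel, getByText (in that order). No CSS or XPath.",
    ],
    exampleFilePath: "tests/checkout.spec.ts",
    exampleFileSource: "// contents of tests/checkout.spec.ts go here",
  },
);
```

Keeping the conventions as data rather than as a block of pasted text makes it easier to update one rule without rewriting every saved prompt.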

Mandatory review checklist for AI-generated PRs

Treat AI-generated tests like any other PR — except the failure modes are slightly different. Three things to check that you wouldn't always check on a human PR:

Are the selectors stable across data shapes? A test that passes today against a fresh database can fail tomorrow against a populated one. Watch for getByText("Order #12345") and similar data-tied locators.
Are the assertions actually testing the success state, or just visibility of something? "Welcome" visible could be a footer link, a toast, or the real heading. Tighten the oracle.
Is the test self-cleaning? Does it leak users, orders, or settings into the database? AI-generated code often forgets cleanup. Add afterEach teardown explicitly.

Sticking these on a code-review template — "AI-generated? Reviewer confirms locators, oracles, and cleanup" — is the cheapest insurance against AI tests slowly degrading your suite.
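
Part of the checklist can be pre-screened mechanically before a human looks at the diff. The following is a rough sketch under stated assumptions: flagAiGeneratedTest is a hypothetical helper, not a Playwright feature, and its regexes are deliberately crude. It only surfaces candidates for the three checks; the reviewer still makes the call.

```typescript
// Heuristic pre-review flags for an AI-generated spec file.
// Hypothetical helper; the patterns are illustrative and will miss cases.
interface ReviewFlag {
  check: "locator" | "oracle" | "cleanup";
  message: string;
}

function flagAiGeneratedTest(source: string): ReviewFlag[] {
  const flags: ReviewFlag[] = [];

  // 1. Data-tied locators: getByText(...) whose string argument contains
  //    digits, such as getByText("Order #12345").
  if (/getByText\(\s*["'`][^"'`]*\d[^"'`]*["'`]/.test(source)) {
    flags.push({ check: "locator", message: "getByText with digits may depend on seeded data" });
  }

  // 2. Weak oracle: every assertion in the file is a bare visibility check.
  const assertions: string[] = source.match(/\.to[A-Z]\w*\(/g) || [];
  if (assertions.length > 0 && assertions.every((a) => a === ".toBeVisible(")) {
    flags.push({ check: "oracle", message: "only toBeVisible assertions; confirm the success state is tested" });
  }

  // 3. Cleanup: no afterEach/afterAll hook anywhere in the file.
  if (!/\bafter(Each|All)\s*\(/.test(source)) {
    flags.push({ check: "cleanup", message: "no afterEach/afterAll teardown found" });
  }

  return flags;
}

// Usage with a made-up spec that trips all three checks.
const riskySpec = [
  'test("order appears", async ({ page }) => {',
  '  await expect(page.getByText("Order #12345")).toBeVisible();',
  "});",
].join("\n");

const riskyFlags = flagAiGeneratedTest(riskySpec).map((flag) => flag.check);
// riskyFlags is ["locator", "oracle", "cleanup"]
```

A clean result from a screen like this proves nothing; it only means the obvious patterns are absent, so the template question on the PR still needs a human answer.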

When the AI session itself shapes the suite

Sometimes the artefact you commit isn't a test — it's a fixture, a Page Object, or a tooling improvement the AI session surfaced. Three patterns:

Page Objects. A reverse-engineering session (Chapter 3) produces a POM you commit to src/pages/. Future tests, AI-generated or human-written, import it.
Fixtures. "Add this user / org / dataset to the seeded fixtures" — when the AI session repeatedly recreates the same setup, that's a fixture screaming to exist.
Stable hooks in the app. If the AI session keeps working around a missing data-testid, file an app PR to add the testid. The next session is faster and less brittle.

The principle: every recurring AI cost is a candidate for a one-time investment that retires it. Fixtures, POMs, testids — these are the assets that pay back across hundreds of future sessions.
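
Spotting those recurring costs is easier if each session leaves a short note behind. As a minimal sketch, assume a hypothetical SessionNote log where the engineer records free-form workaround tags after each session; neither the log format nor the recurringCosts function comes from Playwright or MCP. Counting tags across sessions shows which workarounds have recurred often enough to justify a fixture, POM, or testid.

```typescript
// Hypothetical session log entry; the shape is an assumption for illustration.
interface SessionNote {
  date: string;
  workarounds: string[]; // free-form tags jotted down after a session
}

// Count how many sessions hit each workaround and keep those that reach
// the threshold, most frequent first.
function recurringCosts(notes: SessionNote[], minSessions: number): Array<[string, number]> {
  const counts: { [tag: string]: number } = {};
  for (const note of notes) {
    // Count each tag once per session, even if it was noted twice.
    const seen: { [tag: string]: boolean } = {};
    for (const tag of note.workarounds) {
      if (!seen[tag]) {
        seen[tag] = true;
        counts[tag] = (counts[tag] || 0) + 1;
      }
    }
  }
  return Object.keys(counts)
    .filter((tag) => (counts[tag] || 0) >= minSessions)
    .map((tag): [string, number] => [tag, counts[tag] || 0])
    .sort((a, b) => b[1] - a[1]);
}

// Usage with made-up notes from four sessions.
const candidates = recurringCosts(
  [
    { date: "Mon", workarounds: ["missing data-testid on cart total", "recreated admin user"] },
    { date: "Tue", workarounds: ["recreated admin user"] },
    { date: "Wed", workarounds: ["recreated admin user", "missing data-testid on cart total"] },
    { date: "Thu", workarounds: ["slow staging login"] },
  ],
  2,
);
// candidates lists "recreated admin user" (3 sessions) first,
// then "missing data-testid on cart total" (2 sessions)
```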

Where MCP and Playwright Test fit at a glance

QA Workflow Stack

Playwright Test (execution)
– Runs in CI on every push
– Fast, deterministic, gateable
– Owns regression coverage
– Source of truth for what 'green' means

Playwright MCP (the work around it)
– Exploratory testing
– Bug reproduction
– Test and POM generation
– CI failure triage

What gets committed from MCP sessions
– Generated tests (after review)
– Page Objects from reverse-engineering
– Reproduction tests for bugs
– Fixtures and stable testids

What stays out of CI
– Live AI sessions
– Exploratory charters
– Vision-mode reviews on every commit

Reference: where this connects to the rest of the course

Everything in the Playwright with TypeScript course remains the source of truth for what good Playwright tests look like — the locator strategy, the fixture model, the trace viewer, the CI integration. This lesson assumes you already know that material; the integration question is purely "where does MCP fit alongside what I already do?"

⚠️ Common mistakes

Trying to run AI sessions in CI on every push. Slow, costly, non-deterministic — every reason that makes MCP great for exploration makes it bad for the hot path. CI runs deterministic Playwright code, full stop. AI sessions belong in interactive workflows or scheduled smoke jobs (daily, not per-commit).
Auto-merging AI-generated tests because they pass. Passing once isn't the same as testing the right thing. AI tests can be confidently green while asserting on the wrong oracle, leaking data into the database, or hard-coding a date that breaks next month. Review every diff with the checklist above.
Letting two parallel test styles emerge. If AI-generated tests follow one set of conventions and human-written tests follow another, your suite splits in two. Anchor every generation to a real existing file, and refactor the AI output toward your standards before merging — never the other way round.

🎯 Practice task

Wire MCP into one real workflow on your team for a week. 60 minutes of setup, one week of practice.

Pick one of the four day-to-day slots above (morning smoke, triage, authoring, CI debug). Just one — adoption fails when teams try to do everything at once.
Write the prompt template that goes with it. Include the team conventions, the example file, the constraints (test environment only, disposable creds, time budget).
Save the template somewhere shared (team wiki, Slack canvas, repo docs/) along with a one-line trigger such as "Run this prompt against staging when X happens."
Use it for a week. Track elapsed time vs the manual version of the same task. Note what landed in tests/ as a result and what didn't.
At the end of the week, audit: are the AI-generated artefacts indistinguishable from team-written ones in the diff? If not, the prompt needs more example context. Iterate the template before adopting the next slot.
Stretch: add an entry to your team's PR template — "AI-assisted? Y/N. If Y, reviewer confirms locator stability, assertion strength, and cleanup." The metadata becomes a useful signal when bugs surface later.
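
The stretch item can also be checked automatically once the PR template has a fixed shape. The sketch below assumes a hypothetical template in which the author writes "AI-assisted? Y" or "AI-assisted? N" on its own line and the reviewer ticks one markdown checkbox per confirmation; the wording, the checkPrBody function, and the checkbox labels are all assumptions for illustration.

```typescript
// Hypothetical PR-template convention (an assumption, not a platform feature):
//   AI-assisted? Y
//   - [x] locator stability
//   - [x] assertion strength
//   - [ ] cleanup
const REQUIRED_CONFIRMATIONS = ["locator stability", "assertion strength", "cleanup"];

interface PrCheckResult {
  aiAssisted: boolean;
  missing: string[]; // confirmations the reviewer has not ticked yet
}

function checkPrBody(body: string): PrCheckResult {
  // Match a filled-in answer only; the untouched "Y/N" placeholder does not count.
  const aiAssisted = /AI-assisted\?\s*(Y|Yes)\s*$/im.test(body);
  if (!aiAssisted) {
    return { aiAssisted: false, missing: [] };
  }
  const lower = body.toLowerCase();
  const missing = REQUIRED_CONFIRMATIONS.filter(
    (item) => lower.indexOf("- [x] " + item) === -1,
  );
  return { aiAssisted: true, missing };
}

// Usage with a made-up PR description where cleanup is still unconfirmed.
const prResult = checkPrBody(
  [
    "Adds discount-code coverage.",
    "AI-assisted? Y",
    "- [x] locator stability",
    "- [x] assertion strength",
    "- [ ] cleanup",
  ].join("\n"),
);
// prResult.aiAssisted is true and prResult.missing is ["cleanup"]
```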

The next lesson is the cost-and-latency reality check: where AI sessions earn their bill, and where they don't.