How to review AI-written Playwright tests

qa.codes · 13 June 2026 · 9 min read
Intermediate · Automation · AI QA
ai-testing · playwright · review

AI will hand you a Playwright test that runs green and proves nothing. Reviewing it is the real skill — here is the checklist I use.

Part of: Testing AI products

AI is genuinely good at producing Playwright tests. That's the problem. It produces tests that look right, read fluently, and pass on the first run, which makes it very easy to merge a test that asserts nothing meaningful. The skill that matters now isn't writing the test; it's reviewing it with enough suspicion to catch the ways AI-generated tests quietly fail. This is the Playwright-specific companion to "AI-generated tests are useful, but not for the reason you think". Here's exactly what I check.

1. Does it assert the thing, or just that something happened?

The most common AI failure: an assertion that's technically true but proves nothing. await expect(page).toHaveURL(/dashboard/) after login confirms a redirect, not that the user is actually logged in or sees their data. expect(response.status()).toBe(200) confirms the call returned, not that it returned the right thing. Ask of every assertion: if the feature were subtly broken, would this test still pass? If yes, the assertion is theatre.
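One way to make that question concrete is to treat each assertion as a predicate over what the test observed, then feed it a snapshot of a subtly broken app. The sketch below is a toy model, not Playwright code: ObservedState, the two snapshots, and isTheatre are all hypothetical names invented for illustration.

```typescript
// Hypothetical snapshot of what a login test could observe after submitting the form.
interface ObservedState {
  url: string;
  httpStatus: number;
  greeting: string | null; // text of the signed-in heading, or null if it is absent
}

type Check = (state: ObservedState) => boolean;

// The app working as intended.
const healthyLogin: ObservedState = {
  url: "https://example.test/dashboard",
  httpStatus: 200,
  greeting: "Welcome back, Ada",
};

// A subtle break: the redirect still happens, but the session was never created.
const brokenLogin: ObservedState = {
  url: "https://example.test/dashboard",
  httpStatus: 200,
  greeting: null,
};

// "Something happened" checks, mirroring toHaveURL(/dashboard/) and a 200 status.
const urlCheck: Check = (state) => /dashboard/.test(state.url);
const statusCheck: Check = (state) => state.httpStatus === 200;

// A behaviour check: the signed-in user's own data is on the page.
const greetingCheck: Check = (state) => state.greeting === "Welcome back, Ada";

// An assertion is theatre if it passes on the healthy app AND on the broken one.
function isTheatre(check: Check, healthy: ObservedState, broken: ObservedState): boolean {
  return check(healthy) && check(broken);
}
```

Here urlCheck and statusCheck both survive the broken snapshot, while greetingCheck does not. In a real Playwright test the stronger check would be a web-first assertion on user-visible content, for example expecting a heading located with page.getByRole to be visible.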

2. Would it actually fail?

The fastest review technique: break the app (or comment out the behaviour) and run the test. An AI test that stays green when the feature is broken is worse than no test — it's a false sense of safety. If I can't quickly convince myself a test would fail for the right reason, I don't trust it. This catches more bad AI tests than reading the code does.

3. The selectors

AI loves brittle selectors: long CSS chains, text matches on copy that changes, nth-child positions. Check that it uses stable, intention-revealing locators (roles, labels, test IDs, which in Playwright means getByRole, getByLabel, and getByTestId) and not something like .css-1x9f3 > div:nth-child(3). AI also tends to over-target generated class names that won't exist after the next build.
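When a generated file has dozens of selectors, a rough triage helper can point the review at the worst ones first. This is a minimal sketch, assuming selectors are available as raw strings; selectorSmells is a hypothetical function, and the css- and sc- prefixes are assumed examples of build-generated class names, not an exhaustive list.

```typescript
// Hypothetical heuristic: list the reasons a raw selector string is likely to break.
function selectorSmells(selector: string): string[] {
  const smells: string[] = [];

  // Build-generated class names such as .css-1x9f3 tend to change between builds.
  if (/\.(css|sc)-[a-z0-9]+/i.test(selector)) {
    smells.push("generated class name");
  }

  // Positional matching breaks when siblings are added or reordered.
  if (/:nth-(child|of-type)\(/.test(selector)) {
    smells.push("positional nth-child");
  }

  // Matching on display copy breaks when the wording changes.
  if (/^text=|:has-text\(|:text\(/.test(selector)) {
    smells.push("matches on display copy");
  }

  // Long descendant chains encode page layout rather than intent.
  const hops = selector.split(/\s*>\s*|\s+/).filter((part) => part.length > 0).length;
  if (hops > 3) {
    smells.push("long CSS chain");
  }

  return smells;
}
```

An empty result does not prove a selector is good; it only means none of these particular patterns matched. Anything flagged is a candidate for swapping to a role, label, or test-ID locator.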

4. The waiting

Check that it leans on Playwright's auto-waiting and web-first assertions rather than a hard-coded waitForTimeout(3000). AI sprinkles fixed sleeps to make flaky tests pass locally, which is exactly how you get a slow, still-flaky suite. Any literal timeout is a smell to interrogate.
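Fixed sleeps are easy to find mechanically, so this part of the review can be partly automated. The sketch below assumes you have the spec file's source as a string; findFixedSleeps and sampleSpec are hypothetical names, and the regex only catches numeric literals, not delays passed through a variable.

```typescript
interface SleepFinding {
  line: number;   // 1-based line number in the test source
  millis: number; // the delay that was hard-coded
}

// Flag each line that contains a waitForTimeout(<number literal>) call.
function findFixedSleeps(source: string): SleepFinding[] {
  const findings: SleepFinding[] = [];
  source.split("\n").forEach((text, index) => {
    const match = /waitForTimeout\(\s*(\d+)\s*\)/.exec(text);
    if (match) {
      const digits = match[1] || "0";
      findings.push({ line: index + 1, millis: Number(digits) });
    }
  });
  return findings;
}

// Illustrative spec fragment: the sleep on line 2 hides a race that the
// web-first assertion on line 3 would already wait for.
const sampleSpec = [
  "await page.getByRole('button', { name: 'Save' }).click();",
  "await page.waitForTimeout(3000);",
  "await expect(page.getByText('Saved')).toBeVisible();",
].join("\n");
```

Running findFixedSleeps(sampleSpec) reports the single sleep on line 2. In this fragment the sleep can simply be deleted, because the toBeVisible assertion retries until the text appears or the assertion times out.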

5. Isolation and state

Does the test set up its own data and not depend on a previous test having run? AI frequently writes tests that pass in the order it generated them and fail when run alone or in parallel — shared state, hardcoded IDs that assume a seeded record, no cleanup. Run it in isolation and check it's not quietly coupled to its neighbours. (Good fixtures are the fix.)
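The coupling problem is easier to see in a stripped-down model. The sketch below is not Playwright code: FakeBackend, runInIsolation, and both test bodies are hypothetical stand-ins, with an in-memory object playing the part of the app's state.

```typescript
// Hypothetical in-memory stand-in for the app's backend state.
interface FakeBackend {
  users: string[];
}

type TestBody = (backend: FakeBackend) => void;

// Fixture-style runner: every test body gets a fresh backend,
// so nothing leaks from one test into the next.
function runInIsolation(body: TestBody): "passed" | "failed" {
  const backend: FakeBackend = { users: [] };
  try {
    body(backend);
    return "passed";
  } catch {
    return "failed";
  }
}

// Coupled: assumes an earlier test (or a seed script) already created this user.
const coupledTest: TestBody = (backend) => {
  if (backend.users.indexOf("ada@example.test") === -1) {
    throw new Error("expected seeded user to exist");
  }
};

// Self-sufficient: creates the data it needs, then checks against it.
const selfContainedTest: TestBody = (backend) => {
  backend.users.push("ada@example.test");
  if (backend.users.indexOf("ada@example.test") === -1) {
    throw new Error("user was not created");
  }
};
```

With a fresh backend per run, coupledTest fails no matter what ran before it, which is exactly the signal you want during review. In Playwright the equivalent of runInIsolation is a fixture defined with test.extend that sets up and tears down per-test data.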

6. Is it testing the right thing at all?

Step back from the code: does this test correspond to a real risk, or did the AI test the easy, obvious path because that's what's easy to generate? AI over-produces happy-path tests and under-produces the edge cases, error states, and negative paths where bugs actually live. A folder of twenty AI happy-path tests can have a giant hole in the middle.
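A cheap way to find that hole is to write the risks down first and map each generated test onto them. This is a minimal sketch under that assumption; the risk ids, the sample suite, and uncoveredRisks are all hypothetical, and deciding which risks a test really covers is still a manual call made during review.

```typescript
interface ReviewedTest {
  title: string;
  covers: string[]; // ids of the risks this test would actually catch
}

// Return the risks that no test in the suite covers.
function uncoveredRisks(risks: string[], suite: ReviewedTest[]): string[] {
  return risks.filter(
    (risk) => !suite.some((entry) => entry.covers.indexOf(risk) !== -1)
  );
}

// Illustrative risk list for a checkout flow.
const checkoutRisks = [
  "valid-card-pays",
  "declined-card",
  "expired-session",
  "duplicate-submit",
];

// Illustrative generated suite: three tests, all on the happy path.
const generatedSuite: ReviewedTest[] = [
  { title: "pays with a valid card", covers: ["valid-card-pays"] },
  { title: "pays with a saved card", covers: ["valid-card-pays"] },
  { title: "shows the receipt", covers: ["valid-card-pays"] },
];
```

For this sample, uncoveredRisks(checkoutRisks, generatedSuite) returns the three negative-path risks, even though the suite has three passing tests.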

The workflow that works

Treat AI as a fast first draft, not a finished test. Generate, then review every test as if a junior wrote it in a hurry: run it, break the app to confirm it fails, fix the selectors and waits, add the negative cases it skipped. The time saved is real — but it's saved on typing, not on judgement. The judgement is still yours, and it's the whole job.

Where this fits

This is AI-for-QA on the test-authoring side. See also the practical Claude/Copilot playbook and, for testing AI products, how I evaluate an AI chatbot before release. The AI for QA hub covers the wider toolkit.

Reviewing an AI-written test

  • Every assertion checks the actual behaviour, not just that "something happened"
  • Confirmed it FAILS when the feature is broken (break it and run)
  • Selectors are stable and intention-revealing, not brittle CSS/nth-child/copy matches
  • Uses auto-waiting and web-first assertions, no hard-coded waitForTimeout
  • Runs in isolation and in parallel — no hidden coupling or seeded-state assumptions
  • Covers the real risk, not just the easy happy path AI defaults to
