The previous lesson started from a written description; this one starts from a demonstration. You walk through a flow in the browser the assistant is watching, then ask for the equivalent Playwright test. It is the AI cousin of npx playwright codegen, with one important difference: codegen records the exact elements you click, while the assistant infers intent from what changed. This lesson covers both pure-AI recording and the hybrid pattern (codegen first, AI refactor second), plus the limitations that decide which tool fits which job.
The honest summary up front: pure-AI recording is excellent for intent capture on shorter flows, and weaker on long, click-dense sequences where exact targeting matters. Codegen is the inverse: precise about which element every click hit, naïve about what the clicks meant. The hybrid pattern combines the strengths.
Pure-AI recording
The setup looks like this:
I'm about to demonstrate our checkout flow. The browser you're driving is the same
one I'll be clicking in. Take a snapshot now, then take another snapshot after I
click around for a few minutes. Pay attention to:
- Pages I visit
- Form fields I fill (and the values)
- Buttons I click
- The final outcome
When I tell you "done", generate a Playwright test that recreates the flow with
proper assertions. Use getByRole/getByLabel locators and no fixed waits.
Then you click. The assistant samples the browser state at intervals, typically by calling browser_snapshot, and reconstructs what happened from the differences between snapshots. New text appearing, a URL changing, an input gaining a value: from these the assistant infers "the user navigated to /cart, added a Premium t-shirt, opened checkout."
When you say "done", the assistant emits a test that mirrors the inferred flow, including assertions on the post-flow state.
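To make the target concrete, here is a minimal sketch of the kind of test that prompt should yield. It is not real assistant output: the product, field labels, button name, and confirmation text are assumptions, and the page is a stand-in built with page.setContent so the sketch runs without your app (a real suite would page.goto your own URL). Note the role- and label-based locators, and the web-first assertion doing the job a fixed wait would otherwise do.

```typescript
import { test, expect } from '@playwright/test';

// Hypothetical stand-in for the checkout page that was demonstrated.
// A real suite would page.goto() the app instead of using setContent().
const CHECKOUT_HTML = `
  <h1>Checkout</h1>
  <p>Premium t-shirt</p>
  <form id="checkout">
    <label>Email <input type="email" name="email" required></label>
    <label>Postcode <input name="postcode" required></label>
    <button type="submit">Place order</button>
  </form>
  <p role="status"></p>
  <script>
    document.getElementById('checkout').addEventListener('submit', (event) => {
      event.preventDefault();
      // Simulate a slow backend: the confirmation arrives after 500 ms.
      setTimeout(() => {
        document.querySelector('[role=status]').textContent = 'Order confirmed';
      }, 500);
    });
  </script>
`;

test('guest can check out a Premium t-shirt', async ({ page }) => {
  await page.setContent(CHECKOUT_HTML);

  await expect(page.getByRole('heading', { name: 'Checkout' })).toBeVisible();
  await expect(page.getByText('Premium t-shirt')).toBeVisible();

  await page.getByLabel('Email').fill('buyer@example.com');
  await page.getByLabel('Postcode').fill('SW1A 1AA');
  await page.getByRole('button', { name: 'Place order' }).click();

  // Web-first assertion: retries until the text appears (5 s by default),
  // so no waitForTimeout() is needed to cover the simulated delay.
  await expect(page.getByRole('status')).toHaveText('Order confirmed');
});
```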
What it sees and what it doesn't
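- Sees: state changes between snapshots, such as the page URL, headings, form values, and visible text, plus network requests since the last sample if the assistant also queries them (a page snapshot typically does not include them).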
- Doesn't see: the exact element you clicked. Only the result of that click is visible. If two paths produce the same end-state, the assistant may pick the wrong one.
- Doesn't see: hover-only interactions. Tooltips, hover-revealed menus, and anything else gated on :hover stay invisible unless the hover triggers a snapshot-visible change.
- Doesn't see: drag distances or gesture details. For "Drag this slider to 75%", the assistant sees the resulting value, not the gesture, so the generated test will set the value directly (for example with fill() on a range input) rather than perform a drag.
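A minimal sketch of the two blind spots above, against a hypothetical stand-in page (the slider, menu, and their labels are assumptions). The first test is what a state-based reconstruction of the slider step tends to look like: fill() assigns the value of an input of type range directly. The second shows the explicit hover() that the assistant has no way to infer unless you narrate it.

```typescript
import { test, expect } from '@playwright/test';

// Hypothetical stand-in page: a hover-only menu and a range slider.
const WIDGETS_HTML = `
  <style>
    .menu { display: none; }
    nav:hover .menu { display: block; }
  </style>
  <nav>
    <button>Account</button>
    <ul class="menu"><li><a href="#settings">Settings</a></li></ul>
  </nav>
  <label>Volume <input type="range" min="0" max="100" value="0"></label>
  <output id="level">0</output>
  <script>
    const slider = document.querySelector('input[type=range]');
    slider.addEventListener('input', () => {
      document.getElementById('level').textContent = slider.value;
    });
  </script>
`;

test('slider: the reconstructed test sets the value, not the gesture', async ({ page }) => {
  await page.setContent(WIDGETS_HTML);
  const slider = page.getByLabel('Volume');

  // fill() on a range input assigns the value and fires input/change events.
  await slider.fill('75');

  await expect(slider).toHaveValue('75');
  await expect(page.locator('#level')).toHaveText('75');
});

test('hover menu: narrated, then made explicit', async ({ page }) => {
  await page.setContent(WIDGETS_HTML);
  const settings = page.getByRole('link', { name: 'Settings' });

  // Before the hover the link is display:none, so no snapshot hints at it.
  await expect(settings).toBeHidden();

  // The step a snapshot-based reconstruction cannot infer on its own.
  await page.getByRole('button', { name: 'Account' }).hover();
  await expect(settings).toBeVisible();
});
```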
So the rule of thumb: pure-AI recording is excellent when the visible state telegraphs the intent, and degrades when it doesn't. Add this to cart, change quantity to 3, apply coupon — easy. Hover the menu, then move two pixels right and click the second submenu item — fragile.
The hybrid pattern: Codegen + AI refactor
For dense, click-heavy flows, two-step is the better recipe:
npx playwright codegen https://myapp.com
A browser opens alongside the Playwright Inspector; you walk the flow, and codegen writes raw Playwright code into the Inspector as you click. Stop when the flow is complete. Copy the output.
Then hand it to the assistant:
Here is raw Playwright Codegen output. Refactor it into clean POM-based code:
- Replace CSS-class and nth-child selectors with getByRole/getByLabel/getByText
- Extract the checkout interactions into a CheckoutPage class
- Replace waitForTimeout calls with web-first assertions
- Add explicit assertions on the success state
- Drop any selectors that look auto-generated (data-v-* or hashed class names)
[paste codegen output here]
What you've done: used codegen for targeting precision (every click is recorded against the exact element you hit) and the assistant for intent and structure. The result is closer to mergeable than what either tool produces alone.
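As an illustration of where the refactor should land, here is a minimal sketch of a CheckoutPage object and a test that uses it. The "before" lines in the opening comment are a hypothetical example of brittle selectors and a fixed wait, not actual codegen output; the form, labels, and button text are likewise assumptions, with the page built via page.setContent so the sketch is self-contained.

```typescript
import { test, expect, type Locator, type Page } from '@playwright/test';

// Illustrative "before" (hypothetical, not real codegen output):
//   await page.locator('.form-row:nth-child(2) input').fill('SW1A 1AA');
//   await page.locator('.btn.btn-primary').click();
//   await page.waitForTimeout(2000);

// Hypothetical stand-in page so the sketch runs without a real app.
const CHECKOUT_HTML = `
  <form id="checkout">
    <label>Email <input type="email" name="email"></label>
    <label>Postcode <input name="postcode"></label>
    <button type="submit">Place order</button>
  </form>
  <p role="status"></p>
  <script>
    document.getElementById('checkout').addEventListener('submit', (event) => {
      event.preventDefault();
      setTimeout(() => {
        document.querySelector('[role=status]').textContent = 'Order confirmed';
      }, 500);
    });
  </script>
`;

// Page object: locators live in one place, expressed by role and label.
export class CheckoutPage {
  readonly page: Page;
  readonly email: Locator;
  readonly postcode: Locator;
  readonly placeOrderButton: Locator;
  readonly status: Locator;

  constructor(page: Page) {
    this.page = page;
    this.email = page.getByLabel('Email');
    this.postcode = page.getByLabel('Postcode');
    this.placeOrderButton = page.getByRole('button', { name: 'Place order' });
    this.status = page.getByRole('status');
  }

  async fillDetails(email: string, postcode: string): Promise<void> {
    await this.email.fill(email);
    await this.postcode.fill(postcode);
  }

  async placeOrder(): Promise<void> {
    await this.placeOrderButton.click();
  }
}

test('checkout succeeds through the page object', async ({ page }) => {
  await page.setContent(CHECKOUT_HTML);
  const checkout = new CheckoutPage(page);

  await checkout.fillDetails('buyer@example.com', 'SW1A 1AA');
  await checkout.placeOrder();

  // Instead of a fixed wait: retries until the success state is shown.
  await expect(checkout.status).toHaveText('Order confirmed');
});
```

Keeping locators in the class means a renamed button is fixed once, not in every test that touches checkout.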
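Two recording approaches at a glance
- Pure-AI recording: the assistant infers intent from the differences between snapshots. Strong on shorter flows whose visible state telegraphs the intent; weaker on long, click-dense sequences where exact targeting matters.
- Codegen + AI refactor: codegen contributes click-precise targeting, the assistant contributes intent, locators, and structure. The better recipe for dense, click-heavy flows.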
Hybrid recording — describe AND demonstrate
The most expressive variant: describe what you want and demonstrate the parts that are hard to specify.
Watch as I add three specific items to my cart. After I'm done, generate tests
where the items are parameterised: one test per product, declared in a for...of
loop (Playwright Test has no test.each). Use the SKUs from tests/fixtures/products.ts.
You demonstrate one concrete pass; the assistant generalises across a fixture. This is the cheapest way to produce a parameterised test from a real session, and much faster than writing the parameterisation by hand.
A similar pattern for negative cases:
I'll first demonstrate a successful checkout. Then I'll demonstrate the failure
when I leave the postcode field blank. Generate two tests: the success path and
the validation error case, sharing setup via test.beforeEach.
Two demonstrations, two tests, one shared setup. The assertions for each path are anchored in real text the assistant observed.
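A minimal sketch of the two tests that prompt describes, sharing setup through test.beforeEach inside a test.describe block. The form, its labels, and the error text are assumptions. The stand-in page validates in script (with novalidate on the form), so the error appears in the DOM where a role-based assertion can see it rather than as a native browser tooltip.

```typescript
import { test, expect } from '@playwright/test';

// Hypothetical stand-in checkout with script-side validation.
const CHECKOUT_HTML = `
  <form id="checkout" novalidate>
    <label>Email <input type="email" name="email"></label>
    <label>Postcode <input name="postcode"></label>
    <button type="submit">Place order</button>
  </form>
  <p id="error" role="alert" hidden></p>
  <p id="result" role="status"></p>
  <script>
    document.getElementById('checkout').addEventListener('submit', (event) => {
      event.preventDefault();
      const postcode = document.querySelector('input[name=postcode]').value.trim();
      const error = document.getElementById('error');
      if (!postcode) {
        error.textContent = 'Postcode is required';
        error.hidden = false;
        return;
      }
      error.hidden = true;
      document.getElementById('result').textContent = 'Order confirmed';
    });
  </script>
`;

test.describe('checkout', () => {
  test.beforeEach(async ({ page }) => {
    // Shared setup: open the page and fill the field both paths need.
    await page.setContent(CHECKOUT_HTML);
    await page.getByLabel('Email').fill('buyer@example.com');
  });

  test('places an order when the postcode is given', async ({ page }) => {
    await page.getByLabel('Postcode').fill('SW1A 1AA');
    await page.getByRole('button', { name: 'Place order' }).click();

    await expect(page.getByRole('status')).toHaveText('Order confirmed');
    await expect(page.getByRole('alert')).toBeHidden();
  });

  test('shows a validation error when the postcode is blank', async ({ page }) => {
    await page.getByRole('button', { name: 'Place order' }).click();

    await expect(page.getByRole('alert')).toHaveText('Postcode is required');
    // The order was not placed.
    await expect(page.getByRole('status')).toHaveText('');
  });
});
```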
When the recorded flow needs explicit narration
Some moments are invisible in snapshots and need to be called out:
- "After I click the avatar, a dropdown appears on hover — capture that interaction."
- "The next page is in a new tab. Switch to it before continuing."
- "There's a confirm dialog after this click. Accept it."
- "This element only appears after a 1-second animation. Wait for the modal heading before targeting anything inside."
The narration sits next to the demonstration in your prompt. The assistant uses it to fill in the gaps the snapshot stream couldn't.
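Each narrated moment corresponds to an explicit step in the generated test. Below is a minimal sketch of three of them (new tab, confirm dialog, animated modal) against a hypothetical stand-in page; element names and texts are assumptions. Two details matter: start waiting for the popup before the click that opens it, and register the dialog handler before the click, because Playwright auto-dismisses dialogs that have no handler.

```typescript
import { test, expect } from '@playwright/test';

// Hypothetical stand-in page with three moments a snapshot stream can miss.
const NARRATED_HTML = `
  <button id="open-help">Open help</button>
  <button id="delete">Delete item</button>
  <p role="status"></p>
  <button id="details">Edit details</button>
  <div id="modal" role="dialog" aria-label="Details" hidden>
    <h2>Confirm details</h2>
    <button>Continue</button>
  </div>
  <script>
    document.getElementById('open-help').addEventListener('click', () => {
      const w = window.open('', '_blank');
      w.document.write('<h1>Help centre</h1>');
      w.document.close();
    });
    document.getElementById('delete').addEventListener('click', () => {
      if (confirm('Delete this item?')) {
        document.querySelector('[role=status]').textContent = 'Item deleted';
      }
    });
    document.getElementById('details').addEventListener('click', () => {
      // Simulates a 1-second opening animation.
      setTimeout(() => { document.getElementById('modal').hidden = false; }, 1000);
    });
  </script>
`;

test('new tab: switch to the popup before continuing', async ({ page }) => {
  await page.setContent(NARRATED_HTML);
  // Start waiting before the click so the popup event cannot be missed.
  const popupPromise = page.waitForEvent('popup');
  await page.getByRole('button', { name: 'Open help' }).click();
  const popup = await popupPromise;
  await expect(popup.getByRole('heading', { name: 'Help centre' })).toBeVisible();
});

test('confirm dialog: register the handler, then click', async ({ page }) => {
  await page.setContent(NARRATED_HTML);
  // Without a handler the dialog is dismissed and confirm() returns false.
  page.once('dialog', (dialog) => dialog.accept());
  await page.getByRole('button', { name: 'Delete item' }).click();
  await expect(page.getByRole('status')).toHaveText('Item deleted');
});

test('animated modal: wait for its heading, then act inside it', async ({ page }) => {
  await page.setContent(NARRATED_HTML);
  await page.getByRole('button', { name: 'Edit details' }).click();
  const modal = page.getByRole('dialog', { name: 'Details' });
  // Web-first assertion retries until the modal is shown; no fixed wait.
  await expect(modal.getByRole('heading', { name: 'Confirm details' })).toBeVisible();
  await modal.getByRole('button', { name: 'Continue' }).click();
});
```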
⚠️ Common mistakes
- Demonstrating too fast for the snapshot interval. If you click through five screens in three seconds, the assistant only sees the first and last. Slow down between meaningful state changes, or pause for half a second after each significant click. The samples need time to capture intermediate states.
- Treating the recorded test as final. The assistant's reconstruction is a first draft, not a recording. Selectors may be too generic, assertions too weak, structure too flat. Run the test, review, refactor — same discipline as in the previous lesson.
- Recording flows that depend on volatile data. "Click on order #12345" works once. The next run, that order doesn't exist. Whenever the demonstration touches data you created in the moment, parameterise it before generating the test — or seed the data through fixtures so the test owns its own state.
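One way to make a test own its state is a custom fixture created with test.extend: it creates a unique record before the test, hands it over, and removes it afterwards. A minimal sketch follows; the in-memory orderStore and the page built with page.setContent are hypothetical stand-ins for your backend and UI. In a real suite the fixture would typically create the order through your API (for example via Playwright's built-in request fixture) and delete it during teardown.

```typescript
import { test as base, expect } from '@playwright/test';

type Order = { id: string; item: string };

// Hypothetical in-memory store standing in for the app's backend.
const orderStore = new Map<string, Order>();

// Custom fixture: each test gets its own freshly seeded order.
const test = base.extend<{ order: Order }>({
  order: async ({}, use, testInfo) => {
    // Setup: derive a unique id from the test, never a hard-coded "#12345".
    const order: Order = { id: `ORD-${testInfo.testId}`, item: 'Premium t-shirt' };
    orderStore.set(order.id, order);

    await use(order);

    // Teardown: the test removes the state it created.
    orderStore.delete(order.id);
  },
});

// Hypothetical orders page rendered from whatever the store currently holds.
function ordersPageHtml(): string {
  const rows = Array.from(orderStore.values())
    .map((o) => `<li>Order ${o.id}: ${o.item}</li>`)
    .join('');
  return `<h1>Your orders</h1><ul aria-label="Orders">${rows}</ul>`;
}

test('shows the order this test created', async ({ page, order }) => {
  await page.setContent(ordersPageHtml());

  // The locator is built from fixture data, so it holds on every run.
  const row = page
    .getByRole('list', { name: 'Orders' })
    .getByRole('listitem')
    .filter({ hasText: order.id });
  await expect(row).toContainText(order.item);
});
```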
🎯 Practice task
Try both recording approaches and compare the outputs. 30 minutes.
- Pick a moderately complex flow — 8–12 deliberate clicks. A multi-step form, a checkout, a settings update with validation.
- Pure-AI run: prompt the assistant to watch and record. Demonstrate the flow at a deliberate pace, calling out anything that needs narration. Ask for the generated test. Save it as tests/ai-recorded.spec.ts.
- Hybrid run: open npx playwright codegen against the same URL. Walk the same flow. Copy the raw output, paste it into a new chat, and prompt the refactor (POM, role-based locators, web-first assertions). Save the result as tests/hybrid-recorded.spec.ts.
- Run both tests. Open both files side by side. Note the differences:
- Which one gets the locators right?
- Which one structures the test more cleanly?
- Which one has fewer brittle assumptions about data?
- Stretch: combine the two — take the click-precision of the codegen output and the structure of the AI-recorded version, and produce a third file that's the merge. This is the production artefact you'd actually commit. The earlier two were intermediate work.
The next lesson flips the lens: instead of generating tests, you generate the Page Objects the tests will reference — the reusable layer that pays back across many tests.