Project Brief — AI-Augmented Test Workflow for an E-Commerce App

The capstone for this course is a single end-to-end project that puts every previous chapter to work against one representative scenario. By the end of these three lessons, you'll have a setup document the team can install from, a use-case catalogue with prompts that produce real artefacts, a generated test landed in a real suite, a triaged bug with a regression test attached, a Page Object that matches your team's conventions, a cost projection grounded in actual session data, a security policy your security owner has signed off on, and an adoption plan with measurable success metrics. That bundle is what turns "we tried Playwright MCP" into "we adopted it."

This lesson is the brief: what you're building and why. The next lesson walks through one specific bug-to-test cycle as a worked example. The third lesson reviews the project and points to the stretch goals.

The scenario

You are QA Lead at PetMart, an online pet-supplies store. Existing state of the world:

  • 200 Playwright TypeScript tests running in CI on every PR, full suite ~15 minutes.
  • A daily 30-minute bug triage meeting that's mostly "can't reproduce, please add steps" loops with the support team.
  • A growing flake rate, currently around 4% on the regression suite, that's eroding trust in CI as a merge gate.
  • A small QA team (three engineers including you) who can't keep up with new feature authoring.

Leadership has agreed to a four-week pilot of Playwright MCP. They want to see ROI in measurable terms — hours saved, bugs reproduced, tests authored, flake rate moved — before approving wider adoption. Your job is to design and run that pilot, then write up the case.

The eight deliverables

This is the package you'll have at the end. None of it is busywork: every item maps to an adoption blocker that commonly derails pilots like this one.

1. Setup document. A page in the team wiki: how to install Playwright MCP for both Claude Desktop and Claude Code, with the team's exact MCP config, browser flags (--browser=chromium for parity with CI), and the smoke-test prompt every new install should pass. Reference the install lesson; adapt it to PetMart's environment.

2. Use-case catalogue. Five concrete use cases with copy-pasteable prompts. The minimum five:

  • Bug reproduction — vague ticket → reproducible repro + Playwright trace.
  • Exploratory testing of new features — charter format, time-budgeted, severity-graded.
  • Page-Object generation for new pages — match the existing POM style.
  • Flaky-test investigation — verdict on real failure vs flake within 30 seconds.
  • Visual review (staging vs production) — pre-deploy diff with structured findings.

Each entry is a real prompt used during the pilot, not a sketch. The catalogue is what the rest of the team installs from.

3. Generated-test demo. Pick one critical PetMart flow that lacks coverage today; checkout with multiple items and a coupon is a fine choice. Use MCP to drive the flow, generate the equivalent Playwright test, harden it against your team's conventions, and land it in tests/regression/ via PR. The PR description is itself a deliverable: it documents how the test was generated and what was edited before merging.

4. Bug-triage demo. Pick the oldest "cannot reproduce" ticket in the backlog. Use the Chapter 4 reproduction prompt. Either reproduce it (file a triageable report and a regression test that captures the bug) or document precisely what was tried and where it diverged from the user's report. Either outcome is a win, because the ticket stops blocking work either way.

5. POM-generation demo. Pick one PetMart page that doesn't yet have a Page Object; the breed-finder filter screen is a representative choice. Reverse-engineer it (Chapter 3, lesson 3), refine it to match tests/pages/CartPage.ts (your house style), and wire it into one new test. Submit as a PR.

6. Cost analysis. A spreadsheet — or a wiki table — projecting monthly cost across the use cases. For each, record real numbers from the pilot: average tokens per session, average sessions per week, total monthly cost at current Claude pricing. End with a one-paragraph summary: "At projected adoption, MCP costs $X/month and replaces Y hours/month of manual work."

7. Security review. Write the team's MCP security policy following Chapter 5's lesson 3 — which environments, which credentials, which audit trail, which PR-template flag for destructive actions. Get explicit sign-off from whoever owns security at PetMart. The signed-off policy is the deliverable, not the draft.

8. Adoption plan. A one-page rollout: who gets an install in week 1, what training is offered, which two metrics define success (suggested: median bug-repro time and new tests authored per sprint), and what the kill-switch criteria are. "If repro time hasn't moved in 6 weeks, we shut it down" is a healthier plan than open-ended pilot creep.
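The arithmetic behind deliverable 6 can be sketched in a few lines of TypeScript. This is a minimal sketch, not a prescribed tool: the type and function names are hypothetical, splitting tokens into input and output is an assumption (the brief only asks for average tokens per session), and every number in the example is a placeholder rather than real Claude pricing or real pilot data. Substitute the figures you record during the pilot.

```typescript
// Sketch only: names and all numbers are illustrative assumptions,
// not real Claude pricing and not real pilot measurements.
interface UseCaseUsage {
  name: string;
  avgInputTokensPerSession: number;
  avgOutputTokensPerSession: number;
  sessionsPerWeek: number;
}

interface TokenPricing {
  inputUsdPerMillionTokens: number;
  outputUsdPerMillionTokens: number;
}

// Average number of weeks in a month (52 weeks / 12 months).
const WEEKS_PER_MONTH = 52 / 12;

// Cost of one average session for a use case.
function sessionCostUsd(usage: UseCaseUsage, pricing: TokenPricing): number {
  const inputCost =
    (usage.avgInputTokensPerSession / 1e6) * pricing.inputUsdPerMillionTokens;
  const outputCost =
    (usage.avgOutputTokensPerSession / 1e6) * pricing.outputUsdPerMillionTokens;
  return inputCost + outputCost;
}

// Projected monthly cost for one use case.
function monthlyCostUsd(usage: UseCaseUsage, pricing: TokenPricing): number {
  return sessionCostUsd(usage, pricing) * usage.sessionsPerWeek * WEEKS_PER_MONTH;
}

// Projected monthly cost across the whole catalogue.
function totalMonthlyCostUsd(usages: UseCaseUsage[], pricing: TokenPricing): number {
  return usages.reduce((sum, u) => sum + monthlyCostUsd(u, pricing), 0);
}

// Placeholder inputs: replace with the numbers recorded during the pilot.
const placeholderPricing: TokenPricing = {
  inputUsdPerMillionTokens: 3,
  outputUsdPerMillionTokens: 15,
};

const placeholderUsage: UseCaseUsage[] = [
  { name: "Bug reproduction", avgInputTokensPerSession: 200_000, avgOutputTokensPerSession: 10_000, sessionsPerWeek: 10 },
  { name: "Flaky-test investigation", avgInputTokensPerSession: 80_000, avgOutputTokensPerSession: 4_000, sessionsPerWeek: 15 },
];

for (const usage of placeholderUsage) {
  console.log(`${usage.name}: $${monthlyCostUsd(usage, placeholderPricing).toFixed(2)}/month`);
}
console.log(`Total: $${totalMonthlyCostUsd(placeholderUsage, placeholderPricing).toFixed(2)}/month`);
```

Keeping the per-use-case rows separate makes it easy to see which use case drives the bill, which is the number leadership is most likely to ask about.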

The capstone at a glance

PetMart MCP Pilot, 4 weeks

In scope:
  • Setup doc: Desktop + Code
  • Use-case catalogue: 5 prompts
  • Security policy: signed off
  • Generated test landed in CI
  • Bug triaged + regression added
  • POM generated + integrated
  • Cost analysis: real numbers
  • Adoption plan: metrics + kill switch

Don't:
  • MCP in CI on every commit
  • Auto-merging AI-generated PRs
  • Production credentials in chat

Why each deliverable matters

The temptation with a pilot like this is to skip straight to the technical demos and call it done. But the items that aren't demos are what turn a successful experiment into a successful adoption:

  • Without the setup document, every new team member burns half a day re-discovering the install gotchas.
  • Without the use-case catalogue, "how do I use this for X?" becomes a recurring question instead of a one-prompt-paste answer.
  • Without the cost analysis, the bill arrives unbudgeted and leadership questions the value mid-cycle.
  • Without the security policy, the first incident kills the pilot regardless of how much value it delivered.
  • Without an adoption plan that includes a kill switch, you end up running an open-ended experiment that's hard to terminate even when the data says you should.

The technical demos prove it works; the framing artefacts prove it can be operated. Adoption requires both.

Suggested pilot timeline

The eight deliverables fit a four-week pilot:

  • Week 1: Setup and the catalogue. Install on your machine. Write the setup doc. Draft the use-case catalogue. Run the security review and get sign-off (this is often the longest-lead item, so start it on day one).
  • Week 2: The three technical demos. Generated test, bug triage, POM generation. One demo per day and a half is realistic. Each ends with a merged PR.
  • Week 3: Roll out to the rest of the team. Pair with each engineer through their first session. Track friction points and update the catalogue. By the end of the week, every QA engineer has run at least three real sessions.
  • Week 4: The case. Cost analysis. Adoption plan. Wrap-up presentation to leadership. Decision: full adoption, extended pilot, or wind-down.
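For the week 1 setup doc, one way to keep every install identical is to generate the MCP config from a single source instead of copying it by hand. The sketch below is an illustration under assumptions: the function and type names are hypothetical, the `mcpServers` shape with `command` and `args` follows the layout Claude Desktop's config file uses, `npx @playwright/mcp@latest` is the package's usual launch command, and `--browser=chromium` is the flag named in the setup deliverable. Check the result against the install lesson before publishing it to the wiki.

```typescript
// Sketch only: helper and type names are hypothetical.
interface McpServerEntry {
  command: string;
  args: string[];
}

interface McpConfig {
  mcpServers: Record<string, McpServerEntry>;
}

// Builds the team's Playwright MCP entry with the browser pinned for CI parity.
function buildPlaywrightMcpConfig(browser: string): McpConfig {
  return {
    mcpServers: {
      playwright: {
        command: "npx",
        args: ["@playwright/mcp@latest", `--browser=${browser}`],
      },
    },
  };
}

// Print the JSON that would be pasted into the setup doc.
const teamConfig = buildPlaywrightMcpConfig("chromium");
console.log(JSON.stringify(teamConfig, null, 2));
```

Pinning the browser in one place means a later change (for example, matching a new CI browser) is a one-line edit that every engineer picks up from the same doc.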

Stretch goals (only if the core is on track)

If the core deliverables are on schedule by week 3, consider three optional extensions:

  • A custom MCP server for PetMart's test data. A small Node service that exposes seed_user, seed_order, and reset_database as MCP tools. Generated tests can invoke them directly without per-test setup code. The server is 200–400 lines and pays back across hundreds of test runs.
  • A Slack-bot trigger. "@petmart-qa reproduce BUG-123" fires off a Playwright MCP session and posts the verdict back to the thread. Useful for shifting triage left into the channels where bugs are first reported.
  • A self-healing GitHub Action. When a Playwright test fails in CI on a PR, an action runs the AI-debug prompt from Chapter 4 and posts the verdict as a comment. "Real failure: heading text changed" or "Flake: ran 5 times, 4 pass" — landing in the PR within minutes of the failure.

These extensions are worth building, but only once the eight core deliverables are solid. A polished pilot beats an extended-but-half-finished one every time.
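To make the self-healing action's verdict comment concrete, here is a minimal sketch of the flake side of it. It is a plain rerun heuristic, not the AI-debug prompt from Chapter 4: it assumes the test has already failed once in CI, that the action reran it several times, and that any passing rerun means "flake" while zero passing reruns means "real failure". The function name, types, and that rule are all assumptions for illustration; a real-failure verdict with a cause (such as a changed heading) would still come from the prompt.

```typescript
// Sketch only: a rerun-count heuristic with hypothetical names,
// standing in for the richer verdict the AI-debug prompt would produce.
type Verdict = "real-failure" | "flake";

interface RerunSummary {
  verdict: Verdict;
  passed: number;
  total: number;
  comment: string;
}

// results[i] is true when rerun i passed. The original CI run is assumed
// to have failed, so even a fully green rerun set counts as a flake.
function classifyReruns(results: boolean[]): RerunSummary {
  if (results.length === 0) {
    throw new Error("classifyReruns needs at least one rerun result");
  }
  const total = results.length;
  const passed = results.filter((r) => r).length;
  if (passed === 0) {
    return {
      verdict: "real-failure",
      passed,
      total,
      comment: `Real failure: ran ${total} times, 0 pass`,
    };
  }
  return {
    verdict: "flake",
    passed,
    total,
    comment: `Flake: ran ${total} times, ${passed} pass`,
  };
}

// Example: five reruns, one of which failed again.
console.log(classifyReruns([true, true, false, true, true]).comment);
```

The `comment` string is what the action would post on the PR; keeping it to one line makes the verdict easy to scan in a busy review thread.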

Reference: where each chapter feeds in

The eight deliverables aren't independent — every previous chapter contributes:

  • Chapter 1 (fundamentals + install) → setup document.
  • Chapter 2 (snapshot, vision, tools) → understanding the calls in your generated artefacts.
  • Chapter 3 (test generation, recording, POM, healing) → deliverables 3 and 5; informs every prompt in the catalogue.
  • Chapter 4 (exploratory, bug repro, vision verification, debugging) → deliverables 2 and 4; the bug repro prompt is the highest-ROI one in the catalogue.
  • Chapter 5 (integration, cost, security) → cost analysis, security policy, adoption plan.

The capstone isn't a new skill — it's a consolidation of the existing ones into a single shippable artefact.

What you should have ready before lesson 2

Before you read the next lesson, write down:

  1. Your real-world equivalent of PetMart: your team's name, your existing suite size, your real backlog of "cannot reproduce" tickets, your existing flake rate.
  2. The first vague bug from that backlog you'd like to triage with AI.
  3. The one Page Object you'd most like to reverse-engineer.

The next lesson walks the bug-to-test cycle in detail. Reading it with a real bug in mind — yours, not PetMart's — is what makes the example land. The lesson after that closes the loop with reflection and the stretch goals worth chasing once the pilot lands.
