Guided Walkthrough and Review — Claude Code for QA

This lesson walks through a complete capstone implementation, phase by phase — the actual prompts, what Claude Code produced, where corrections were needed, and what the finished project looks like. The goal is not a perfect walkthrough but an honest one. Understanding where Claude Code excels and where it needs your judgement is the most transferable skill in this course.

Phase 1: Scaffold (Day 1)

Target: saucedemo.com. Flows covered: login, product listing, add to cart, checkout.

Start with project setup:

> Set up a Playwright TypeScript project.
> Create: src/pages/, tests/smoke/, tests/negative/, tests/visual/, tests/fixtures/
> Configure Allure HTML reporter.
> Generate GitHub Actions workflow at .github/workflows/tests.yml — push to main triggers run.
> Generate CLAUDE.md with our conventions.
> Show me everything before writing any tests.

What Claude Code produced well: the folder structure, playwright.config.ts, package.json with sensible scripts, a complete tsconfig.json, and a .github/workflows/tests.yml that was mostly correct. The Allure configuration needed one fix: Claude referenced an outdated reporter name (written as an alias under @playwright/test), whereas the Allure integration for Playwright ships as the separate allure-playwright package.
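A minimal sketch of how the corrected reporter entry could sit in playwright.config.ts. The testDir, baseURL and testIdAttribute values are assumptions rather than details taken from the generated file. testIdAttribute is included because getByTestId() targets data-testid by default, while saucedemo marks its elements with data-test.

```typescript
// playwright.config.ts (sketch: merge into the generated file, do not replace it)
// A plain object keeps this fragment self-contained. Wrapping it in
// defineConfig() from '@playwright/test' adds type checking.
const config = {
  testDir: './tests',
  reporter: [
    ['list'],              // console output while the run is in progress
    ['allure-playwright'], // the Allure reporter package named in this phase
  ],
  use: {
    // Assumption: lets Page Objects navigate with relative paths.
    baseURL: 'https://www.saucedemo.com',
    // Assumption: needed so getByTestId('username') matches data-test="username".
    testIdAttribute: 'data-test',
  },
};

export default config;
```

allure-playwright writes raw result files during the run. The HTML report is generated from those files in a separate step with the Allure command-line tool.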

What needed correction: the CLAUDE.md draft was too generic — Claude wrote "use best practices" without knowing this project's specific conventions. Spent 10 minutes replacing generic entries with specifics: saucedemo.com's login credentials, the fact that the checkout flow has three distinct steps, and the convention to use data-test attributes (saucedemo uses these consistently).

Lesson: the scaffold is the highest-leverage place to invest review time. Everything else is built on top of it.

Phase 2: Smoke tests (Day 2)

With the scaffold in place:

> Visit https://www.saucedemo.com and analyse the login page.
> Generate a Playwright test for the login flow.
>
> Save to tests/smoke/login.spec.ts.
> Create src/pages/LoginPage.ts — use getByTestId where possible 
>   (saucedemo uses data-test attributes).
> Cover: successful login redirects to /inventory.html.

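What worked: Claude generated LoginPage.ts with accurate selectors (getByTestId('username'), getByTestId('password'), getByTestId('login-button')). Note that getByTestId matches the data-testid attribute by default, so these selectors resolve on saucedemo only when testIdAttribute is set to 'data-test' in playwright.config.ts. The test was clean, readable, and passed on first run.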
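A sketch of what such a Page Object might look like, using the three selectors listed above. It is not the generated file. PageLike and LocatorLike are hypothetical stand-in types so the sketch compiles without Playwright installed; in the project the constructor would take Page from @playwright/test, which satisfies the same shape.

```typescript
// src/pages/LoginPage.ts (sketch)
// Structural stand-ins for Playwright's Locator and Page (hypothetical names).
interface LocatorLike {
  fill(value: string): Promise<void>;
  click(): Promise<void>;
}
interface PageLike {
  goto(url: string): Promise<unknown>;
  getByTestId(testId: string): LocatorLike;
}

export class LoginPage {
  private readonly page: PageLike;

  constructor(page: PageLike) {
    this.page = page;
  }

  // Opens the login page.
  async goto(): Promise<void> {
    await this.page.goto('https://www.saucedemo.com/');
  }

  // Fills both fields and submits. The test IDs assume testIdAttribute
  // is configured as 'data-test'.
  async login(username: string, password: string): Promise<void> {
    await this.page.getByTestId('username').fill(username);
    await this.page.getByTestId('password').fill(password);
    await this.page.getByTestId('login-button').click();
  }
}
```

In login.spec.ts the test would construct new LoginPage(page), call goto() and login() with the credentials recorded in CLAUDE.md, then assert the redirect with expect(page).toHaveURL(/inventory\.html/).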

Then for the remaining 4 smoke tests:

> Following the same patterns as login.spec.ts and LoginPage.ts:
> Generate tests for:
> 1. Product listing — visit /inventory.html, verify at least 6 products visible
> 2. Add to cart — add 2 items, verify cart badge shows 2
> 3. Checkout — complete checkout flow with valid user details
> 4. Logout — log in, log out, verify redirected to login page
>
> Create the corresponding Page Objects in src/pages/.
> Save tests to tests/smoke/.

What worked: three of the four tests were production-ready after one review pass. The checkout test needed intervention — Claude modelled the three-step checkout as a single page interaction when it is actually three separate pages (/checkout-step-one.html, /checkout-step-two.html, /checkout-complete.html). Claude did not know this from the URL alone.

Fix prompt:

> The checkout test needs correction. The checkout flow is:
> 1. Click checkout on /cart.html
> 2. Fill form on /checkout-step-one.html (first name, last name, postal code)
> 3. Click continue → /checkout-step-two.html (order summary)
> 4. Click finish → /checkout-complete.html
> Update CheckoutPage.ts and the test to model these four distinct steps.

Lesson: Claude Code does not know your application's navigation model unless you tell it. For multi-step flows, describe the steps explicitly.
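The four steps in the fix prompt can be sketched as one method per step, each waiting for the URL that step should land on. This is an illustration, not the corrected file from the project. The method names and the data-test values ('checkout', 'firstName', 'lastName', 'postalCode', 'continue', 'finish') are assumptions to verify against the live site, and PageLike and LocatorLike are hypothetical stand-ins for Playwright's Page and Locator.

```typescript
// src/pages/CheckoutPage.ts (sketch)
interface LocatorLike {
  fill(value: string): Promise<void>;
  click(): Promise<void>;
}
interface PageLike {
  getByTestId(testId: string): LocatorLike;
  waitForURL(url: string): Promise<void>;
}

export class CheckoutPage {
  private readonly page: PageLike;

  constructor(page: PageLike) {
    this.page = page;
  }

  // Step 1: on /cart.html, click checkout and land on the details form.
  async startFromCart(): Promise<void> {
    await this.page.getByTestId('checkout').click();
    await this.page.waitForURL('**/checkout-step-one.html');
  }

  // Step 2: fill the form on /checkout-step-one.html.
  async fillCustomerDetails(firstName: string, lastName: string, postalCode: string): Promise<void> {
    await this.page.getByTestId('firstName').fill(firstName);
    await this.page.getByTestId('lastName').fill(lastName);
    await this.page.getByTestId('postalCode').fill(postalCode);
  }

  // Step 3: continue to the order summary on /checkout-step-two.html.
  async continueToSummary(): Promise<void> {
    await this.page.getByTestId('continue').click();
    await this.page.waitForURL('**/checkout-step-two.html');
  }

  // Step 4: finish and land on /checkout-complete.html.
  async finishOrder(): Promise<void> {
    await this.page.getByTestId('finish').click();
    await this.page.waitForURL('**/checkout-complete.html');
  }
}
```

Keeping a URL wait inside each transition means a test fails at the step where navigation went wrong, rather than at a later assertion on the wrong page.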

Phase 3: Negative tests (Day 3)

> For each smoke test, add 2 negative scenarios. Examples:
> - Login: invalid password, locked-out user account
> - Add to cart: remove item immediately, cart shows 0
> - Checkout: missing required fields, browser back mid-checkout
>
> Save to tests/negative/ — separate files from the smoke tests.
> Use the same Page Objects.

What worked: the locked-out user scenario was accurate — saucedemo has a locked_out_user credential that Claude correctly identified as the mechanism. Invalid password assertions were correct. The "missing required fields" test was solid.
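The two login scenarios lend themselves to a small case table. The sketch below is a hypothetical helper, not code from the project: the file name, the attemptLogin function, the 'error' test ID, the secret_sauce password and the expected message fragments are all assumptions to confirm against the live site before relying on them.

```typescript
// tests/negative/login-cases.ts (hypothetical helper)
// Structural stand-ins for Playwright's Locator and Page.
interface LocatorLike {
  fill(value: string): Promise<void>;
  click(): Promise<void>;
  textContent(): Promise<string | null>;
}
interface PageLike {
  goto(url: string): Promise<unknown>;
  getByTestId(testId: string): LocatorLike;
}

interface NegativeLoginCase {
  name: string;
  username: string;
  password: string;
  expectedError: string; // fragment of the error banner text (assumed wording)
}

export const negativeLoginCases: NegativeLoginCase[] = [
  { name: 'invalid password', username: 'standard_user', password: 'not_the_password', expectedError: 'do not match' },
  { name: 'locked-out user', username: 'locked_out_user', password: 'secret_sauce', expectedError: 'locked out' },
];

// Submits the login form and returns the error banner text.
// With real Playwright, textContent() waits for the element and throws
// on timeout if no banner appears.
export async function attemptLogin(page: PageLike, username: string, password: string): Promise<string> {
  await page.goto('https://www.saucedemo.com/');
  await page.getByTestId('username').fill(username);
  await page.getByTestId('password').fill(password);
  await page.getByTestId('login-button').click();
  return (await page.getByTestId('error').textContent()) ?? '';
}
```

A spec in tests/negative/ would loop over negativeLoginCases, call attemptLogin(page, ...) for each, and assert with expect(message).toContain(expectedError). Matching on a fragment rather than the full banner keeps the assertion stable if the surrounding wording changes.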

What needed correction: the "browser back mid-checkout" scenario. Claude wrote a test that used page.goBack() and then asserted the cart still contained items. The actual behaviour on saucedemo is that navigating back mid-checkout discards cart state — the test needed to verify the cart was empty, not still full. This was a domain knowledge gap that only real-app testing revealed.

Workflow that caught this: after generation, the review prompt:

> Review tests/negative/ for assertion correctness. 
> For each test: does the assertion match the application's actual behaviour?
> I'll verify your assessment against the real site.

Claude flagged that the back-navigation assertion was speculative. Verified against the real application, corrected accordingly.

Lesson: the review-your-own-work prompt catches speculative assertions before they become misleading green tests.

Phase 4: Visual and cross-browser (Day 4)

> Add 3 Playwright snapshot tests for key pages: login, inventory, order complete.
> Save to tests/visual/.
> Add cross-browser config to playwright.config.ts — run login.spec.ts 
>   and checkout.spec.ts on Chromium and Firefox.

This phase was the fastest. Playwright snapshot tests are mechanical — Claude generated them correctly. The cross-browser config addition to playwright.config.ts was accurate and required no changes.

One issue: snapshot tests fail the first time they run because no baseline exists yet; Playwright writes the baseline during that run and reports the test as failed. The README needed a note explaining npx playwright test --update-snapshots for first-time setup. Claude did not add this; added it manually.
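One way the cross-browser addition could be expressed, as a sketch under assumptions rather than the configuration Claude produced. The variable name crossBrowserProjects is hypothetical, and the glob patterns assume the two specs live in tests/smoke/.

```typescript
// Fragment for playwright.config.ts (sketch).
// Use it as `projects: crossBrowserProjects` in the exported config.
export const crossBrowserProjects = [
  // Chromium has no testMatch, so it runs every test in the suite.
  { name: 'chromium', use: { browserName: 'chromium' } },
  // Firefox runs only the two smoke specs named in the prompt.
  {
    name: 'firefox',
    use: { browserName: 'firefox' },
    testMatch: ['**/smoke/login.spec.ts', '**/smoke/checkout.spec.ts'],
  },
];
```

With this layout the visual tests in tests/visual/ run on Chromium only, so only one set of baseline images has to be created and maintained. Including smoke/ in the patterns keeps any same-named files in tests/negative/ out of the Firefox run.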

Phase 5: CI, commands, documentation (Day 5)

> Create .claude/commands/ with:
> - run-smoke.md: run tests/smoke/, summarise failures
> - generate-test.md: guided test generation flow
> - check-coverage.md: look at tests/ and identify features with no coverage
>
> Generate README.md, CONTRIBUTING.md.
> Update CLAUDE.md with everything we learned about saucedemo's structure.

The three slash commands were well-generated and immediately useful — the /check-coverage command in particular surfaced two features (sorting and filtering on the inventory page) that the test suite had not covered. Added smoke tests for both.

README and CONTRIBUTING were solid first drafts that needed 15 minutes of editing to make them accurate and specific rather than generic.

Reflection: what this project revealed

Where Claude Code excelled: scaffolding (config, CI, folder structure), Page Object generation (especially with accurate data-test attributes on saucedemo), bulk test generation that matched an established pattern, documentation first drafts.

Where Claude Code needed correction: multi-step navigation flows without explicit step description, domain-specific assertions (what happens in edge cases that only become clear from running the real app), and any assumption about application behaviour that Claude had to guess.

Time estimate: the core deliverables took approximately 4 days of part-time work. Manual equivalent estimate for the same suite: 12–15 days. The gap was largest in scaffolding (1 hour vs 2–3 days) and Page Object generation (2 hours vs 2–3 days). The smallest gap was in negative test design — understanding what edge cases matter required domain knowledge that was the same cost either way.

Cost: approximately $18 in API tokens across the project at Sonnet pricing. Covered by a Pro subscription in practice.

Skills built through this project

By completing the capstone you have practised every technique in this course under real-world constraints:

Prompting for codebase-aware generation — not just "write a test" but "read this, follow that, cover these scenarios"
Reviewing AI output critically — the assertion correctness check that catches passing-but-wrong tests
Building project context — a CLAUDE.md that actually reflects the project, not a generic template
Custom workflow tooling — slash commands that the team can use, not just you
Knowing when to correct and when to accept — the judgement that separates productive AI pairing from uncritical acceptance

Where to go from here

Apply it to your actual work. Pick the lowest-covered area of a real project's test suite and use Claude Code to close the gap. The first real application is where the skills consolidate.

Experiment with MCP servers. Playwright MCP for live page analysis, GitHub MCP for CI integration, database MCPs for fixture generation from real records. Each one expands what Claude Code can do in a session.

Share the patterns with your team. Build a shared .claude/commands/ library. Review each other's CLAUDE.md files. The team-level compounding of these practices is where the biggest productivity gains live.

Stay current. The MCP ecosystem, Claude Code features, and the underlying models are all moving fast. Check the Claude Code changelog quarterly and re-evaluate your CLAUDE.md and slash commands as the tooling evolves.

The core skill — pairing human domain knowledge and review judgement with AI speed and breadth — will transfer to whatever comes next.