Prompt Engineering for Test Automation

The single skill that most separates engineers who get a lot out of AI from those who get very little is prompt engineering. It isn't glamorous and it isn't magic: it's a small set of habits that turn vague output into useful output. This lesson collects the principles that matter most for test automation, with reusable templates you can copy into your own work.

Principle 1 — Be specific about format

Vague prompts produce vague code.

Bad:
"Write a Playwright test for login"
 
Good:
"Write a Playwright TypeScript test for login. Place it in
tests/auth/login.spec.ts. Use the existing LoginPage class (pasted below).
Include 1 happy path and 3 negative scenarios. Use getByRole locators.
Output as a complete spec file."

The good prompt names the framework, the language, the file path, the conventions, the scenario count, the locator strategy, and the output format. Each constraint reduces the model's freedom to do something you didn't want.
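If your team generates many specs, you can keep the good prompt's constraints in a small typed structure so that none of them gets silently dropped. This is an illustration, not a tool the lesson assumes you have. `TestPromptSpec` and `buildTestPrompt` are hypothetical names, and the fields simply mirror the constraints listed above.

```typescript
// Hypothetical structure: one field per constraint from the "good" prompt.
interface TestPromptSpec {
  framework: string;        // e.g. "Playwright"
  language: string;         // e.g. "TypeScript"
  feature: string;          // what is under test
  filePath: string;         // where the spec file should live
  reuse: string[];          // existing classes the test must build on
  happyPaths: number;
  negativeScenarios: number;
  locatorStrategy: string;  // e.g. "getByRole"
  outputFormat: string;     // e.g. "a complete spec file"
}

function plural(count: number, singular: string, pluralForm: string): string {
  return `${count} ${count === 1 ? singular : pluralForm}`;
}

// Refuses to build a prompt while any text constraint is left blank.
function buildTestPrompt(spec: TestPromptSpec): string {
  const required: [string, string][] = [
    ["framework", spec.framework],
    ["language", spec.language],
    ["feature", spec.feature],
    ["filePath", spec.filePath],
    ["locatorStrategy", spec.locatorStrategy],
    ["outputFormat", spec.outputFormat],
  ];
  for (const [field, value] of required) {
    if (value.trim() === "") throw new Error(`Missing constraint: ${field}`);
  }
  const lines = [
    `Write a ${spec.framework} ${spec.language} test for ${spec.feature}.`,
    `Place it in ${spec.filePath}.`,
    ...spec.reuse.map((name) => `Use the existing ${name} class (pasted below).`),
    `Include ${plural(spec.happyPaths, "happy path", "happy paths")} and ` +
      `${plural(spec.negativeScenarios, "negative scenario", "negative scenarios")}.`,
    `Use ${spec.locatorStrategy} locators.`,
    `Output as ${spec.outputFormat}.`,
  ];
  return lines.join("\n");
}

// Reproduces the "good" login prompt from this section.
const loginPrompt = buildTestPrompt({
  framework: "Playwright",
  language: "TypeScript",
  feature: "login",
  filePath: "tests/auth/login.spec.ts",
  reuse: ["LoginPage"],
  happyPaths: 1,
  negativeScenarios: 3,
  locatorStrategy: "getByRole",
  outputFormat: "a complete spec file",
});
console.log(loginPrompt);
```

A blank field throws an error rather than producing a vaguer prompt. That is the point of the structure: it makes underspecification visible before anything is sent.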

Principle 2 — Provide context

The model doesn't know your codebase. Show it.

Paste an existing test as a "follow this style" example.
Reference your team's conventions explicitly: "We use getByTestId locators, never CSS selectors."
Include the relevant page object source, helper functions, and config.

Without context, the model invents conventions. With context, it follows yours.
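Context is easiest for the model to use when each pasted file is clearly labelled. Here is a minimal sketch that puts conventions, a style example and supporting source into one context section, using plain TypeScript only. `buildContext`, the `--- begin/end ---` delimiter format and the file paths are hypothetical choices made for illustration.

```typescript
// A file pasted into the prompt: its repo path and full source text.
interface ContextFile {
  path: string;
  source: string;
}

// Wrap each file in begin/end markers so the model can tell where one file ends.
function delimit(file: ContextFile): string {
  return `--- begin ${file.path} ---\n${file.source}\n--- end ${file.path} ---`;
}

// Conventions first, then the style example, then supporting source.
function buildContext(
  conventions: string[],
  styleExample: ContextFile,
  supporting: ContextFile[],
): string {
  const sections: string[] = [
    "Team conventions (follow these exactly):",
    ...conventions.map((rule) => `- ${rule}`),
    "",
    "Follow the style of this existing test:",
    delimit(styleExample),
  ];
  for (const file of supporting) {
    sections.push("", "Relevant source:", delimit(file));
  }
  return sections.join("\n");
}

// Hypothetical paths and snippets, for illustration only.
const context = buildContext(
  ["We use getByTestId locators, never CSS selectors."],
  { path: "tests/auth/logout.spec.ts", source: "test('logs out', async ({ page }) => { /* ... */ });" },
  [{ path: "pages/LoginPage.ts", source: "export class LoginPage { /* ... */ }" }],
);
console.log(context);
```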

Principle 3 — Chain of thought for complex tasks

Don't ask for the final artefact in one go.

Bad:
"Generate all tests for the checkout flow"
 
Good:
"Step 1: list the test scenarios needed for checkout.
 Step 2: prioritise them by risk.
 Step 3: write the top 5 as Playwright tests."

Asking for the plan first lets you correct the plan before any code is generated. This sounds like extra work; it saves time.
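The plan-first workflow comes down to two model calls with a human checkpoint between them. This sketch assumes a hypothetical `ask` function that sends a prompt to whatever model client you use and returns its text. `planThenGenerate` and `review` are illustrative names, not part of any library.

```typescript
// Hypothetical model client: send a prompt, get text back.
type AskModel = (prompt: string) => Promise<string>;
// Human checkpoint: returns the plan, edited if needed.
type ReviewPlan = (plan: string) => string;

async function planThenGenerate(
  ask: AskModel,
  review: ReviewPlan,
  feature: string,
  topN: number,
): Promise<string> {
  // Steps 1 and 2: ask for the plan only. No code yet.
  const plan = await ask(
    `Step 1: list the test scenarios needed for ${feature}.\n` +
      `Step 2: prioritise them by risk. Do not write any code yet.`,
  );
  // Correct the plan while it is still cheap to change.
  const approved = review(plan);
  // Step 3: generate code from the approved plan only.
  return ask(
    `Approved scenarios, highest risk first:\n${approved}\n` +
      `Step 3: write the top ${topN} as Playwright tests.`,
  );
}
```

In practice `review` is you reading the list and editing it. What matters is that the code-generating prompt only ever sees the approved plan.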

Principle 4 — Specify edge cases up front

If you don't ask for edge cases, you'll get happy-path tests only.

"Include tests for: empty inputs, very long inputs (1000 chars),
Unicode characters (emoji, RTL scripts), SQL injection attempts,
XSS payloads, very high quantities (999999), zero quantities,
negative quantities, decimal quantities."

A few lines of edge-case enumeration in the prompt produce a far stronger test set than the model would generate on its own.
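Storing the edge cases as data lets you reuse them in every prompt and keep them consistent. This sketch assumes you want one shared catalogue. The values are common illustrative payloads, such as a classic SQL injection string and a script-tag XSS string. They are not a complete security test set.

```typescript
// One input to try, with a human-readable label for the test name.
interface EdgeCase {
  name: string;
  value: string;
}

const textEdgeCases: EdgeCase[] = [
  { name: "empty input", value: "" },
  { name: "very long input (1000 chars)", value: "a".repeat(1000) },
  { name: "emoji", value: "🎉✅" },
  { name: "RTL script", value: "مرحبا بالعالم" },
  { name: "SQL injection attempt", value: "' OR '1'='1' --" },
  { name: "XSS payload", value: "<script>alert(1)</script>" },
];

const quantityEdgeCases: EdgeCase[] = [
  { name: "very high quantity", value: "999999" },
  { name: "zero quantity", value: "0" },
  { name: "negative quantity", value: "-1" },
  { name: "decimal quantity", value: "1.5" },
];

// Turn the catalogue into the "Include tests for: ..." clause of a prompt.
function edgeCaseClause(cases: EdgeCase[]): string {
  return `Include tests for: ${cases.map((c) => c.name).join(", ")}.`;
}

console.log(edgeCaseClause([...textEdgeCases, ...quantityEdgeCases]));
```

The same arrays can also drive a data-driven loop of `test()` calls in the generated spec. That way, the cases you asked for in the prompt are the cases you actually run.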

Principle 5 — Ask for explanations

"Generate the test AND explain why you chose those particular assertions."

This serves two purposes: it teaches you (a free senior-engineer pair-programming session), and it surfaces flawed reasoning. If the explanation doesn't make sense, the test probably doesn't either.

Principle 6 — Iterate

The first draft is rarely perfect. Refine in turns.

- "This test uses CSS selectors. Refactor to use getByRole instead."
- "Add a teardown that deletes the user created in beforeAll."
- "The negative scenarios are too generic — make them specific to OAuth flows."

Each round narrows the gap. Three small iterations almost always beat one giant prompt.
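Iteration only works if each refinement goes out together with the earlier turns, so the model edits its last draft rather than starting again. This minimal sketch assumes a hypothetical `chat` client that accepts the whole message history. `Message`, `Chat` and `refine` are illustrative names. The user/assistant role split follows the way common chat APIs structure a conversation.

```typescript
type Role = "user" | "assistant";

interface Message {
  role: Role;
  content: string;
}

// Hypothetical chat client: takes the full history, returns the next reply.
type Chat = (history: Message[]) => Promise<string>;

// Keep every turn, so each refinement edits the latest draft
// instead of starting a fresh generation.
async function refine(chat: Chat, initialPrompt: string, followUps: string[]): Promise<Message[]> {
  const history: Message[] = [{ role: "user", content: initialPrompt }];
  // Pass a copy so the client never sees later mutations of the array.
  let reply = await chat(history.slice());
  history.push({ role: "assistant", content: reply });
  for (const followUp of followUps) {
    history.push({ role: "user", content: followUp });
    reply = await chat(history.slice());
    history.push({ role: "assistant", content: reply });
  }
  return history;
}
```

Scripting every follow-up in advance is done here only to keep the example short. In real use you read each draft before you decide on the next refinement.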

A reusable Playwright test prompt template

Save something like this in your team's prompt library and adapt for each new feature:

PROMPT TEMPLATE — Generate Playwright Test
 
Context: [describe the feature in 2-3 sentences]
Existing POM (paste full source):
  [paste]
Test framework conventions:
  - Use POM pattern
  - Locators: getByRole / getByLabel / getByTestId only
  - Fixtures live in fixtures/
  - Use test.step() blocks for multi-step flows
  - Test data setup: createUser(), createOrder() helpers
 
Generate a Playwright TypeScript test file that:
  - Tests scenarios: [list scenarios]
  - Includes setup/teardown for any data created
  - Uses descriptive test names
  - Includes assertions for: [list assertions]
  - Adds JSDoc comments above any non-obvious logic
 
Output the complete spec file. Then explain the assertion choices.
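A template is only reusable if nobody sends it with a placeholder still in it. This sketch fills bracketed placeholders and refuses to return a prompt if any are left unfilled. `fillTemplate` is a hypothetical helper. The template used here is a condensed version of the one above, with short placeholder keys.

```typescript
// Matches a single-line placeholder such as [feature] or [scenarios].
const PLACEHOLDER = /\[([^\]\n]+)\]/g;

// Fill every placeholder, and fail loudly if any is left unfilled.
// A replacer function's return value is inserted literally and not re-scanned,
// so pasted source that contains brackets (e.g. items[0]) is safe.
function fillTemplate(template: string, values: Record<string, string>): string {
  const missing: string[] = [];
  const filled = template.replace(PLACEHOLDER, (whole: string, key: string) => {
    if (Object.prototype.hasOwnProperty.call(values, key)) {
      return values[key];
    }
    missing.push(key);
    return whole;
  });
  if (missing.length > 0) {
    throw new Error(`Unfilled placeholders: ${missing.join(", ")}`);
  }
  return filled;
}

// Condensed version of the template, with short placeholder keys.
const template = [
  "Context: [feature]",
  "Existing POM (paste full source):",
  "[pom]",
  "Generate a Playwright TypeScript test file that:",
  "  - Tests scenarios: [scenarios]",
  "  - Includes assertions for: [assertions]",
].join("\n");

const filledPrompt = fillTemplate(template, {
  feature: "Users reset their password via an emailed link.",
  pom: "export class ResetPage { /* paste full source */ }",
  scenarios: "valid reset, expired link, reused link",
  assertions: "success banner text, redirect URL",
});
console.log(filledPrompt);
```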

Weak prompt vs strong prompt — what changes

What you put in, what you get out

Weak prompt

Vague intent — 'write a login test'
No framework / language specified
No conventions or examples provided
No edge cases enumerated
Result: generic happy-path test, wrong locator style, hallucinated APIs

Strong prompt

Specific intent — 'Playwright TS test for OAuth login, 1 happy + 5 negative scenarios'
Existing POM and helpers pasted as context
Conventions stated — locator style, fixtures, naming
Edge cases listed up front
Result: usable test fitting your codebase, clear assertions, easy review
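Some of the weak/strong difference can be checked mechanically before a prompt is sent. This is a deliberately rough keyword lint, written for this example rather than taken from an established tool. It flags missing ingredients but can't judge whether the ones that are present are any good.

```typescript
// Heuristic checks for the ingredients of a strong test-generation prompt.
// Keyword matching only: it can miss rephrasings and is no substitute for reading the prompt.
interface PromptCheck {
  label: string;
  passes: (prompt: string) => boolean;
}

const checks: PromptCheck[] = [
  { label: "framework named", passes: (p) => /playwright|cypress|selenium/i.test(p) },
  { label: "language named", passes: (p) => /typescript|javascript|python|\bts\b/i.test(p) },
  { label: "locator convention stated", passes: (p) => /getBy(Role|Label|TestId)|locator/i.test(p) },
  { label: "scenario count given", passes: (p) => /\d+\s+(happy|negative)/i.test(p) },
  { label: "edge cases listed", passes: (p) => /edge case|empty input|unicode|injection|xss/i.test(p) },
];

// Returns the labels of every check the prompt fails.
function lintPrompt(prompt: string): string[] {
  return checks.filter((c) => !c.passes(prompt)).map((c) => c.label);
}

console.log(lintPrompt("Write a Playwright test for login"));
```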

Anti-patterns to avoid

Vague prompts. "Make this better" produces aimless changes. Say what "better" means.
No examples. The model fabricates conventions when none are provided.
One-shot expectation. Real workflows iterate. The output gets closer to what you need with each turn.
No review. Quality decays silently when nobody checks the output. Make review a non-negotiable step.
Reinventing every prompt. If you've written it three times, it's a template.

Prompts for the MCP world

The same principles apply when the AI is driving a real browser via something like Playwright MCP — but the prompt is now describing a flow rather than asking for code. Be explicit:

"Open the staging site at https://staging.example.com.
 Sign in as test+admin@example.com / Password123!.
 Navigate to Orders. Find an order in 'pending' status.
 Click 'Refund'. Confirm the dialog. Verify the order status
 changes to 'refunded' within 5 seconds. If anything looks
 unexpected at any step, stop and report."

The "if anything looks unexpected, stop and report" line is doing real work — it turns the loop from "execute blindly" into "execute and observe."
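The "stop and report" instruction describes a control flow that can be stated precisely. The sketch below shows that loop in plain TypeScript. It illustrates what the prompt asks the agent to do and is not how Playwright MCP or any agent works internally. `FlowStep`, `runAndObserve` and the `run`/`expect` callbacks are all hypothetical.

```typescript
// One step in a scripted flow: an action, then a check of what we expect to see.
// run and expect are hypothetical callbacks; in a real harness they would wrap browser actions.
interface FlowStep {
  description: string;
  run: () => Promise<void>;
  expect: () => Promise<boolean>;
}

type FlowResult =
  | { status: "passed"; completed: string[] }
  | { status: "stopped"; completed: string[]; failedAt: string; reason: string };

// Execute and observe: stop at the first unexpected state or error and report it,
// rather than carrying on blindly through the remaining steps.
async function runAndObserve(steps: FlowStep[]): Promise<FlowResult> {
  const completed: string[] = [];
  for (const step of steps) {
    try {
      await step.run();
      if (!(await step.expect())) {
        return {
          status: "stopped",
          completed,
          failedAt: step.description,
          reason: "state did not match expectation",
        };
      }
    } catch (err) {
      return { status: "stopped", completed, failedAt: step.description, reason: String(err) };
    }
    completed.push(step.description);
  }
  return { status: "passed", completed };
}
```

A real `expect` for the refund step would poll the order status until a deadline (five seconds in the prompt above) instead of checking only once.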

Building team-wide prompt practice

Keep good prompts in a shared repo (a prompts/ folder is enough).
Tag prompts by purpose: test-generation, test-data, bug-triage, code-review.
Periodically review the library — drop prompts that no longer work, refine ones that nearly do.
Pair-prompt occasionally. Two engineers prompting together is a remarkably effective way to level up.
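A prompts/ folder with a little metadata is enough to make tagging and periodic review concrete. This sketch shows that metadata in TypeScript. The `PromptEntry` shape, the `lastReviewed` field, the file names and the 90-day threshold are assumptions, not a standard format.

```typescript
// The purpose tags suggested above.
type PromptTag = "test-generation" | "test-data" | "bug-triage" | "code-review";

// Hypothetical metadata for one file in prompts/.
interface PromptEntry {
  file: string;          // e.g. "prompts/playwright-test.md"
  tags: PromptTag[];
  lastReviewed: string;  // ISO date, e.g. "2024-05-01"
}

function findByTag(library: PromptEntry[], tag: PromptTag): PromptEntry[] {
  return library.filter((entry) => entry.tags.includes(tag));
}

// Entries not reviewed within maxAgeDays, oldest first.
function dueForReview(library: PromptEntry[], today: Date, maxAgeDays: number): PromptEntry[] {
  const msPerDay = 24 * 60 * 60 * 1000;
  const ageInDays = (entry: PromptEntry): number =>
    (today.getTime() - Date.parse(entry.lastReviewed)) / msPerDay;
  return library
    .filter((entry) => ageInDays(entry) > maxAgeDays)
    .sort((a, b) => ageInDays(b) - ageInDays(a));
}

const library: PromptEntry[] = [
  { file: "prompts/playwright-test.md", tags: ["test-generation"], lastReviewed: "2024-01-10" },
  { file: "prompts/seed-orders.md", tags: ["test-data"], lastReviewed: "2024-06-01" },
  { file: "prompts/triage-flaky.md", tags: ["bug-triage", "code-review"], lastReviewed: "2023-11-02" },
];

const today = new Date("2024-06-15");
console.log(findByTag(library, "code-review").map((e) => e.file));
console.log(dueForReview(library, today, 90).map((e) => e.file));
```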

⚠️ Common Mistakes

Sending the prompt without re-reading it. Much of the bad output is the prompt's fault, not the model's. Read it once before sending.
Pasting too much context. Context windows are large but not infinite, and signal gets diluted. Paste the relevant POM, not the entire repo.
Skipping the iteration step. First-draft acceptance is the single biggest source of mediocre AI-generated tests.
No prompt library. Re-inventing the same prompt each week is the prompt-engineering equivalent of not using version control.
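One way to paste "the relevant POM, not the entire repo" is to rank candidate files and stop at a budget. This sketch assumes you assign the relevance scores yourself. It uses the rough rule of thumb of about four characters per token, but real counts depend on the model's tokenizer, so treat the estimate as approximate.

```typescript
// A file you might paste, with a relevance score you assign by hand.
interface ContextCandidate {
  path: string;
  source: string;
  relevance: number; // higher = more relevant to this prompt
}

// Rough heuristic only: about four characters per token. Real tokenizers vary.
function estimateTokens(text: string): number {
  return Math.ceil(text.length / 4);
}

// Greedy selection: most relevant first, skipping anything that would exceed the budget.
function selectContext(candidates: ContextCandidate[], budgetTokens: number): ContextCandidate[] {
  const byRelevance = candidates.slice().sort((a, b) => b.relevance - a.relevance);
  const chosen: ContextCandidate[] = [];
  let used = 0;
  for (const candidate of byRelevance) {
    const cost = estimateTokens(candidate.source);
    if (used + cost <= budgetTokens) {
      chosen.push(candidate);
      used += cost;
    }
  }
  return chosen;
}

// Hypothetical candidates: repeated characters stand in for real file contents.
const candidates: ContextCandidate[] = [
  { path: "whole-repo-dump.txt", source: "x".repeat(400000), relevance: 2 },
  { path: "helpers/createOrder.ts", source: "x".repeat(800), relevance: 7 },
  { path: "pages/CheckoutPage.ts", source: "x".repeat(2000), relevance: 10 },
];

console.log(selectContext(candidates, 1000).map((c) => c.path));
```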

🎯 Practice Task

45 minutes.

Pick a feature you'd write tests for this sprint.
Write a "weak" prompt — minimal context, vague intent. Save the output.
Now rewrite as a "strong" prompt using all six principles. Save that output.
Compare side by side. Note specifically what the strong prompt got right that the weak one missed.
Save the strong prompt as a template, and adapt it for next week's tests.

Chapter 3 moves from authoring tools to the AI-powered test platforms — self-healing, visual AI, and AI-augmented recorders.