How to use Claude Code for QA without breaking your repo
AI coding agents can write tests, scaffold frameworks, and chase down failures fast — and can also flood your repo with plausible, subtly-wrong tests if you let them run unsupervised. Here's how to get the speed without the mess.
An AI agent like Claude Code is genuinely useful for QA work: generating test scaffolding, writing the boilerplate around a test, explaining a failure, drafting cases from a spec. The risk isn't that it can't help; it's that it helps confidently, producing tests that look right, run green, and don't actually verify anything. Used with guardrails, it's a real accelerator. Used as autopilot, it quietly degrades your suite. The principles below keep you on the right side of that line, and they build on the general caution that applies to any AI-written test.
Where it genuinely helps
- Boilerplate and scaffolding. Setting up a test file, fixtures, the repetitive structure around a check — fast and low-risk, because it's mechanical.
- Drafting cases from a spec. Hand it acceptance criteria and ask for candidate cases; it's good at enumerating the obvious ones so you can focus on the non-obvious. Treat the output as a first draft.
- Explaining failures and code. "Why is this test flaky?", "what does this function do?" — a fast way to orient in unfamiliar code or a confusing stack trace.
- Repetitive transforms. Migrating a batch of tests to a new pattern, renaming, mechanical refactors — with review.
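To show what "treat the output as a first draft" means in practice, here is a minimal sketch of spec-derived cases written as a table. It assumes a hypothetical `shipping_fee` function and a made-up acceptance criterion ("orders of 50.00 or more ship free; otherwise the fee is 4.99"); neither comes from a real codebase. An agent can enumerate rows like these quickly, but the `expected` column is the part a human checks against the spec, and the boundary rows (49.99 and 50.00) are the ones most worth confirming.

```python
from decimal import Decimal

# Hypothetical function under test, written to a made-up acceptance criterion:
# orders of 50.00 or more ship free; anything below pays a flat 4.99 fee.
FREE_SHIPPING_THRESHOLD = Decimal("50.00")
FLAT_FEE = Decimal("4.99")


def shipping_fee(order_total: Decimal) -> Decimal:
    return Decimal("0.00") if order_total >= FREE_SHIPPING_THRESHOLD else FLAT_FEE


# Draft cases: an agent can list these, but every value in the
# "expected" column must be checked by a human against the spec itself.
DRAFT_CASES = [
    # (description, order_total, expected_fee)
    ("empty order pays the flat fee", Decimal("0.00"), Decimal("4.99")),
    ("just below the threshold", Decimal("49.99"), Decimal("4.99")),
    ("exactly at the threshold ships free", Decimal("50.00"), Decimal("0.00")),
    ("well above the threshold", Decimal("120.00"), Decimal("0.00")),
]


def run_draft_cases() -> list[str]:
    """Return a message for every case whose actual fee differs from the expected one."""
    failures = []
    for description, total, expected in DRAFT_CASES:
        actual = shipping_fee(total)
        if actual != expected:
            failures.append(f"{description}: expected {expected}, got {actual}")
    return failures


if __name__ == "__main__":
    print(run_draft_cases() or "all draft cases pass")
```

A table like this keeps the judgement visible: reviewing the draft means reading one column of expected values against the spec, not re-deriving them from the code.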
Where it bites
- Tests that assert nothing. The signature AI failure: a test that runs, goes green, and doesn't actually check the thing — an assertion gap dressed as coverage. It looks like a test and protects nothing.
- Plausible-but-wrong expectations. It invents an expected value that's reasonable and incorrect, baking a bug into the "passing" test. The hallucination problem in test form.
- Confidently bad changes at scale. Let it run unsupervised across the repo and a subtle mistake gets replicated everywhere, fast.
- Bad test design. Brittle selectors, no waiting strategy, tests coupled to implementation — it'll happily produce the patterns you spend your career discouraging.
Using an AI agent on your test repo safely
- Work on a branch, never straight on main, so every AI change is reviewed before it merges
- Review every generated test like a PR from a junior: does it actually assert the right thing?
- Verify assertions independently — confirm the expected values are correct, not just plausible
- Keep changes small and scoped; don't unleash repo-wide edits unsupervised
- Make it fail first: break the code and confirm the AI's test goes red (catches assert-nothing tests)
- Watch for brittle selectors and missing waits — fix the test-design smells it introduces
- Use it for boilerplate and drafts; keep the judgement (what to test, what "correct" is) yourself
- Commit in small, readable diffs so a bad change is easy to spot and revert
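The "make it fail first" check can be expressed in a few lines: run the test against the real code, then against deliberately broken copies, and flag any broken copy the test still passes. This is a minimal, hand-rolled form of mutation testing, not a replacement for a dedicated tool; `slug`, its two mutants, and both tests are hypothetical.

```python
from typing import Callable

Impl = Callable[[str], str]


# Hypothetical implementation under test.
def slug(title: str) -> str:
    return "-".join(title.lower().split())


# Hand-made "broken" versions (mutants), each simulating a bug you might introduce.
def slug_no_lowercase(title: str) -> str:  # mutant: forgets to lowercase
    return "-".join(title.split())


def slug_wrong_separator(title: str) -> str:  # mutant: joins with "_" instead of "-"
    return "_".join(title.lower().split())


def weak_test(impl: Impl) -> None:
    # Assert-nothing style: only checks that a string came back.
    assert isinstance(impl("Hello World"), str)


def strong_test(impl: Impl) -> None:
    # Checks the actual expected value from the (assumed) spec.
    assert impl("Hello World") == "hello-world"


def passes(test: Callable[[Impl], None], impl: Impl) -> bool:
    try:
        test(impl)
        return True
    except AssertionError:
        return False


def fail_first_check(test: Callable[[Impl], None], real: Impl, mutants: list[Impl]) -> list[str]:
    """Return the names of mutants the test fails to catch (an empty list is what you want)."""
    if not passes(test, real):
        raise RuntimeError("test does not pass against the real implementation")
    return [m.__name__ for m in mutants if passes(test, m)]


if __name__ == "__main__":
    mutants = [slug_no_lowercase, slug_wrong_separator]
    print("weak:", fail_first_check(weak_test, slug, mutants))
    print("strong:", fail_first_check(strong_test, slug, mutants))
```

The weak test lets both mutants survive, which is exactly the signal that it protects nothing; the strong test catches both. In a real repo the "mutant" is usually just a one-line break you make by hand on your branch and then revert.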
The mindset that keeps the repo safe
Treat the agent as a fast junior who never gets tired and never admits uncertainty. That framing gets both halves right: you delegate the volume work (boilerplate, drafts, mechanical edits) and you review everything before it lands, because a fast junior with no self-doubt is exactly the colleague whose work you'd check. The two non-negotiables are a branch and a review — version control is your safety net, and human review is what separates "the AI 10×'d my test-writing" from "the AI filled my repo with green tests that catch nothing."
The single most valuable check is the failing-test verification: AI tests that pass on the first run are suspicious, because a test you've never seen fail might not be able to. Break the code, watch the test go red, then trust it. Keep the agent for the typing and keep the testing judgement — what's worth checking and what "correct" means — firmly with you, and you get the speed without handing it the keys to your suite's credibility.