Debugging Failed Tests with Claude Code

A failing CI test hands you a stack trace, a screenshot, maybe a video, and usually no obvious cause. Debugging it the traditional way means jumping between the CI log, the test file, the page object, recent git commits, and your browser until a pattern emerges. Claude Code can hold all of that context at once and reason across it. This lesson covers how to structure that investigation.

What makes Claude Code useful for debugging

Claude Code can read everything relevant to a failure in one session: the test file, the page object it calls, the application source, the git log near the broken line, and any log output you paste in. It does not rely on one artefact the way a human squinting at a stack trace does.

That breadth of context is what makes its hypotheses useful. It can say "the label changed from 'Card number' to 'Card details' in commit abc123 three days ago" because it read both the test and the git history — not because it guessed.

The investigation prompt

Structure a debugging request with the error, the relevant files, and a clear ask:

This Playwright test is failing in CI. Investigate and explain — don't fix yet.
 
Error:
  locator.fill: Timeout 30000ms exceeded waiting for locator getByLabel('Card number')
  at tests/checkout/payment.spec.ts:47
 
Read:
- tests/checkout/payment.spec.ts
- src/pages/CheckoutPage.ts
- Recent git history for files touching the checkout flow

Claude reads each file, checks git log -- src/pages/CheckoutPage.ts, and forms a hypothesis. "Don't fix yet" is deliberate — it separates diagnosis from treatment. A wrong fix applied confidently is worse than no fix.

Common failure patterns Claude identifies well

Selector breakage after a UI change. Claude compares the test's selectors with recent commits and spots the divergence: the test still calls getByLabel('Card number'), but the commit that shipped the redesign changed the input's label to 'Card details'.
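The divergence is easy to see in miniature. The sketch below is a simplified stand-in for label lookup, not Playwright's implementation: `FormField` and `findInputByLabel` are hypothetical, and the matching only approximates `getByLabel`'s default behaviour for a string argument (case-insensitive substring match). Once the label text changes, the old selector matches nothing, which is why a real locator keeps waiting until it hits the 30000ms timeout in the error above.

```typescript
// Hypothetical form model: each input is identified by its visible label text.
interface FormField {
  id: string;
  label: string;
}

// Simplified stand-in for label lookup: case-insensitive substring match on the
// label text. Playwright's getByLabel also considers aria-label and
// aria-labelledby, which this sketch ignores.
function findInputByLabel(fields: FormField[], text: string): FormField | undefined {
  const needle = text.toLowerCase();
  return fields.find((field) => field.label.toLowerCase().includes(needle));
}

// The same input before and after the (hypothetical) redesign commit.
const beforeRedesign: FormField[] = [{ id: "card", label: "Card number" }];
const afterRedesign: FormField[] = [{ id: "card", label: "Card details" }];

console.log(findInputByLabel(beforeRedesign, "Card number")?.id); // prints card
console.log(findInputByLabel(afterRedesign, "Card number")?.id); // prints undefined: stale selector
console.log(findInputByLabel(afterRedesign, "Card details")?.id); // prints card: updated selector
```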

Race conditions. The test asserts a total before the discount API call resolves. Claude finds the missing await or the missing waitFor condition by reading the full test flow.
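A browser-free sketch of that race. `applyDiscount` and the `Cart` object are hypothetical stand-ins for the discount API call and the page state; the only point is that a check placed before the promise settles reads the old total, while awaiting the call first reads the updated one.

```typescript
// Hypothetical cart state that the discount call updates asynchronously.
interface Cart {
  total: number;
}

// Stand-in for the discount API call: updates the total after a short delay.
function applyDiscount(cart: Cart, percent: number): Promise<void> {
  return new Promise((resolve) => {
    setTimeout(() => {
      cart.total = Math.round(cart.total * (1 - percent / 100));
      resolve();
    }, 50);
  });
}

// Buggy flow: the promise is not awaited, so the total is read before the update lands.
async function readTotalWithoutAwait(): Promise<number> {
  const cart: Cart = { total: 100 };
  void applyDiscount(cart, 10); // missing await
  return cart.total;
}

// Fixed flow: awaiting the call guarantees the discount is applied before the check.
async function readTotalWithAwait(): Promise<number> {
  const cart: Cart = { total: 100 };
  await applyDiscount(cart, 10);
  return cart.total;
}

readTotalWithoutAwait().then((total) => console.log("without await:", total)); // without await: 100
readTotalWithAwait().then((total) => console.log("with await:", total)); // with await: 90
```

In a real test the stale read usually passes on a fast machine and fails on a slow one, which is what makes this class of bug intermittent.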

State pollution from a previous test. Claude reads the beforeEach and afterEach blocks across the file, identifies that a prior test leaves the cart in a non-empty state, and notes that this test does not clear it before starting.
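The same pattern in a self-contained form: a module-level cart shared by two "tests", executed by a tiny hypothetical runner with an optional `beforeEach` hook. `runSuite` and the cart are illustrative only, not Playwright's runner; in a real suite the equivalent fix is resetting the shared state in `test.beforeEach` or giving each test its own data.

```typescript
// Shared mutable state, standing in for a cart that survives between tests
// (for example, one stored server-side for a shared test account).
const cart: string[] = [];

type TestFn = () => void;

// Hypothetical mini runner: calls the optional beforeEach hook before every
// test and records whether each test passed.
function runSuite(tests: Record<string, TestFn>, beforeEach?: () => void): Record<string, boolean> {
  const results: Record<string, boolean> = {};
  for (const [name, fn] of Object.entries(tests)) {
    beforeEach?.();
    try {
      fn();
      results[name] = true;
    } catch {
      results[name] = false;
    }
  }
  return results;
}

const tests: Record<string, TestFn> = {
  // Leaves an item behind and never cleans up.
  "adds item to cart": () => {
    cart.push("sku-123");
    if (cart.length !== 1) throw new Error("expected one item");
  },
  // Assumes it starts from an empty cart.
  "shows empty cart message": () => {
    if (cart.length !== 0) throw new Error(`cart not empty: ${cart.join(", ")}`);
  },
};

// Without a reset, the second test inherits the first test's item and fails.
console.log(runSuite(tests));
// Clearing the shared state before each test removes the dependency on order.
console.log(runSuite(tests, () => { cart.length = 0; }));
```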

Environment-specific failures. Claude compares the test configuration for local and CI runs, spots that the CI config sets headless: true, and flags timing assumptions baked into the test that do not hold in headless mode.
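A rough sketch of that comparison step. The keys echo real Playwright options (`headless`, `workers`, `retries`), but the flattened `RunSettings` objects and their values are hypothetical; in practice the settings come from `playwright.config.ts` and CLI flags. A diff like this only tells you where to look, not which difference causes the failure.

```typescript
// Flattened view of the settings that matter for a run; values are illustrative.
type SettingValue = string | number | boolean;
type RunSettings = Record<string, SettingValue>;

interface SettingDiff {
  key: string;
  local: SettingValue | undefined;
  ci: SettingValue | undefined;
}

// Returns every key whose value differs between the two environments,
// including keys that are set in only one of them.
function diffSettings(local: RunSettings, ci: RunSettings): SettingDiff[] {
  const keys = Array.from(new Set([...Object.keys(local), ...Object.keys(ci)]));
  const diffs: SettingDiff[] = [];
  for (const key of keys) {
    if (local[key] !== ci[key]) {
      diffs.push({ key, local: local[key], ci: ci[key] });
    }
  }
  return diffs;
}

const localSettings: RunSettings = { headless: false, workers: 4, timeout: 30000 };
const ciSettings: RunSettings = { headless: true, workers: 1, timeout: 30000, retries: 2 };

for (const d of diffSettings(localSettings, ciSettings)) {
  console.log(`${d.key}: local=${String(d.local)} ci=${String(d.ci)}`);
}
// headless: local=false ci=true
// workers: local=4 ci=1
// retries: local=undefined ci=2
```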

Pasting log output

For failures you cannot reproduce locally, paste the CI output directly:

This test passes locally and fails in CI about 40% of the time.
 
CI log output:
  ✕ should process refund for cancelled order (12045ms)
  Error: expect(received).toBe(expected)
  Expected: "Refund processed"
  Received: "Processing..."
  [full 50-line log follows]
 
Read tests/refunds/cancel.spec.ts and explain what could cause this locally/CI divergence.

Claude reasons across the log and the code together: it sets the noise aside, names the probable cause, and points at the specific line.
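The `expect(received).toBe(expected)` line in that log hints at one likely cause: the status text was read once, while the refund was still "Processing...". If the backend finishes faster locally, a single read would pass there and fail on slower CI runs. The sketch below contrasts a one-shot read with a polling check; `makeRefundStatus` and `waitForText` are hypothetical helpers written for illustration. Playwright's web-first assertions such as `expect(locator).toHaveText(...)` retry in a similar way until their timeout.

```typescript
// Hypothetical status source: reports "Processing..." until the refund finishes.
function makeRefundStatus(finishAfterMs: number): () => string {
  const start = Date.now();
  return () => (Date.now() - start >= finishAfterMs ? "Refund processed" : "Processing...");
}

const sleep = (ms: number) => new Promise<void>((resolve) => setTimeout(resolve, ms));

// Polls getText until it returns the expected value or the timeout elapses.
// Returns the last value seen so a failure message can show it.
async function waitForText(
  getText: () => string,
  expected: string,
  timeoutMs: number,
  intervalMs = 20,
): Promise<string> {
  const deadline = Date.now() + timeoutMs;
  let last = getText();
  while (last !== expected && Date.now() < deadline) {
    await sleep(intervalMs);
    last = getText();
  }
  return last;
}

async function demo(): Promise<void> {
  // One-shot read: runs immediately, so it sees the intermediate state.
  const oneShot = makeRefundStatus(100);
  console.log("one-shot:", oneShot()); // one-shot: Processing...

  // Polling read: keeps checking until the final state appears or 2s pass.
  const polled = makeRefundStatus(100);
  console.log("polled:", await waitForText(polled, "Refund processed", 2000)); // polled: Refund processed
}

void demo();
```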

Adding diagnostic instrumentation

For failures that resist analysis from the code and logs alone, ask Claude to add temporary logging:

Add console.log statements throughout tests/checkout/payment.spec.ts 
to capture element state at each key step. 
I'll run it and paste the output for diagnosis.

Claude adds targeted logs, you run the test, and the output gives both of you more to work with on the next iteration.
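If you want the temporary logs to stay consistent and easy to paste back, you can ask for them to go through one small helper. This is a sketch with hypothetical names (`StepLogger`, the snapshot fields). In a Playwright test the snapshot values would come from real locator calls such as `locator.isVisible()` or `locator.inputValue()`; here they are plain values so the example runs without a browser.

```typescript
// Snapshot of element state at one step; the fields are illustrative.
interface ElementSnapshot {
  visible: boolean;
  enabled: boolean;
  value?: string;
}

interface StepLog {
  step: string;
  elapsedMs: number;
  snapshot: ElementSnapshot;
}

// Hypothetical helper: records each key step with elapsed time and element
// state, and prints one "[diag]"-prefixed line per step.
class StepLogger {
  private readonly start = Date.now();
  readonly entries: StepLog[] = [];

  log(step: string, snapshot: ElementSnapshot): void {
    const entry: StepLog = { step, elapsedMs: Date.now() - this.start, snapshot };
    this.entries.push(entry);
    console.log(`[diag] +${entry.elapsedMs}ms ${step} ${JSON.stringify(snapshot)}`);
  }
}

// Usage with hard-coded values standing in for real element queries.
const diag = new StepLogger();
diag.log("payment form loaded", { visible: true, enabled: true });
diag.log("card field before fill", { visible: false, enabled: false });
```

The shared prefix makes the diagnostic lines easy to grep out of a long CI log, and easy to find and delete once the root cause is known.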

Once the root cause is identified

Fix the issue you identified: the label change from 'Card number' to 'Card details' that breaks the locator at tests/checkout/payment.spec.ts:47.
Make one focused commit with a clear message describing what changed and why.

A focused fix for a known root cause is fast. The investigation is where time is spent.

The debugging workflow: CI failure arrives (stack trace, screenshot, log output) → structure the investigation → Claude reads context (test file, POM, git history, logs) → hypothesis formed → human verifies → targeted fix applied.

⚠️ Common Mistakes

Asking Claude to fix before explaining. Confident-but-wrong fixes happen. Always request an explanation first, verify it makes sense for your system, then ask for the fix.
Providing only the error message. "Here's the error, fix it" without the test file, the page object, or any context produces guesses. The more Claude reads, the better the diagnosis.
Trusting the fix without re-running CI. A fix that makes sense locally can still fail in CI if the root cause was an environment difference. Always confirm the fix lands cleanly in the environment where the failure occurred.

🎯 Practice Task

Debug a real failing test using Claude Code. 20–30 minutes.

Find a test in your suite that is currently failing or was recently broken.
Write a Claude Code investigation prompt that includes the error, points at the relevant files, and explicitly says "explain before fixing."
Read Claude's hypothesis. Does it match your intuition about the system?
If the hypothesis is sound, ask Claude to apply the fix with a focused commit.
Note: what did Claude identify correctly? Did it miss anything only you knew about the system?

The next lesson tackles the harder problem — tests that fail intermittently with no consistent error.