Code Reviews for Test Automation

8 min read

Test code is reviewed as an afterthought in most teams. A developer submits a PR with 200 lines of application code; the reviewer scrutinises every method. The same developer submits a PR with 200 lines of test code; the reviewer checks that it's "testing something" and approves. The result is predictable: test code accumulates the exact same bad patterns that production code reviews are designed to prevent — duplication, brittle selectors, hardcoded credentials, missing assertions, and methods that do five things. By the time the problems compound into a framework crisis, no single PR is clearly responsible. Good test code review is not about being strict; it's about keeping the framework maintainable while transferring knowledge across the team.

Why test code reviews are different — and harder

Application code review has decades of shared vocabulary: SOLID, DRY, naming conventions, cyclomatic complexity. Test code review requires a different and less universally understood lens:

  • Is the test independent? Can it run alone in any order?
  • Is the test focused? Does it verify exactly one behaviour?
  • Are selectors stable? Data attributes outlast CSS classes by months.
  • Are assertions in the test — not hidden in page objects?
  • Is the test readable without comments? Can a new engineer understand it in 30 seconds?

None of these questions appear in typical PR checklists. Adding them explicitly is what separates teams whose test suites age well from teams that rewrite their framework every two years.

The review checklist

A concrete checklist that can be pasted into a PR template:

## Test automation review checklist
 
**Independence**
- [ ] Test creates its own data in setup; does not assume data from other tests
- [ ] Test cleans up in teardown with `alwaysRun = true`
- [ ] Test passes when run alone, in random order, and in parallel
 
**Locators and page objects**
- [ ] Selectors live in page objects, not in test methods
- [ ] `data-testid` / ARIA attributes preferred over class names or XPath
- [ ] No `driver.findElement(...)` or `page.locator(...)` calls in the test layer
 
**Assertions**
- [ ] Assertions live in the test, not in page objects
- [ ] Failure messages are descriptive: `assertEquals(expected, actual, "Cart badge should show item count")`
- [ ] The test verifies the right thing — not just that no exception was thrown
 
**Code quality**
- [ ] No `Thread.sleep()` — explicit waits only
- [ ] No hardcoded URLs, credentials, or environment-specific strings
- [ ] No copy-pasted blocks — shared logic extracted to helpers or fixtures
- [ ] Test name describes the scenario: `adminCanPublishPost`, not `test1`
- [ ] Single behaviour per test — not a 200-line end-to-end journey
 
**Performance**
- [ ] UI setup replaced with API calls where the UI is not being tested
- [ ] No unnecessary page navigations between assertions

Use this checklist as a starting point and adapt it for your team's specific patterns and tools.

Reviewing specific test code problems

The 200-line test method:

// Red flag — a test method that covers an entire user journey
@Test
public void testCompleteUserJourney() {
    // Login (20 lines)
    // Profile update (25 lines)
    // Product search (15 lines)
    // Add to cart (20 lines)
    // Checkout (30 lines)
    // Order confirmation (15 lines)
    // Email verification (20 lines)
}

Review comment: "This test covers 7 independent behaviours. When it fails, we can't tell which step broke without reading 145 lines. Please split into focused tests, each named for the behaviour it verifies. The login step can be a fixture/beforeMethod shared across the other tests."

The hardcoded credential:

def test_admin_dashboard():
    login_page.login("admin@company.com", "AdminPass123!")  # ❌

Review comment: "Credentials hardcoded in test code will be committed to Git history. Use UserFactory.admin() for test credentials and Config.adminPassword() (which reads from an environment variable) for the password. This also makes the test portable across environments."

The assertion-in-page-object:

async submitAndAssertSuccess() {
  await this.submitButton.click();
  await expect(this.page).toHaveURL("/confirmation");  // ❌ assertion in page object
}

Review comment: "expect call belongs in the test, not the page object. Different tests call submit() with different expected outcomes — a success test, a validation error test, a network failure test. Extract to clickSubmit() (returns void) and let the test assert the URL. This makes the page object reusable across all those scenarios."

Rejected PR vs approved PR — same feature, different quality

Rejected — patterns to catch

  • driver.findElement(By.id("submit")).click() in test method

  • Thread.sleep(3000) before every action

  • hardcoded: admin@company.com / AdminPass123!

  • 200-line test covering 6 different behaviours

  • No cleanup — test leaves data in DB

  • Assertion: assertTrue(true) — always passes

Approved — patterns to reinforce

  • loginPage.loginAs(UserFactory.admin()) — page object call

  • wait.until(ExpectedConditions.elementToBeClickable(...))

  • Config.adminEmail() reads from environment variables

  • 3-line test: arrange, act, assert

  • @AfterMethod(alwaysRun=true) cleans created data

  • assertEquals("Order confirmed", page.getHeading(), "Order heading")

How to give useful review feedback

Be specific, not general. "This test is not following best practices" is not actionable. "The Thread.sleep(3000) on line 47 should be replaced with wait.until(ExpectedConditions.visibilityOf(successBanner)) — this eliminates the fixed 3-second delay and makes the test fail faster when the banner genuinely doesn't appear" is actionable.

Explain the cost, not just the rule. "Selectors should be in page objects" is a rule. "If this selector is hardcoded in the test, and the div's class name changes next sprint, this test breaks and you won't know until the CI run — moving it to CheckoutPage means the selector is in one place and all tests using it get the fix automatically" is a cost explanation. Engineers follow rules better when they understand why the rule exists.

Distinguish blocking from non-blocking. A missing alwaysRun = true on teardown will cause zombie processes in CI — that's blocking. A test name that could be more descriptive is a suggestion. Label your comments clearly: [blocking], [suggestion], [question]. Most review tools support this natively.

Pairing on framework changes

When a PR touches BaseTest, BasePage, DriverManager, or core utilities, synchronous review is more valuable than async comments. Changes to framework foundations affect every test — a subtle breakage in BasePage.waitFor() can manifest as 50 different test failures. Two engineers reviewing together can reason through concurrency, edge cases, and downstream impact more effectively than a solo reviewer scanning for typos.

Knowledge transfer through reviews

Reviews are the most efficient mechanism for spreading framework knowledge across a team. When a senior engineer explains why alwaysRun = true is required on teardown in a PR comment, every engineer who reads that thread learns the rule. When the same engineer explains it verbally in a meeting, three people hear it and one remembers it.

Document patterns discovered through reviews: "we see Thread.sleep being added frequently → add it to the checklist" or "we keep approving tests without cleanup → make cleanup a blocking checklist item." The checklist is a living document built from the patterns you actually catch in reviews.

⚠️ Common mistakes

  • Approving test PRs quickly to "not slow down shipping." A bad pattern approved once gets copy-pasted by every engineer who sees it. The 30 seconds saved by skipping review costs 10 hours of refactoring in three months. Test code review is not optional work — it's framework maintenance.
  • Only reviewing assertions, not structure. Reviewers naturally focus on whether the test is testing the right thing. Framework structure violations — locators in tests, setup not using the factory, missing cleanup — are invisible unless you specifically look for them. Use the checklist explicitly.
  • Reviewing test code only when it fails. Post-failure review is damage control. Pre-merge review is prevention. By the time a pattern is causing failures, it's already been copy-pasted across the codebase.

🎯 Practice task

Conduct a structured test code review — 35 minutes.

  1. Review your own last test PR. Open the last test code you committed. Apply the checklist from this lesson to it. How many items would it fail? Be honest — this is a self-assessment, not a performance review.
  2. Fix the blocking items. From step 1, fix any items that are blocking: add cleanup, move selectors to page objects, replace Thread.sleep. Commit the fixes. Run the suite — it should still pass.
  3. Create a PR template. Add the test automation checklist to your project's .github/pull_request_template.md (or equivalent for your hosting platform). The checklist appears automatically on every new PR.
  4. Review a teammate's test. Ask a teammate to send you a test PR for review. Apply the checklist. Write at least one specific, cost-explaining comment per blocking item. Discuss the comments together — the conversation is as valuable as the fixes.
  5. Stretch — flaky test review. Pick the flakiest test in your suite. Review it as if it's a new PR. Apply the checklist. Identify which checklist failures are causing the flakiness (missing cleanup? No explicit wait? Shared state?). Fix the identified items and run the test 10 times consecutively to confirm the flakiness is resolved.

Next lesson: scaling from 50 to 5000 tests — the infrastructure, tagging, and execution strategies that keep a large suite fast and maintainable.

// tip to track lessons you complete and pick up where you left off across devices.