@RepeatedTest for Flakiness Detection

A test that passes 95% of the time is not a passing test — it is a flaky test that your team has learned to ignore. The standard debugging approach is to run the test many times in a row and watch for the failure. @RepeatedTest automates that: one annotation, and JUnit runs the test exactly N times and reports each repetition independently. This lesson covers the annotation, its report formatting, and the broader strategy for diagnosing flakiness once you've confirmed it exists.

Basic usage

import org.junit.jupiter.api.RepeatedTest;
import static org.junit.jupiter.api.Assertions.*;

class OrderApiTest {

    // apiClient is assumed to be a project-specific client field (declaration omitted)
    // whose calls return a REST Assured-style Response.

    @RepeatedTest(10)
    void shouldConsistentlyReturnOrders() {
        Response response = apiClient.getOrders();
        assertEquals(200, response.getStatusCode());
        assertFalse(response.jsonPath().getList("orders").isEmpty());
    }
}

JUnit runs shouldConsistentlyReturnOrders ten times independently. The report shows ten entries:

shouldConsistentlyReturnOrders() repetition 1 of 10  ✅
shouldConsistentlyReturnOrders() repetition 2 of 10  ✅
shouldConsistentlyReturnOrders() repetition 3 of 10  ✅
...
shouldConsistentlyReturnOrders() repetition 7 of 10  ❌
...
shouldConsistentlyReturnOrders() repetition 10 of 10 ✅

One red entry in ten tells you three things: the test is flaky, which repetition failed (the 7th), and that the other nine passed, which makes a systematic infrastructure failure unlikely.

Custom display names

The default label is repetition {currentRepetition} of {totalRepetitions}. Override it with the name attribute:

@RepeatedTest(value = 5, name = "Attempt {currentRepetition} of {totalRepetitions}")
void flakinessCheck() {
    Response response = apiClient.checkout();
    assertEquals(200, response.getStatusCode());
}

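The report shows Attempt 1 of 5, Attempt 2 of 5, and so on. The {displayName} placeholder inserts the display name of the test method (the method name with parentheses by default, or the @DisplayName value if one is set). That is useful when you have several repeated tests in the same class and need to tell them apart in a long report.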

Accessing RepetitionInfo

The framework injects a RepetitionInfo parameter if your test method declares it. Use it to vary behaviour per repetition — for example, logging which attempt is running, or adjusting a retry delay:

import java.util.List;
import org.junit.jupiter.api.RepetitionInfo;
 
@RepeatedTest(5)
void checkSearchConsistency(RepetitionInfo info) {
    System.out.printf("Attempt %d of %d%n",
        info.getCurrentRepetition(), info.getTotalRepetitions());
 
    List<String> results = searchService.query("junit 5");
    assertFalse(results.isEmpty(),
        "Search returned empty results on attempt " + info.getCurrentRepetition());
}

QA use cases

Confirming flakiness. Before spending time investigating a "sometimes fails" test, confirm the flakiness empirically:

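import org.openqa.selenium.WebDriver;
import org.openqa.selenium.chrome.ChromeDriver;

@RepeatedTest(20)
void confirmFlakiness() {
    // Run 20 times: any failure confirms the flakiness, and the failure count gives a rough rate
    WebDriver driver = new ChromeDriver();
    try {
        driver.get("https://app.example.com/dashboard");
        assertEquals("Dashboard", driver.getTitle());
    } finally {
        // Always close the browser, even when the assertion fails
        driver.quit();
    }
}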

If 3 of 20 fail, you know the flakiness rate is roughly 15%. That number guides how urgently to fix it and whether to add a retry mechanism while the root cause is investigated.
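
To see how rough that estimate is, it can help to put a range around it. The sketch below is plain Java, not part of JUnit. It uses the Wilson score interval, which is one common choice for small samples and is an assumption made here rather than something this lesson prescribes. The class and method names are hypothetical.

```java
// Hypothetical helper, not part of JUnit: estimates a flakiness rate from repetition results.
final class FlakinessEstimate {

    /** Observed failure rate: failures divided by total repetitions. */
    static double observedRate(int failures, int repetitions) {
        return (double) failures / repetitions;
    }

    /**
     * Wilson score interval for the true failure rate.
     * z = 1.96 corresponds to roughly 95% confidence.
     * Returns {lowerBound, upperBound}.
     */
    static double[] wilsonInterval(int failures, int repetitions, double z) {
        double n = repetitions;
        double p = failures / n;
        double z2 = z * z;
        double denominator = 1 + z2 / n;
        double centre = (p + z2 / (2 * n)) / denominator;
        double halfWidth = z * Math.sqrt(p * (1 - p) / n + z2 / (4 * n * n)) / denominator;
        return new double[] { centre - halfWidth, centre + halfWidth };
    }

    public static void main(String[] args) {
        double[] interval = wilsonInterval(3, 20, 1.96);
        System.out.printf("observed %.0f%%, plausible range %.0f%% to %.0f%%%n",
            100 * observedRate(3, 20), 100 * interval[0], 100 * interval[1]);
    }
}
```

For 3 failures in 20 runs the range comes out at roughly 5% to 36%, so treat the 15% as an order of magnitude. More repetitions narrow the range.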

Timing consistency. Combine with @Timeout to check that a service responds within a time budget across multiple calls:

import java.util.concurrent.TimeUnit;
import org.junit.jupiter.api.Timeout;
 
@RepeatedTest(10)
@Timeout(value = 3, unit = TimeUnit.SECONDS)
void shouldRespondWithin3Seconds() {
    Response response = apiClient.getProducts();
    assertEquals(200, response.getStatusCode());
}

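The timeout applies to each repetition separately, not to the ten runs combined. If the API has a cold-start problem, the first repetition is the one most likely to time out and flag the issue. The remaining repetitions reveal whether the slowness is intermittent or systematic.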

Seed-based randomness. If your system under test uses randomisation (recommendation engines, A/B testing), run the same query multiple times and assert invariants that must hold regardless of the random output:

@RepeatedTest(15)
void recommendationsMustAlwaysBeNonEmpty() {
    List<Product> recommendations = recommendationEngine.getFor("alice");
    assertFalse(recommendations.isEmpty(),
        "Recommendations must never be empty even with randomisation");
    assertTrue(recommendations.size() <= 10,
        "Should cap at 10 recommendations");
}
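
When you control the source of randomness, logging the seed for each repetition lets you replay a failing run exactly. The sketch below is plain Java and illustrates only that idea. SeededRecommendations, its catalogue, and recommend(long seed) are hypothetical stand-ins, not the recommendationEngine from the example above.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;

// Hypothetical stand-in for a randomised component; not the recommendationEngine from the lesson.
final class SeededRecommendations {

    private static final List<String> CATALOGUE = List.of(
        "keyboard", "mouse", "monitor", "webcam", "headset", "dock",
        "laptop stand", "usb hub", "desk lamp", "microphone", "chair", "footrest");

    /** Returns between 1 and 10 products, chosen by a generator built from the given seed. */
    static List<String> recommend(long seed) {
        Random random = new Random(seed);
        List<String> shuffled = new ArrayList<>(CATALOGUE);
        Collections.shuffle(shuffled, random);
        int count = 1 + random.nextInt(10); // 1..10 inclusive
        return shuffled.subList(0, count);
    }

    public static void main(String[] args) {
        for (int repetition = 1; repetition <= 15; repetition++) {
            long seed = System.nanoTime();
            List<String> result = recommend(seed);
            // Put the seed in the failure message so the failing run can be replayed.
            if (result.isEmpty() || result.size() > 10) {
                throw new AssertionError("Invariant broken with seed " + seed);
            }
            System.out.println("repetition " + repetition + " seed " + seed
                + " size " + result.size());
        }
    }
}
```

Calling recommend with the same seed returns the same list, so a seed taken from a failure message is enough to reproduce that repetition on its own.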

@RepeatedTest is detection, not a fix

A critical point: @RepeatedTest helps you find flaky tests. It does not fix them. If a test is flaky because of a timing race condition, running it 10 times makes the race condition visible — it does not eliminate it.

The fix depends on the cause:

  • Timing issues → add explicit waits, use Awaitility, or fix the underlying async handling
  • Shared mutable state → isolate each test with @BeforeEach cleanup
  • External service instability → mock the dependency in unit tests, test against a stable environment in integration tests
  • Thread safety → use @ResourceLock (covered in Chapter 4)
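
For the timing bullet, the idea behind an explicit wait is to poll a condition until it holds or a deadline passes, rather than sleeping for a fixed time. The sketch below is a hand-rolled helper for illustration only. It is not Awaitility's API, and the names PollingWait and until are hypothetical.

```java
import java.time.Duration;
import java.util.function.BooleanSupplier;

// Hypothetical hand-rolled helper; in a real suite a library such as Awaitility fills this role.
final class PollingWait {

    /**
     * Polls the condition until it returns true or the timeout elapses.
     * Returns true if the condition held in time, false on timeout or interruption.
     */
    static boolean until(BooleanSupplier condition, Duration timeout, Duration pollInterval) {
        long deadline = System.nanoTime() + timeout.toNanos();
        while (true) {
            if (condition.getAsBoolean()) {
                return true;
            }
            if (System.nanoTime() - deadline >= 0) {
                return false;
            }
            try {
                Thread.sleep(pollInterval.toMillis());
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // preserve the interrupt for the caller
                return false;
            }
        }
    }

    public static void main(String[] args) {
        long readyAt = System.currentTimeMillis() + 200; // simulated async work finishing later
        boolean ready = until(() -> System.currentTimeMillis() >= readyAt,
            Duration.ofSeconds(2), Duration.ofMillis(20));
        System.out.println("condition met: " + ready);
    }
}
```

In a test you would assert on the returned boolean. Unlike a fixed Thread.sleep, the wait ends as soon as the condition holds, and a slow run fails with a clear timeout instead of a confusing assertion on stale state.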

Do not ship @RepeatedTest in your main test suite as a permanent fixture unless you have a genuine reason to run something multiple times (performance consistency, randomness invariants). A permanently repeated test that's "there to detect flakiness" is a sign the underlying flakiness was never addressed.

⚠️ Common mistakes

  • Treating @RepeatedTest as a retry mechanism. @RepeatedTest(3) does not give a flaky test three chances to pass: all three repetitions run, each is reported separately, and a single failed repetition fails the run. Detection and retry are different things. Retry, where one pass out of several attempts counts as green, hides flakiness from your report and needs a third-party extension such as JUnit Pioneer's @RetryingTest. Use @RepeatedTest for detection, and prefer fixing the root cause over retrying.
  • Running too few repetitions. If a test fails 5% of the time and repetitions are independent, @RepeatedTest(3) stays fully green about 86% of the time, and even 20 repetitions surface at least one failure only about 64% of the time. If you're seriously investigating flakiness, use 20–50 repetitions, and more for rarer failures.
  • Forgetting that @BeforeEach and @AfterEach run for every repetition. If @BeforeEach creates a database row and @AfterEach doesn't clean it up, you'll have 10 leftover rows after a @RepeatedTest(10) run. Confirm your lifecycle methods are idempotent with repeated execution.
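
The arithmetic behind the repetition-count advice is short enough to check yourself. With failure rate p and n independent repetitions, the chance of seeing at least one failure is 1 - (1 - p)^n. The sketch below is plain Java, not part of JUnit. The class and method names are hypothetical, and independence between repetitions is an assumption that real flaky tests do not always satisfy.

```java
// Hypothetical helper, not part of JUnit: how likely is at least one failure in n repetitions?
final class DetectionOdds {

    /** Probability of seeing at least one failure, assuming independent repetitions. */
    static double chanceOfAtLeastOneFailure(double failureRate, int repetitions) {
        return 1.0 - Math.pow(1.0 - failureRate, repetitions);
    }

    /** Smallest repetition count that reaches the target detection probability. */
    static int repetitionsNeeded(double failureRate, double targetProbability) {
        return (int) Math.ceil(
            Math.log(1.0 - targetProbability) / Math.log(1.0 - failureRate));
    }

    public static void main(String[] args) {
        for (int n : new int[] { 3, 20, 50 }) {
            System.out.printf("%d repetitions: %.0f%% chance of at least one failure%n",
                n, 100 * chanceOfAtLeastOneFailure(0.05, n));
        }
        System.out.println("repetitions for a 95% chance: " + repetitionsNeeded(0.05, 0.95));
    }
}
```

For a 5% failure rate, a 95% chance of catching at least one failure takes 59 repetitions under these assumptions, which is why single-digit repetition counts rarely settle the question.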

🎯 Practice task

Use @RepeatedTest to detect and characterise a flaky test. 20–25 minutes.

  1. Write a RandomService class with a method getResult() that returns "success" 80% of the time and "error" 20% of the time (use Math.random() < 0.8 ? "success" : "error").
  2. Write a test that asserts assertEquals("success", randomService.getResult()). Run it once — it probably passes.
  3. Replace @Test with @RepeatedTest(20). Run it. Expect around 4 failures out of 20, though the exact count varies from run to run. Read the report to identify exactly which repetitions failed.
  4. Add RepetitionInfo. Print the current repetition and the result on each run. Confirm you can see which specific repetitions produced "error".
  5. Add @Timeout. Write a slow getResult() variant (add Thread.sleep(50) inside) and annotate the test that calls it with @RepeatedTest(5) and @Timeout(value = 100, unit = TimeUnit.MILLISECONDS). Confirm the timing constraint holds across all five repetitions.
  6. Stretch — fix it. Rewrite getResult() to always return "success". Re-run @RepeatedTest(20). Confirm all 20 pass. This models the "detect → fix → verify" workflow.
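
If you want a starting point for step 1, here is one possible shape for RandomService. It is a sketch, not the reference solution. The injectable Random and the successRate parameter are additions beyond what the task asks for (the task uses Math.random() directly); they are there so the behaviour can be made reproducible and so the "fix" in step 6 is a one-value change.

```java
import java.util.Random;

// One possible shape for the exercise's RandomService. The injectable Random and the
// successRate parameter are illustrative additions, not requirements of the task.
final class RandomService {

    private final Random random;
    private final double successRate;

    RandomService(Random random, double successRate) {
        this.random = random;
        this.successRate = successRate;
    }

    /** Returns "success" with probability successRate, otherwise "error". */
    String getResult() {
        return random.nextDouble() < successRate ? "success" : "error";
    }

    public static void main(String[] args) {
        RandomService service = new RandomService(new Random(), 0.8);
        int errors = 0;
        for (int repetition = 1; repetition <= 20; repetition++) {
            String result = service.getResult();
            if (result.equals("error")) {
                errors++;
            }
            System.out.println("repetition " + repetition + ": " + result);
        }
        System.out.println(errors + " of 20 returned error");
    }
}
```

Constructing the service with a successRate of 1.0 models the fixed version from step 6, since nextDouble() always returns a value below 1.0.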

You have now completed Chapter 3. Next chapter: the Extension model — writing custom extensions that inject parameters, react to test outcomes, and compose cleanly across multiple test classes.
