Flaky tests are an unavoidable reality in end-to-end automation. Network timeouts, animation timing, third-party service hiccups — these failures say nothing about application quality but they erode trust in the suite. The right response is not to delete the tests or set enabled = false; it's to retry genuine timing failures while keeping consistent logic failures visible as real failures. TestNG's IRetryAnalyzer implements this retry loop, and IAnnotationTransformer extends it to every test in the suite without adding annotation noise. This lesson covers both, the interaction between retry and reporting, and the discipline that separates legitimate retry from masking real bugs.
IRetryAnalyzer — the retry loop
IRetryAnalyzer has one method: retry(ITestResult result). Return true to retry, false to stop:
package com.mycompany.tests.retry;

import org.testng.IRetryAnalyzer;
import org.testng.ITestResult;

public class RetryAnalyzer implements IRetryAnalyzer {

    private int retryCount = 0;
    private static final int MAX_RETRIES = 2;

    @Override
    public boolean retry(ITestResult result) {
        if (retryCount < MAX_RETRIES) {
            retryCount++;
            // retryCount + 1 is the number of the attempt that is about to run
            System.out.printf(
                "🔁 Retrying: %s - attempt %d of %d%n",
                result.getName(), retryCount + 1, MAX_RETRIES + 1
            );
            return true;
        }
        return false;
    }
}
A test annotated with this analyser retries up to 2 times on failure:
@Test(retryAnalyzer = RetryAnalyzer.class,
      description = "Flaky login test that occasionally times out")
public void testLoginWithRetry() {
    driver().get("https://www.saucedemo.com");
    driver().findElement(By.id("user-name")).sendKeys("standard_user");
    driver().findElement(By.id("password")).sendKeys("secret_sauce");
    driver().findElement(By.id("login-button")).click();
    Assert.assertTrue(driver().getCurrentUrl().contains("/inventory.html"));
}
If the test fails on the first attempt, retry is called — returns true, test runs again. If it fails again, retry is called — returns true (count = 1 < 2), test runs a third time. If it fails a third time, retry returns false and TestNG records the final failure.
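The attempt sequence described above can be traced without TestNG on the classpath. The following is a minimal sketch in plain Java, assuming a hypothetical RetryLoopDemo class that imitates how a runner consults a retry()-style method after each failure. It is a model of the counting logic only, not TestNG's actual implementation.

```java
import java.util.function.BooleanSupplier;

/** Hypothetical stand-in for the runner's retry loop; not TestNG code. */
class RetryLoopDemo {
    private static final int MAX_RETRIES = 2;
    private int retryCount = 0;

    /** Mirrors RetryAnalyzer.retry(): true while retries remain. */
    boolean retry() {
        if (retryCount < MAX_RETRIES) {
            retryCount++;
            return true;
        }
        return false;
    }

    /** Runs the test body until it passes or retries run out; returns total attempts. */
    static int attemptsUntilDone(BooleanSupplier testBody) {
        RetryLoopDemo analyzer = new RetryLoopDemo(); // fresh counter per test method
        int attempts = 0;
        while (true) {
            attempts++;
            boolean passed = testBody.getAsBoolean();
            if (passed || !analyzer.retry()) {
                return attempts;
            }
        }
    }

    public static void main(String[] args) {
        System.out.println(attemptsUntilDone(() -> false)); // always fails: prints 3
        System.out.println(attemptsUntilDone(() -> true));  // passes at once: prints 1
    }
}
```

With MAX_RETRIES = 2, a test that never passes runs three times in total, which matches the "attempt 3 of 3" message printed by the real analyser.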
Important: retryCount is an instance field. TestNG creates a separate RetryAnalyzer instance for each test method, so retryCount starts at 0 for every method instead of being shared across the suite.
The retry with state reset
Some tests modify application state that must be reset before a retry makes sense:
public class StatefulRetryAnalyzer implements IRetryAnalyzer {

    private int retryCount = 0;
    private static final int MAX_RETRIES = 2;

    @Override
    public boolean retry(ITestResult result) {
        if (retryCount < MAX_RETRIES) {
            retryCount++;
            System.out.printf("Retrying %s (attempt %d/%d)%n",
                result.getName(), retryCount + 1, MAX_RETRIES + 1);
            // Navigate back to the starting state before retry
            // The @BeforeMethod will re-run before the next attempt
            return true;
        }
        return false;
    }
}
TestNG's retry runs the full @BeforeMethod → @Test → @AfterMethod sequence on each attempt — so if your setup method navigates to the starting page, each retry begins from a clean state automatically.
IAnnotationTransformer — apply retry to every test
Adding retryAnalyzer = RetryAnalyzer.class to every @Test is noisy and forgettable. IAnnotationTransformer lets you attach it globally at discovery time:
package com.mycompany.tests.listener;

import com.mycompany.tests.retry.RetryAnalyzer;
import org.testng.IAnnotationTransformer;
import org.testng.annotations.ITestAnnotation;
import java.lang.reflect.Constructor;
import java.lang.reflect.Method;

public class RetryTransformer implements IAnnotationTransformer {

    @Override
    public void transform(ITestAnnotation annotation,
                          Class testClass,
                          Constructor testConstructor,
                          Method testMethod) {
        // Attach RetryAnalyzer to every @Test that doesn't already have one.
        // Note: the accessor differs between TestNG versions. On TestNG 7.x it is
        // likely getRetryAnalyzerClass(), whose default is an internal "disabled"
        // analyzer class rather than null, so adjust this check to your version.
        if (annotation.getRetryAnalyzer() == null) {
            annotation.setRetryAnalyzer(RetryAnalyzer.class);
        }
    }
}
Now every @Test in the suite automatically retries up to 2 times. No per-test annotation required. Tests that consistently fail still show as FAILED after exhausting all retries.
The retry flow
Test fails (attempt 1)
The @Test method throws an AssertionError or Exception. TestNG marks the result as FAILED and calls RetryAnalyzer.retry().
Handling the failed-attempt results
IRetryAnalyzer reruns the test, but the earlier failed attempt stays in TestNG's result set alongside the eventual PASS. This causes two issues: the emailable report lists both results, and your CI system may count the initial failure. (Depending on the TestNG version, a retried attempt may be recorded as SKIPPED rather than FAILED, so check which result set holds the extra entry.) Fix with a listener that removes retried-then-passed results:
package com.mycompany.tests.listener;

import org.testng.*;
import java.util.Iterator;

public class RetryResultCleaner implements ITestListener {

    @Override
    public void onFinish(ITestContext context) {
        // Remove the initial failed results for tests that eventually passed
        Iterator<ITestResult> failedResults =
                context.getFailedTests().getAllResults().iterator();
        while (failedResults.hasNext()) {
            ITestResult failedResult = failedResults.next();
            // If this method also has a passed result, the failure was a retried attempt.
            // Looking up by ITestNGMethod instead of the bare method name avoids
            // collisions between same-named methods in different classes.
            boolean alsoPassed = !context.getPassedTests()
                    .getResults(failedResult.getMethod())
                    .isEmpty();
            if (alsoPassed) {
                failedResults.remove();
            }
        }
    }
}
Register this alongside RetryTransformer in testng.xml. The suite report then shows only the final state: tests that passed after a retry appear once as PASSED, and tests that failed on every attempt appear as FAILED.
When to use retry — and when not to
Use retry for:
Network timeouts during driver.get() or API calls
Animation/rendering timing: element visible but not yet clickable
Third-party auth services that occasionally return 503
CI environment resource contention (CPU-bound browser slowdowns)
Do not use retry to mask:
Assertion failures caused by application bugs
Tests that pass locally but fail in CI due to environment differences (fix the environment, not the retry count)
Tests that always need the second attempt to pass (the test itself is broken)
The discipline: if a test retries consistently — say, 30% of runs need a retry — the test is badly written, not flaky. Fix the root cause.
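To make that threshold checkable instead of a gut feeling, a retry rate can be computed from run history. This is a rough sketch, assuming a hypothetical RetryRate helper and a list of booleans (one per run, true when the run needed a retry) that you collect yourself, for example from CI logs. Neither the class nor the data format comes from TestNG.

```java
import java.util.List;

/** Hypothetical helper: flags tests whose retry rate suggests a root-cause problem. */
class RetryRate {

    /** Fraction of runs that needed at least one retry; 0.0 for an empty history. */
    static double retryRate(List<Boolean> neededRetry) {
        if (neededRetry.isEmpty()) {
            return 0.0;
        }
        long retried = neededRetry.stream().filter(Boolean::booleanValue).count();
        return (double) retried / neededRetry.size();
    }

    /** True when the retry rate reaches the threshold, e.g. 0.30 for the 30% rule of thumb. */
    static boolean needsRootCauseFix(List<Boolean> neededRetry, double threshold) {
        return retryRate(neededRetry) >= threshold;
    }
}
```

A test that needed a retry in 4 of its last 10 runs would be flagged at a 0.30 threshold, while one that needed a retry in 1 of 10 would not.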
⚠️ Common mistakes
Setting MAX_RETRIES to a high number like 5. A test that fails 5 times in a row is not flaky; it is broken. A MAX_RETRIES of 2 (3 total attempts) is a sensible ceiling. Higher values slow the suite and delay the visibility of real failures.
Expecting retry to solve race conditions caused by parallel execution. Retrying a test that fails because of a shared WebDriver will sometimes pass (when the race resolves in your favour) and sometimes fail again. Fix the root cause — ThreadLocal<WebDriver> — not the symptom. Retry should never paper over parallelism bugs.
Not cleaning up failed-attempt results. Without RetryResultCleaner, test-output/emailable-report.html can show the same test as both FAILED and PASSED, and CI systems that parse TestNG's XML output may report failures even when the test eventually passed. Always clean up the result set.
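The ThreadLocal fix named in the second mistake can be illustrated with JDK classes alone. The sketch below assumes a hypothetical PerThread<T> holder in which T stands in for WebDriver. A real suite would supply a driver factory and quit the driver before calling remove().

```java
import java.util.function.Supplier;

/** Hypothetical per-thread holder; T stands in for WebDriver in a real suite. */
class PerThread<T> {
    private final ThreadLocal<T> holder;

    PerThread(Supplier<T> factory) {
        // Each thread lazily creates its own instance on first get()
        this.holder = ThreadLocal.withInitial(factory);
    }

    T get() {
        return holder.get();
    }

    /** Drops the current thread's instance, e.g. in an @AfterMethod. */
    void remove() {
        holder.remove();
    }

    /** Demo: two threads asking the same holder receive different instances. */
    static boolean distinctAcrossThreads() {
        PerThread<Object> drivers = new PerThread<>(Object::new);
        Object[] seen = new Object[2];
        Thread first = new Thread(() -> seen[0] = drivers.get());
        Thread second = new Thread(() -> seen[1] = drivers.get());
        first.start();
        second.start();
        try {
            first.join();
            second.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
        return seen[0] != null && seen[1] != null && seen[0] != seen[1];
    }
}
```

Because no instance is shared between threads, the race that retry would otherwise hide cannot occur in the first place.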
🎯 Practice task
Build the retry infrastructure. 25–35 minutes.
Implement RetryAnalyzer with MAX_RETRIES = 2. Apply it to one test via retryAnalyzer = RetryAnalyzer.class. Make the test fail on its first call by checking a counter in a static field, then pass on subsequent calls:
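One possible shape for the fail-once logic is sketched below in plain Java, assuming a hypothetical FailsOnceDemo class. In the real test class the method would carry @Test(retryAnalyzer = RetryAnalyzer.class) and could call Assert.fail() instead of throwing AssertionError directly. The TestNG parts are left out here so that the sketch runs on its own.

```java
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical demo class; annotate the method with @Test in the real suite. */
class FailsOnceDemo {
    // Static so the count survives across attempts
    static final AtomicInteger CALLS = new AtomicInteger();

    static void testFailsOnFirstCall() {
        int call = CALLS.incrementAndGet();
        if (call == 1) {
            // Simulated flaky failure: only the first invocation fails
            throw new AssertionError("Simulated failure on call " + call);
        }
        // Calls 2 and later fall through, so the retried attempt passes
    }
}
```

The static counter is what lets the second attempt take a different path from the first.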
Run. Confirm the console shows the retry message and the test ends as PASSED.
Implement RetryTransformer and register it in testng.xml. Remove retryAnalyzer = RetryAnalyzer.class from the test annotation. Run — the test still retries, proving the transformer wired it globally.
Implement RetryResultCleaner and register it. Run. Open test-output/emailable-report.html — confirm the test appears once as PASSED, not as both FAILED and PASSED.
Test exhaustion. Set the test to always fail (Assert.fail()). Confirm that after 3 attempts the test is recorded as FAILED and that the retry messages appear in the console: "attempt 2 of 3", "attempt 3 of 3".
Measure retry cost. Add Thread.sleep(1000) to the failing test. Each retry then adds about 1 second, so with MAX_RETRIES = 2 a test that fails on every attempt wastes roughly 2 extra seconds. Run the suite and observe the total runtime. This makes the "fix the test, not the retry count" argument concrete.
Stretch: conditional retry. Modify RetryAnalyzer to retry only on java.net.SocketTimeoutException, not on AssertionError, by inspecting result.getThrowable(). A network timeout gets retried; a wrong assertion does not. Of the approaches in this lesson, this is the most disciplined, because a logic failure is never retried into a pass.
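For the stretch task, the decision can be isolated in a small predicate. This is a sketch under assumptions: RetryableFailures and isRetryable are hypothetical names, and walking the cause chain is a design choice made here because a timeout may arrive wrapped in another exception. Inside retry(ITestResult result) you would pass result.getThrowable() to it and return false when it reports a non-retryable failure.

```java
import java.net.SocketTimeoutException;

/** Hypothetical helper for deciding whether a failure is worth retrying. */
class RetryableFailures {
    private static final int MAX_CAUSE_DEPTH = 20; // guards against cyclic cause chains

    /** True if the failure, or any cause in its chain, is a SocketTimeoutException. */
    static boolean isRetryable(Throwable failure) {
        int depth = 0;
        for (Throwable t = failure; t != null && depth < MAX_CAUSE_DEPTH; t = t.getCause()) {
            if (t instanceof SocketTimeoutException) {
                return true;
            }
            depth++;
        }
        return false; // AssertionError and other logic failures are never retried
    }
}
```

Keeping the check in its own method also makes it easy to unit test without starting a browser.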
Next lesson: custom annotations and transformers — IAnnotationTransformer beyond retry, @WIP, @Owner, and IMethodInterceptor for runtime test filtering.