Data-Driven Tests with TestNG Data Providers


The previous chapters wrote one test method per scenario. That works until you realise you've written testCreateUserAlice, testCreateUserBob, testCreateUserWithEmptyName, testCreateUserWithBadEmail and more: eight near-identical methods that differ only in the input row and expected status. TestNG's @DataProvider is the canonical fix: one test method, many data rows, one result per row in the report. The Selenium with Java chapter on data-driven Selenium tests showed the pattern for UI tests; it's identical for API tests, just with given()/when()/then() in the body. This lesson is the Java tooling for the strategy covered in the API Testing Masterclass lesson on positive/negative/edge case design.

A first data provider

The shape: a method returning Object[][] (a "table" — outer array is rows, inner array is columns), wired to a test by name.

import static io.restassured.RestAssured.given;
import org.testng.annotations.DataProvider;
import org.testng.annotations.Test;
 
public class CreateUserTests extends BaseApiTest {
 
    @DataProvider(name = "userCreationData")
    public Object[][] userCreationData() {
        return new Object[][] {
            // { name,           email,             role,         expectedStatus }
            { "Alice Smith",     "alice@test.com",  "admin",      201 },
            { "Bob Jones",       "bob@test.com",    "tester",     201 },
            { "",                "charlie@test.com","tester",     400 },   // empty name
            { "Dave",            "not-an-email",    "tester",     400 },   // malformed email
            { "Eve",             "eve@test.com",    "superadmin", 400 },   // invalid role
            { "Frank",           "",                "tester",     400 },   // empty email
        };
    }
 
    @Test(dataProvider = "userCreationData")
    public void createUserCases(String name, String email, String role, int expectedStatus) {
        CreateUserRequest req = new CreateUserRequest(name, email, role);
 
        given().spec(Specs.admin)
            .body(req)
        .when()
            .post("/users")
        .then()
            .statusCode(expectedStatus);
    }
}

Six rows, six test executions, six lines in the report — each named with the row's data so a failure points at the exact bad combination. One method's worth of code, six tests' worth of coverage.
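
To make the row-to-parameter mapping concrete, here is a rough plain-Java sketch of what a runner does with the table. It is not TestNG's actual implementation: RowDispatchSketch, its stand-in methods, and the recorded strings are all hypothetical. It only illustrates that each inner array becomes the argument list of one invocation.

```java
import java.lang.reflect.Method;
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only: this is NOT how TestNG is implemented internally.
class RowDispatchSketch {

    // Records one entry per invocation so the effect is visible.
    static final List<String> invocations = new ArrayList<>();

    // Stand-in for the @DataProvider method: rows of { name, email, role, expectedStatus }.
    static Object[][] userCreationData() {
        return new Object[][] {
            { "Alice Smith", "alice@test.com", "admin", 201 },
            { "", "charlie@test.com", "tester", 400 },
        };
    }

    // Stand-in for the @Test method; a real one would send the request and assert.
    public static void createUserCases(String name, String email, String role, int expectedStatus) {
        invocations.add("createUserCases(" + name + ", " + email + ", " + role + ", " + expectedStatus + ")");
    }

    // One reflective call per row: the inner array is spread across the parameters.
    static void runAllRows() {
        invocations.clear();
        try {
            Method test = RowDispatchSketch.class.getMethod(
                "createUserCases", String.class, String.class, String.class, int.class);
            for (Object[] row : userCreationData()) {
                test.invoke(null, row); // the boxed Integer is unboxed to the int parameter
            }
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("Row did not match the method signature", e);
        }
    }

    public static void main(String[] args) {
        runAllRows();
        invocations.forEach(System.out::println);
    }
}
```

The practical takeaway: the column order and types in each row have to line up with the test method's parameter list, otherwise the call fails before your assertions ever run.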

Why this beats six methods

Three concrete wins:

  • One change site. Adding an Accept: application/json header to the request once updates all six tests. With separate methods, it's six edits.
  • Coverage at a glance. The data table is itself a coverage document — anyone reading the file sees the six scenarios laid out side by side. Six method names hide that pattern.
  • Easy to extend. Add a row to the array, get a new test. No cut-and-paste of the body.

The trade-off: data providers can over-collapse if you cram too many different scenarios into one method. The rule: one data provider per behaviour, not per resource. A "create user — happy and validation errors" provider is right; a "create user — and login — and update — and delete" provider is a smell.

Reading the failure output

When row 3 fails ("" for name), TestNG's report shows:

FAILED: createUserCases("", "charlie@test.com", "tester", 400)
java.lang.AssertionError: Expected status code <400> but was <500>

The test name is the data. No mystery about which row broke — and the assertion message tells you the API returned 500 instead of the expected 400 (probably an uncaught NPE; a bug worth filing).

Loading data from JSON

When the table grows past 10 rows, inline Object[][] becomes painful to read. Move it to a JSON file under src/test/resources/testdata/:

[
  { "name": "Alice", "email": "alice@test.com", "role": "admin",  "expectedStatus": 201 },
  { "name": "Bob",   "email": "bob@test.com",   "role": "tester", "expectedStatus": 201 },
  { "name": "",      "email": "x@test.com",     "role": "tester", "expectedStatus": 400 }
]

Define a small POJO that matches the row shape, deserialise with Jackson, project to Object[][]:

@Data @NoArgsConstructor @AllArgsConstructor
public class UserTestCase {
    private String name;
    private String email;
    private String role;
    private int expectedStatus;
}
 
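// In the test class. Needs: com.fasterxml.jackson.databind.ObjectMapper,
// java.io.File, java.io.IOException, java.util.Arrays
@DataProvider(name = "userCreationFromJson")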
public Object[][] userCreationFromJson() throws IOException {
    UserTestCase[] cases = new ObjectMapper().readValue(
        new File("src/test/resources/testdata/user-creation.json"),
        UserTestCase[].class);
 
    return Arrays.stream(cases)
        .map(c -> new Object[]{ c.getName(), c.getEmail(), c.getRole(), c.getExpectedStatus() })
        .toArray(Object[][]::new);
}

Non-developers benefit too: a tester who isn't comfortable in Java can edit the JSON file to add scenarios. Jackson deserialises each row into a typed POJO, so the data provider needs no string-to-int parsing, just a clean projection.

Loading from CSV or Excel

For QA teams used to spreadsheets, an Excel/CSV provider often goes down better than JSON. The Apache POI library reads Excel; OpenCSV reads CSV. The Selenium with Java course built a reusable ExcelReader you can drop straight into a Rest Assured project:

@DataProvider(name = "userCreationFromExcel")
public Object[][] userCreationFromExcel() throws IOException {
    return ExcelReader.readData("src/test/resources/testdata/user-creation.xlsx", "UserTests");
}

The data provider is a one-line wrapper; the heavy lifting is in ExcelReader. CSV is similar:

@DataProvider(name = "userCreationFromCsv")
// readAll() also declares the checked CsvException in OpenCSV 5.x
public Object[][] userCreationFromCsv() throws IOException, CsvException {
    try (var reader = new CSVReader(new FileReader("src/test/resources/testdata/user-creation.csv"))) {
        List<String[]> rows = reader.readAll();
        rows.remove(0);   // strip header
        return rows.stream()
            .map(r -> new Object[]{ r[0], r[1], r[2], Integer.parseInt(r[3]) })
            .toArray(Object[][]::new);
    }
}

CSV's downside is the explicit type parsing; JSON deserialisation handles types for you. Pick based on who's editing the file.
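
Because the parsing is explicit, it pays to make a bad cell fail loudly with its line number rather than as a bare NumberFormatException. The sketch below is one way to do that in plain Java. CsvRowParser and toRows are hypothetical names, and the input is assumed to be cells that a CSV library has already split, with a header row first.

```java
import java.util.List;

// Hypothetical helper: converts already-split CSV cells into provider rows.
class CsvRowParser {

    // Expects a header row first, then rows of name,email,role,expectedStatus.
    static Object[][] toRows(List<String[]> cells) {
        Object[][] rows = new Object[Math.max(0, cells.size() - 1)][];
        for (int i = 1; i < cells.size(); i++) {   // start at 1 to skip the header
            String[] r = cells.get(i);
            if (r.length != 4) {
                throw new IllegalArgumentException(
                    "CSV line " + (i + 1) + ": expected 4 columns but found " + r.length);
            }
            try {
                // The only typed column: convert the status text to an int.
                rows[i - 1] = new Object[] { r[0], r[1], r[2], Integer.parseInt(r[3].trim()) };
            } catch (NumberFormatException e) {
                throw new IllegalArgumentException(
                    "CSV line " + (i + 1) + ": expectedStatus is not a number: '" + r[3] + "'", e);
            }
        }
        return rows;
    }
}
```

A provider could then return CsvRowParser.toRows(reader.readAll()), so a typo in the spreadsheet points at the offending line instead of failing somewhere inside a stream.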

Designing the data set

A good provider has intentional coverage — every row exists for a reason. The categories worth including, every time, for any input-bearing endpoint:

  • Happy paths (the main 201/200 cases) — at least one per significant variant.
  • Boundary values — empty strings, max-length strings, zero, one, very large numbers.
  • Type confusion — strings where numbers go, numbers where strings go (where the API contract says strings).
  • Special characters — Unicode (café), emoji, quotes, backslashes, SQL-injection-y input ('; DROP TABLE).
  • Missing fields — null where required, missing where optional.
  • Forbidden values — invalid enum values ("superadmin" when only admin/tester/viewer allowed).
  • Format violations — malformed emails, bad UUIDs, unparseable dates.

A six-row provider is rarely enough; twelve to fifteen rows is closer to the right shape.
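
As a rough illustration of the categories above, here is what a labelled table might look like in plain Java. Everything specific in it is an assumption made for the sketch: the UserEdgeCaseRows class, the 255-character name limit, and the expected statuses (for example, that Unicode and SQL-looking names are accepted and stored as plain text). Check each against the real API contract. String.repeat needs Java 11 or later.

```java
// Hypothetical sketch: one labelled row per coverage category.
class UserEdgeCaseRows {

    // Assumed limit for illustration; use whatever the real API contract says.
    static final int MAX_NAME_LENGTH = 255;

    // Columns: { label, name, email, role, expectedStatus }
    static Object[][] rows() {
        String maxName = "a".repeat(MAX_NAME_LENGTH);       // exactly at the boundary
        String tooLong = "a".repeat(MAX_NAME_LENGTH + 1);   // one past the boundary
        return new Object[][] {
            { "happy_admin",         "Alice Smith",             "alice@test.com", "admin",      201 },
            { "boundary_max_name",   maxName,                   "max@test.com",   "tester",     201 },
            { "boundary_name_over",  tooLong,                   "over@test.com",  "tester",     400 },
            { "boundary_empty_name", "",                        "empty@test.com", "tester",     400 },
            // Unicode escapes for an accented name, to keep the source ASCII-only.
            { "special_unicode",     "Zo\u00eb Caf\u00e9",      "zoe@test.com",   "tester",     201 },
            { "special_sql_quote",   "'; DROP TABLE users; --", "sql@test.com",   "tester",     201 },
            { "missing_email_null",  "Frank",                   null,             "tester",     400 },
            { "forbidden_role",      "Eve",                     "eve@test.com",   "superadmin", 400 },
            { "format_bad_email",    "Dave",                    "not-an-email",   "tester",     400 },
        };
    }
}
```

Type-confusion rows are missing on purpose: a typed request object can't carry a number where a String belongs, so those cases usually need a raw JSON body instead.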

One method, many test runs

Each row is its own TestNG result. A regression that breaks empty-name validation lights up exactly row 3 — and the matrix as a whole shows which behaviours the provider covers.

Parallel data rows

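TestNG can run rows in parallel, but the switch lives on the provider, not the test:

@DataProvider(name = "userCreationData", parallel = true)
public Object[][] userCreationData() { ... }

Rows now run concurrently on a data-provider thread pool. Its size defaults to 10; set data-provider-thread-count="4" on the <suite> element in testng.xml to get four threads, four concurrent rows. (threadPoolSize on @Test only takes effect together with invocationCount, so it does not parallelise data rows.) The catch: each row's data must be unique (a duplicate email between two rows running together becomes a 409 race). The factory patterns from Chapter 6 (UUID-suffixed emails) are what make this safe.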
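
One way to get that uniqueness without touching the data file is a small post-processing step over the table. The sketch below is plain Java. UniqueEmails and its methods are hypothetical names, not the Chapter 6 factories, and the "+suffix" form assumes the API accepts plus-addressed emails (swap the separator if its validator rejects them).

```java
import java.util.UUID;

// Hypothetical post-processing step for provider rows.
class UniqueEmails {

    // Inserts a random suffix before the '@' so every call yields a fresh address.
    // Values without an '@' (empty or deliberately malformed emails) are returned unchanged,
    // so negative rows keep testing what they were written to test.
    static String uniquify(String email) {
        if (email == null) {
            return null;
        }
        int at = email.indexOf('@');
        if (at < 0) {
            return email;
        }
        String suffix = UUID.randomUUID().toString().replace("-", "");
        return email.substring(0, at) + "+" + suffix + email.substring(at);
    }

    // Returns a copy of the table with the email column (by index) made unique.
    static Object[][] withUniqueEmails(Object[][] rows, int emailColumn) {
        Object[][] copy = new Object[rows.length][];
        for (int i = 0; i < rows.length; i++) {
            copy[i] = rows[i].clone();   // leave the caller's rows untouched
            copy[i][emailColumn] = uniquify((String) rows[i][emailColumn]);
        }
        return copy;
    }
}
```

A provider would wrap its table on the way out, for example return UniqueEmails.withUniqueEmails(rows, 1); when the email is the second column.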

Naming rows for readability

By default, TestNG names a row by its values. For long rows, that becomes unreadable. Implementing TestNG's ITest interface (its getTestName() method) lets you supply a name explicitly, but the simpler win is a separate "label" column in the data:

return new Object[][] {
    { "happy_admin",      "Alice", "alice@test.com", "admin",      201 },
    { "happy_tester",     "Bob",   "bob@test.com",   "tester",     201 },
    { "empty_name_400",   "",      "x@test.com",     "tester",     400 },
    // ...
};

The label is the first column, so the test method gains a leading String label parameter that its body never uses (it's purely for the report). The result reads createUserCases("empty_name_400", ...), which is far easier to scan in a CI log than createUserCases("", "x@test.com", ...).
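
Labels only help if they stay distinct and consistently formatted as the table grows. As an optional guard, a provider could pass its table through a check like the one below before returning it. RowLabels and requireValidLabels are hypothetical names, and the snake_case pattern is just one possible convention.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical guard: verifies the label column before the rows reach the test.
class RowLabels {

    // Throws if any row's first column is not a unique snake_case label; otherwise returns the rows.
    static Object[][] requireValidLabels(Object[][] rows) {
        Set<String> seen = new HashSet<>();
        for (int i = 0; i < rows.length; i++) {
            Object first = rows[i].length > 0 ? rows[i][0] : null;
            if (!(first instanceof String) || !((String) first).matches("[a-z0-9]+(_[a-z0-9]+)*")) {
                throw new IllegalArgumentException(
                    "Row " + (i + 1) + ": label must be snake_case but was " + first);
            }
            if (!seen.add((String) first)) {
                throw new IllegalArgumentException(
                    "Row " + (i + 1) + ": duplicate label " + first);
            }
        }
        return rows;
    }
}
```

With this in place, a copy-pasted row that keeps its old label fails when the provider runs, instead of producing two identically named results in the report.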

⚠️ Common mistakes

  • Cramming too many concerns into one provider. A provider for "create user and login and delete" is a workflow, not a data set. Workflows belong in their own test methods. Providers feed one behaviour with multiple inputs.
  • Forgetting unique data for parallel runs. Two rows with the same email running concurrently produce intermittent 409 conflicts that look like flakes. Always include a per-row UUID or timestamp in fields with uniqueness constraints.
  • Asserting the same thing for every row. If half your rows expect 201 and half expect 400, the test must take the expected status as a parameter. A test that asserts a fixed status is just six near-identical happy-path tests with extra steps.

🎯 Practice task

Build a real, useful data provider against REQRES. 30 minutes.

  1. Create CreateUserDataProviderTests.java with a @DataProvider returning at least six rows: two happy paths, four validation failures (empty name, missing email, invalid role, very long name).
  2. Run. Confirm TestNG reports six test results, one per row, with the row's values in the test name.
  3. Move data to JSON. Create src/test/resources/testdata/create-user.json with the same rows. Build a UserTestCase POJO. Switch the data provider to read the JSON. Confirm the same six results.
  4. Add a label column. Prepend each row with a snake_case scenario name (happy_admin, empty_name_400). Note how the test report becomes readable.
  5. Stress the API. Add 10 more rows with edge cases: Unicode names, very long emails, leading/trailing whitespace, SQL-injection-style strings. Run; note which (if any) fail. File a bug for any that produce 500 instead of 400.
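  6. Parallel rows. Set parallel = true on the @DataProvider and cap the pool with data-provider-thread-count="4" on the <suite> in testng.xml (threadPoolSize on @Test has no effect without invocationCount). Make sure each row's email is unique (static JSON can't call UUID.randomUUID(), so append the suffix in a small post-processing step). Run. Confirm tests still pass.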
  7. Pull data from CSV. Convert the JSON to a CSV with the same columns. Wire OpenCSV into the data provider. Run; confirm parity with the JSON version.
  8. Stretch: add a second provider for update tests that depend on a setup created by the test class. The setup creates one user in @BeforeClass; the provider rows describe how to update it. Note that this couples rows to a shared resource — discuss when that's worth it (rarely) vs. when each row should be independent (almost always).

Next lesson: managing test data — builders, factories, and the cleanup hooks that keep a long-running suite from polluting its environment.
