External Data — Excel, CSV, JSON Data Providers

9 min read

Hardcoding test data inside @DataProvider methods works for simple cases, but as soon as the business team needs to add a new scenario, or a QA lead wants to bulk-edit 50 rows without touching Java, the hardcoded approach breaks down. External files solve this: the @DataProvider method becomes a thin reader that loads rows from a CSV, an Excel sheet, or a JSON file. The test logic stays unchanged; the data lives in a format that non-engineers can edit. This lesson covers all three external formats, a centralised DataReader utility that keeps the plumbing in one place, and how to load resources reliably in both local and CI environments.

Reading from CSV

CSV is the simplest external format — a plain text file, one row per test case, columns separated by commas. No dependencies beyond the standard library:

package com.mycompany.tests.data;
 
import org.testng.annotations.DataProvider;
import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.util.ArrayList;
import java.util.List;
 
public class CsvDataProvider {
 
    @DataProvider(name = "csvLoginData")
    public Object[][] readLoginCsv() throws Exception {
        List<Object[]> rows = new ArrayList<>();
        // getResourceAsStream reads from the test classpath — works in CI
        try (BufferedReader br = new BufferedReader(
                new InputStreamReader(
                    getClass().getClassLoader()
                              .getResourceAsStream("testdata/login.csv")))) {
            br.readLine(); // skip header row
            String line;
            while ((line = br.readLine()) != null) {
                String[] cols = line.split(",");
                rows.add(new Object[]{
                    cols[0].trim(),                  // email
                    cols[1].trim(),                  // password
                    Integer.parseInt(cols[2].trim()) // expectedStatus
                });
            }
        }
        return rows.toArray(new Object[0][]);
    }
}

src/test/resources/testdata/login.csv:

email,password,expectedStatus
admin@test.com,AdminPass123,200
user@test.com,UserPass123,200
wrong@test.com,BadPass,401
,password,400
admin@test.com,,400

Always use getClassLoader().getResourceAsStream() — not new FileReader("testdata/login.csv"). The latter resolves relative to the working directory, which differs between IntelliJ and CI environments. The classpath-based approach always finds the file because Maven includes src/test/resources on the test classpath.

Reading from Excel with Apache POI

Excel files are popular with QA teams that want a spreadsheet UI for test data. Add Apache POI to your pom.xml (in test scope):

<dependency>
    <groupId>org.apache.poi</groupId>
    <artifactId>poi-ooxml</artifactId>
    <version>5.2.5</version>
    <scope>test</scope>
</dependency>

A generic Excel reader utility:

package com.mycompany.tests.util;
 
import org.apache.poi.ss.usermodel.*;
import org.apache.poi.xssf.usermodel.XSSFWorkbook;
import java.io.InputStream;
import java.util.ArrayList;
import java.util.List;
 
public class ExcelReader {
 
    public static Object[][] readSheet(String resourcePath, String sheetName) throws Exception {
        try (InputStream is = ExcelReader.class.getClassLoader()
                                               .getResourceAsStream(resourcePath);
             Workbook workbook = new XSSFWorkbook(is)) {
 
            Sheet sheet = workbook.getSheet(sheetName);
            List<Object[]> rows = new ArrayList<>();
 
            for (int r = 1; r <= sheet.getLastRowNum(); r++) { // row 0 is header
                Row row = sheet.getRow(r);
                if (row == null) continue;
                Object[] cells = new Object[row.getLastCellNum()];
                for (int c = 0; c < row.getLastCellNum(); c++) {
                    Cell cell = row.getCell(c, Row.MissingCellPolicy.RETURN_BLANK_AS_NULL);
                    cells[c] = cell == null ? null : getCellValue(cell);
                }
                rows.add(cells);
            }
            return rows.toArray(new Object[0][]);
        }
    }
 
    private static Object getCellValue(Cell cell) {
        return switch (cell.getCellType()) {
            case NUMERIC -> (long) cell.getNumericCellValue(); // avoid 200.0 → use long
            case BOOLEAN -> cell.getBooleanCellValue();
            default      -> cell.getStringCellValue().trim();
        };
    }
}

DataProvider using the utility:

@DataProvider(name = "excelLoginData")
public Object[][] excelLoginData() throws Exception {
    return ExcelReader.readSheet("testdata/users.xlsx", "LoginTests");
}

The users.xlsx file lives in src/test/resources/testdata/. The LoginTests sheet has columns: Email, Password, ExpectedStatus. Add rows in Excel, run mvn test — TestNG picks up every row automatically.

Reading from JSON with Jackson

JSON is the best format when data has structure (nested objects, typed fields). Add Jackson to pom.xml:

<dependency>
    <groupId>com.fasterxml.jackson.core</groupId>
    <artifactId>jackson-databind</artifactId>
    <version>2.17.1</version>
    <scope>test</scope>
</dependency>

Define a POJO for the test case:

package com.mycompany.tests.model;
 
public class LoginTestCase {
    private String email;
    private String password;
    private String expectedRole;
    private int expectedStatus;
 
    // getters and setters (or use Lombok @Data)
    public String getEmail()          { return email; }
    public String getPassword()       { return password; }
    public String getExpectedRole()   { return expectedRole; }
    public int    getExpectedStatus() { return expectedStatus; }
}

src/test/resources/testdata/login-tests.json:

[
  {"email": "admin@test.com",  "password": "AdminPass123", "expectedRole": "admin", "expectedStatus": 200},
  {"email": "user@test.com",   "password": "UserPass123",  "expectedRole": "user",  "expectedStatus": 200},
  {"email": "wrong@test.com",  "password": "BadPass",      "expectedRole": null,    "expectedStatus": 401},
  {"email": "",                "password": "password",     "expectedRole": null,    "expectedStatus": 400}
]

DataProvider:

package com.mycompany.tests.data;
 
import com.fasterxml.jackson.databind.ObjectMapper;
import com.mycompany.tests.model.LoginTestCase;
import org.testng.annotations.DataProvider;
import java.io.InputStream;
import java.util.Arrays;
 
public class JsonDataProvider {
 
    @DataProvider(name = "jsonLoginData")
    public Object[][] jsonLoginData() throws Exception {
        ObjectMapper mapper = new ObjectMapper();
        try (InputStream is = getClass().getClassLoader()
                                        .getResourceAsStream("testdata/login-tests.json")) {
            LoginTestCase[] cases = mapper.readValue(is, LoginTestCase[].class);
            return Arrays.stream(cases)
                .map(tc -> new Object[]{
                    tc.getEmail(),
                    tc.getPassword(),
                    tc.getExpectedRole(),
                    tc.getExpectedStatus()
                })
                .toArray(Object[][]::new);
        }
    }
}

A centralised DataReader utility

Keep all file-reading logic in one class so test classes never import file I/O:

package com.mycompany.tests.util;
 
import com.fasterxml.jackson.databind.ObjectMapper;
import java.io.*;
import java.util.*;
 
public class DataReader {
 
    public static Object[][] fromCsv(String resourcePath) throws Exception {
        List<Object[]> rows = new ArrayList<>();
        try (BufferedReader br = new BufferedReader(
                new InputStreamReader(
                    DataReader.class.getClassLoader()
                                    .getResourceAsStream(resourcePath)))) {
            br.readLine(); // skip header
            String line;
            while ((line = br.readLine()) != null) {
                rows.add(Arrays.stream(line.split(","))
                               .map(String::trim)
                               .toArray());
            }
        }
        return rows.toArray(new Object[0][]);
    }
 
    public static <T> Object[][] fromJson(String resourcePath, Class<T[]> arrayType) throws Exception {
        ObjectMapper mapper = new ObjectMapper();
        try (InputStream is = DataReader.class.getClassLoader()
                                              .getResourceAsStream(resourcePath)) {
            T[] items = mapper.readValue(is, arrayType);
            return Arrays.stream(items)
                         .map(item -> new Object[]{item})
                         .toArray(Object[][]::new);
        }
    }
 
    public static Object[][] fromExcel(String resourcePath, String sheet) throws Exception {
        return ExcelReader.readSheet(resourcePath, sheet);
    }
}

DataProvider using DataReader:

@DataProvider(name = "csvUsers")
public Object[][] csvUsers() throws Exception {
    return DataReader.fromCsv("testdata/users.csv");
}

Choosing the right format

External data format trade-offs

CSV

  • No dependencies — standard library only

  • Git-friendly — plain text diffs

  • Easy to generate from any system

  • No type info — everything is a string

  • No structure for complex objects

  • Best for simple tabular test data

Excel (.xlsx)

  • Needs Apache POI dependency

  • Non-engineers can edit comfortably

  • Multiple sheets = multiple datasets

  • Binary format — hard to diff in Git

  • Type handling needs explicit code

  • Best when QA team owns the data

JSON

  • Needs Jackson or Gson dependency

  • Typed, structured, nested data

  • Git-friendly — text diffs clearly

  • Maps directly to POJOs

  • Familiar to developers

  • Best for API test payloads and config

⚠️ Common mistakes

  • Using new FileReader("testdata/login.csv"). This resolves relative to the JVM's working directory — the project root in IntelliJ, but potentially target/ in CI. The file is never found in CI and the @DataProvider throws FileNotFoundException. Always use getClass().getClassLoader().getResourceAsStream() — it reads from the classpath regardless of working directory.
  • Forgetting to skip the header row. A CSV @DataProvider that tries to parse email,password,expectedStatus as actual test data will throw a NumberFormatException on Integer.parseInt("expectedStatus"). Always call br.readLine() once before the loop.
  • Storing Excel files in src/main/resources. Maven puts src/main/resources on the production classpath, which is correct if production code reads the file. Test data belongs in src/test/resources — it's on the test classpath only, keeping test infrastructure out of production artefacts.

🎯 Practice task

Load data from all three formats. 35–45 minutes.

  1. Create src/test/resources/testdata/ and add login.csv, users.xlsx, and login-tests.json. Each should have at least 4 rows with email, password, and expected status code.
  2. Write CsvDataProvider, ExcelReader utility, and JsonDataProvider exactly as shown. Run the CSV-backed test first — confirm all 4 rows execute as separate test results.
  3. Add Apache POI to pom.xml and run the Excel-backed test. Open test-output/emailable-report.html and confirm each row appears as a named result.
  4. Add Jackson and run the JSON-backed test. Verify that null values in the JSON (for expectedRole) come through as null in Java.
  5. Test the classpath loading. Replace getResourceAsStream with new FileReader("src/test/resources/testdata/login.csv"). Run mvn test — it may work. Now run mvn test -pl . -Dtest=CsvDataProviderTest from a different working directory — it fails. Revert to getResourceAsStream and confirm both scenarios work.
  6. Stretch — add a DataReader utility that centralises all three formats. Update your three providers to delegate to DataReader. Confirm all tests still pass.

Next lesson: method injection — making one @DataProvider serve multiple test methods by receiving the calling method as a parameter.

// tip to track lessons you complete and pick up where you left off across devices.