Test Reports in CI — JUnit XML, HTML, Allure

A CI pipeline that runs tests and silently exits with a red light is incomplete. The failure message, the stack trace, the screenshot of the broken UI, the trend showing this test has been flaky for three weeks — that information is what turns a red build into a fixed bug. Test reporting bridges the gap between "the build failed" and "here's exactly what went wrong and where."

Three formats for three audiences

Test reporting in CI serves machines and humans, and the humans split into two groups: those who need a quick answer and those who need depth. Each audience needs a different format.

JUnit XML is for machines. Your CI server reads it to determine pass/fail counts, track trends, and annotate pull requests. It isn't pleasant for humans to read, and that's fine: it isn't meant to be.

HTML reports are for humans in a hurry. A developer who sees a failing PR status check clicks through to the report, finds the failure in seconds, reads the assertion message, and opens the failing test file. The report answers "what failed and why" without requiring access to CI logs.

Allure is for humans who need depth. It adds history across builds, failure categories, test timeline visualisation, behaviour-driven views (when @Epic/@Feature annotations are used), and per-test attachments. It's the right choice for teams that review test health as a regular practice.

JUnit XML — the universal CI format

Most major test frameworks can emit JUnit XML, and practically every CI tool knows how to read it:

<testsuite name="LoginTest" tests="5" failures="1" errors="0" time="12.5">
    <testcase classname="LoginTest" name="testValidLogin" time="2.3"/>
    <testcase classname="LoginTest" name="testInvalidLogin" time="1.8">
        <failure message="Expected error message not displayed">
            org.opentest4j.AssertionFailedError: Expected error...
        </failure>
    </testcase>
</testsuite>

Maven Surefire writes this to target/surefire-reports/ automatically. pytest writes it when you pass --junitxml=results.xml. Jest writes it with --reporters=default --reporters=jest-junit once the jest-junit package is installed.
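To make concrete what a CI tool does with this file, here is a minimal Python sketch that reads a report shaped like the sample above using only the standard library. The function name summarize_junit and the layout of the returned dictionary are illustrative assumptions, not part of any CI tool.

```python
import xml.etree.ElementTree as ET

SAMPLE = """<testsuite name="LoginTest" tests="5" failures="1" errors="0" time="12.5">
    <testcase classname="LoginTest" name="testValidLogin" time="2.3"/>
    <testcase classname="LoginTest" name="testInvalidLogin" time="1.8">
        <failure message="Expected error message not displayed">
            org.opentest4j.AssertionFailedError: Expected error...
        </failure>
    </testcase>
</testsuite>"""


def summarize_junit(xml_text):
    """Return suite-level counts plus one entry per failed or errored test case.

    Simplification: nested <testsuite> elements are not handled.
    """
    root = ET.fromstring(xml_text)
    summary = {"tests": 0, "failures": 0, "errors": 0, "failed_cases": []}
    # Some tools wrap suites in <testsuites>; iter() covers both layouts.
    for suite in root.iter("testsuite"):
        for key in ("tests", "failures", "errors"):
            summary[key] += int(suite.get(key, 0))
        for case in suite.iter("testcase"):
            for kind in ("failure", "error"):
                node = case.find(kind)
                if node is not None:
                    summary["failed_cases"].append({
                        "test": f'{case.get("classname")}.{case.get("name")}',
                        "kind": kind,
                        "message": node.get("message", ""),
                    })
    return summary
```

Calling summarize_junit(SAMPLE) yields the counts from the suite attributes and a single failed case for LoginTest.testInvalidLogin, which is the information a PR annotation is built from.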

In GitHub Actions, parse JUnit XML and annotate the PR with individual test failures:

- uses: dorny/test-reporter@v1
  if: always()
  with:
    name: Test Results
    path: '**/surefire-reports/*.xml'
    reporter: java-junit
    fail-on-error: true

The dorny/test-reporter action reads the XML and renders a formatted test summary on the PR (test counts, failure messages, and links to the failing test) without requiring anyone to download a ZIP file. The job needs the checks: write permission so the action can create the check run.

In Jenkins, the built-in JUnit plugin does the same thing in post { always { } }:

post {
    always {
        junit 'target/surefire-reports/*.xml'
    }
}

One line. Jenkins adds a trend graph, a per-test history view, and marks the build UNSTABLE if any tests fail.
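The status rule can be pictured with a small Python sketch. It is a simplified model, not Jenkins's implementation: the function build_status, the decision to count errors like failures, and the NO_REPORTS label are all assumptions made for illustration.

```python
import xml.etree.ElementTree as ET


def build_status(report_xml_texts):
    """Map the contents of one or more JUnit XML reports to a status string."""
    if not report_xml_texts:
        # Hypothetical label: no reports usually means the test step never ran.
        return "NO_REPORTS"
    failed = 0
    for text in report_xml_texts:
        root = ET.fromstring(text)
        for case in root.iter("testcase"):
            # A test case counts as failed if it carries a <failure> or <error> child.
            if case.find("failure") is not None or case.find("error") is not None:
                failed += 1
    return "UNSTABLE" if failed else "SUCCESS"
```

The point of the distinction is that a build with failing tests still compiled and ran, so it is reported differently from a build that broke outright.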

HTML reports in CI

HTML reports need to be hosted somewhere accessible — they're useless as a raw ZIP on a developer's laptop.

GitHub Actions artifact (simplest):

- uses: actions/upload-artifact@v4
  if: always()
  with:
    name: playwright-report
    path: playwright-report/
    retention-days: 14

The report appears as a downloadable ZIP on the workflow run page. Simple, always available, no infrastructure needed.

Jenkins HTML Publisher (covered in Chapter 3): serves the report directly from Jenkins with a sidebar link.

GitHub Pages (permanent, shareable URLs): useful for Allure reports that should be accessible without Jenkins or GitHub Actions access:

- uses: peaceiris/actions-gh-pages@v4
  if: always()
  with:
    github_token: ${{ secrets.GITHUB_TOKEN }}
    publish_dir: allure-history
    destination_dir: reports/${{ github.run_id }}

Each run gets a unique URL. Link to it from the Slack notification or PR comment so anyone can access it without GitHub Actions access.
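A notification step needs to compute that URL. The sketch below shows one way to do it in Python, assuming a project site on the default github.io domain (no custom domain) and the reports/ prefix used in destination_dir above. The function name report_url is hypothetical; GITHUB_REPOSITORY and GITHUB_RUN_ID are environment variables GitHub Actions sets for every run.

```python
def report_url(env, prefix="reports"):
    """Build the GitHub Pages URL for one run's report directory.

    env is any mapping with GITHUB_REPOSITORY ("owner/repo") and GITHUB_RUN_ID.
    """
    owner, repo = env["GITHUB_REPOSITORY"].split("/", 1)
    run_id = env["GITHUB_RUN_ID"]
    # Assumes the default project-site layout: https://<owner>.github.io/<repo>/
    return f"https://{owner}.github.io/{repo}/{prefix}/{run_id}/"
```

Inside a workflow step you would call report_url(os.environ) and pass the result to whatever posts the Slack message or PR comment.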

Allure in CI

Allure is worth the extra setup for teams running more than ~50 tests, because the trend history and failure categorisation are invaluable for identifying flakiness:

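- name: Run tests
  run: mvn test -B

# Fetch the previously published reports so the new one can extend their history.
- name: Get Allure history
  uses: actions/checkout@v4
  if: always()
  continue-on-error: true   # the gh-pages branch does not exist on the first run
  with:
    ref: gh-pages
    path: gh-pages

- name: Generate Allure report
  uses: simple-elf/allure-report-action@master
  if: always()
  with:
    allure_results: target/allure-results
    gh_pages: gh-pages
    allure_history: allure-history
    keep_reports: 20

- name: Deploy to GitHub Pages
  uses: peaceiris/actions-gh-pages@v4
  if: always()
  with:
    github_token: ${{ secrets.GITHUB_TOKEN }}
    publish_dir: allure-history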

The keep_reports: 20 parameter retains the last 20 builds' history on the GitHub Pages site. Each new run updates the trend graph with the latest results. A test that's been failing for 4 builds in a row is much easier to spot in Allure's history view than in individual CI logs.
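To see why retained history matters, consider what becomes computable once per-build results are kept. The Python sketch below is an illustration only, not Allure's algorithm: the function classify_history, the input shape (one dict per build, oldest first), and the four labels are hypothetical.

```python
def classify_history(history, streak=4):
    """Label each test from a list of per-build results, oldest build first.

    history: list of dicts mapping test name -> "passed" or "failed".
    """
    labels = {}
    names = {name for build in history for name in build}
    for name in sorted(names):
        statuses = [build[name] for build in history if name in build]
        recent = statuses[-streak:]
        if len(recent) == streak and all(s == "failed" for s in recent):
            labels[name] = "broken"    # failed in each of the last `streak` builds
        elif "failed" in statuses and "passed" in statuses:
            labels[name] = "flaky"     # mixed outcomes across builds
        elif "failed" in statuses:
            labels[name] = "failing"   # only failures so far, fewer than `streak`
        else:
            labels[name] = "stable"
    return labels
```

With a single build's results, every failing test looks the same; only the sequence across builds separates a test that is consistently broken from one that fails intermittently.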

For Playwright, set the Allure reporter in playwright.config.ts:

reporter: [
  ['allure-playwright', { outputFolder: 'allure-results' }],
  ['html', { open: 'never' }]
],

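Both reporters run simultaneously: Allure results for CI history, HTML for quick local review. The name of the output-directory option has changed between allure-playwright releases, so check it against the version you install; allure-results is the default location.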

Test report formats: right tool for the right audience

JUnit XML

Audience: CI servers and tooling. Machines parse it; humans don't read it.
Generated by: Surefire, pytest, Jest. Zero extra config in most frameworks.
Used by: Jenkins, GitHub Actions, GitLab CI. A universal format that every CI tool reads.
Shows: pass/fail counts, timing, error messages. No screenshots, no history, no trends.

HTML Reports

Audience: developers debugging failures. They click through from a failed status check.
Generated by: Playwright, Cucumber, ExtentReports. Format and quality are framework-specific.
Hosted via: artifact download or Jenkins sidebar. Available for the lifetime of the artifact.
Shows: test list, failure details, screenshots. No cross-build history or trend data.

Allure

Audience: QA team reviewing test health. Best for regular test health reviews.
Generated by: allure-testng, allure-playwright. Requires adding a dependency and annotations.
Hosted via: GitHub Pages or Allure TestOps. Permanent URL with build history.
Shows: trends, timelines, flaky test detection. Cross-build history is the key advantage.

⚠️ Common mistakes

Only uploading reports on failure. A report from a green run has value — it confirms exactly what passed, documents the test run for auditing, and provides baseline data for flakiness detection. Use if: always() for report uploads, not if: failure().
Not publishing Allure history. Allure without history is just a fancy HTML report. The trend graph, flakiness detection, and failure categorisation only appear once you preserve results across builds. The keep_reports parameter and the GitHub Pages deploy step are not optional extras — they're what makes Allure worth using.
Expecting JUnit XML to replace readable failure messages. The XML records the failure. The human-readable test code and assertion message are what explain it. Write clear assertion messages (assertThat(status).as("Order status after checkout").isEqualTo("CONFIRMED")) so the XML failure element contains something actionable, not just AssertionError.
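The last point applies in any language. As a rough Python counterpart to the AssertJ example, the sketch below attaches a message that states what was checked and what was observed; the helper name check_order_status is hypothetical.

```python
def check_order_status(status):
    # The message names what was checked and what was actually observed,
    # so the failure text in the report is actionable on its own.
    assert status == "CONFIRMED", (
        f"Order status after checkout: expected 'CONFIRMED' but was {status!r}"
    )
```

When a check like this fails under pytest with --junitxml, the message text is carried into the report's failure output, so the reader sees the order status that was actually returned instead of a bare AssertionError.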

🎯 Practice task

Set up all three report types for your project — 40 minutes.

JUnit XML: confirm your framework generates it (target/surefire-reports/ for Maven, the file you pass to --junitxml for pytest). Add dorny/test-reporter@v1 to your GitHub Actions workflow with if: always(). Push a failing test and confirm the failure appears as a PR annotation.
HTML artifact: add actions/upload-artifact@v4 pointing at your HTML report directory. Push. Download the artifact from the workflow run page and open it.
Allure: add the Allure dependency for your framework, run tests, confirm target/allure-results/ is populated. Add the Allure GitHub Action and deploy to GitHub Pages. Open the published URL.
Stretch: make a test fail on two consecutive runs. Open the Allure history view. Confirm the failure appears across both runs in the trend graph.

The next lesson turns reporting into enforcement: quality gates that block merging when the criteria you set aren't met.