Quality Gates and Failing Builds on Test Failure

A test suite that runs in CI but doesn't block merging when it fails is a suggestion, not a gate. Teams whose tests run in CI but whose developers can still merge on red tend to develop a culture of ignoring red pipelines, and the signal erodes until nobody trusts the suite at all. Quality gates are the mechanism that converts test results from advisory to mandatory: a failing gate means the code cannot merge, full stop.

What quality gates are

A quality gate is a pass/fail check that the pipeline evaluates before a PR can merge. The gate can check any measurable criterion: all tests pass, code coverage is at least 80%, no critical security vulnerabilities, no SonarQube issues above a severity threshold. If any configured gate fails, the PR is blocked until the failure is resolved.

The value is automatic consistency. Without gates, each PR's merge decision is made by a human reviewing CI results — which means it depends on the reviewer's attention, time pressure, and willingness to push back. With gates configured in the repository settings, the same standard applies to every PR, every time, regardless of who's reviewing.

How tests fail builds in GitHub Actions

When a test step exits with a non-zero code, GitHub Actions marks the step as failed. The job fails. The workflow fails. The PR status check shows a red X. If that job is configured as a required status check in branch protection, the PR is blocked.

This chain works automatically for most frameworks — Maven exits non-zero when mvn test has failures, npx playwright test exits non-zero when tests fail, pytest exits non-zero on failures. You don't need extra configuration to get the basic "failed tests block the PR" behaviour.
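The exit-code chain can be seen in miniature with a small Python sketch. It is an illustration only, not how GitHub Actions is implemented: run_step and step_passed are hypothetical helpers, and the two commands are stand-ins for a test runner that exits 0 or 1.

```python
import subprocess
import sys


def run_step(command):
    """Run a command and return its exit code (0 means success)."""
    completed = subprocess.run(command)
    return completed.returncode


def step_passed(command):
    """A step passes only when its command exits with code 0."""
    return run_step(command) == 0


# Stand-ins for a test runner: one exits 0 (all tests passed), one exits 1.
passing_runner = [sys.executable, "-c", "raise SystemExit(0)"]
failing_runner = [sys.executable, "-c", "raise SystemExit(1)"]

print("passing runner ->", "step passed" if step_passed(passing_runner) else "step failed")
print("failing runner ->", "step passed" if step_passed(failing_runner) else "step failed")
```

The runner never inspects test output here; the exit code alone decides the result, which is why a test command that swallows failures (for example by ending with `|| true`) silently disables the gate.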

What you do need to configure explicitly: the branch protection rule.

Configuring required status checks

1. Repository → Settings → Branches → Add rule (or edit existing rule for main)
2. Enable Require status checks to pass before merging
3. Search for and add your workflow job names (e.g., selenium, playwright, cypress)
4. Optionally enable Require branches to be up to date before merging. This prevents a PR from merging if main has advanced since the PR's checks ran
5. Save

From this point, every PR must show green checks for all listed jobs before the merge button is active. A failed test anywhere in the chain blocks the merge.
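The same settings can also be applied through GitHub's REST API, which helps when many repositories need identical rules. The sketch below assumes the "update branch protection" endpoint (PUT /repos/{owner}/{repo}/branches/{branch}/protection). The owner, repository, and token values are placeholders, the helper names are hypothetical, and the request is built but deliberately not sent.

```python
import json
import urllib.request


def branch_protection_payload(required_checks, require_up_to_date=True):
    """Build the JSON body for the branch protection endpoint."""
    return {
        "required_status_checks": {
            # strict=True corresponds to "Require branches to be up to date".
            "strict": require_up_to_date,
            "contexts": list(required_checks),
        },
        # The endpoint expects these keys; None leaves each feature disabled.
        "enforce_admins": None,
        "required_pull_request_reviews": None,
        "restrictions": None,
    }


def build_request(owner, repo, branch, token, payload):
    """Prepare (but do not send) the PUT request."""
    url = f"https://api.github.com/repos/{owner}/{repo}/branches/{branch}/protection"
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        method="PUT",
        headers={
            "Authorization": f"Bearer {token}",
            "Accept": "application/vnd.github+json",
        },
    )


# Placeholder values: replace with a real owner, repository, and token.
payload = branch_protection_payload(["selenium", "playwright", "cypress"])
request = build_request("my-org", "my-repo", "main", "<token>", payload)
print(request.get_method(), request.full_url)
print(json.dumps(payload, indent=2))
# Sending it would be: urllib.request.urlopen(request)
```

Keeping the payload in a script makes the gate configuration reviewable in version control instead of living only in the settings page.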

Custom quality gates

Beyond test pass/fail, you can encode any measurable standard as a gate:

Test pass rate threshold (reject if more than 5% of tests fail):

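- name: Evaluate test pass rate
  # This step only runs if the test step did not already fail the job,
  # e.g. run Maven with -Dmaven.test.failure.ignore=true so this step decides.
  run: |
    # Total number of tests across all Surefire XML reports
    TOTAL=$(python3 -c "
    import xml.etree.ElementTree as ET, glob
    files = glob.glob('target/surefire-reports/*.xml')
    total = sum(int(ET.parse(f).getroot().get('tests', 0)) for f in files)
    print(total)
    ")
    # Failures plus errors across the same reports (each file parsed once)
    FAILED=$(python3 -c "
    import xml.etree.ElementTree as ET, glob
    roots = [ET.parse(f).getroot() for f in glob.glob('target/surefire-reports/*.xml')]
    failed = sum(int(r.get('failures', 0)) + int(r.get('errors', 0)) for r in roots)
    print(failed)
    ")
    # Guard against division by zero when no reports were produced
    if [ "$TOTAL" -eq 0 ]; then
      echo "::error::No test results found in target/surefire-reports"
      exit 1
    fi
    RATE=$(( (TOTAL - FAILED) * 100 / TOTAL ))
    echo "Pass rate: ${RATE}%"
    if [ "$RATE" -lt 95 ]; then
      echo "::error::Pass rate ${RATE}% is below the 95% threshold"
      exit 1
    fi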
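The pass-rate arithmetic is easier to test when it lives in a standalone function. The following is a minimal sketch of the same calculation, assuming JUnit-style testsuite elements with tests, failures, and errors attributes. The function names and the two sample reports are made up for illustration.

```python
import xml.etree.ElementTree as ET


def pass_rate(report_xml_strings):
    """Return the integer pass percentage across JUnit-style reports."""
    total = failed = 0
    for xml_text in report_xml_strings:
        root = ET.fromstring(xml_text)
        total += int(root.get("tests", 0))
        failed += int(root.get("failures", 0)) + int(root.get("errors", 0))
    if total == 0:
        # An empty result set should fail loudly rather than pass silently.
        raise ValueError("no tests found in the reports")
    return (total - failed) * 100 // total


def gate_passes(report_xml_strings, threshold=95):
    """True when the pass rate meets or exceeds the threshold."""
    return pass_rate(report_xml_strings) >= threshold


# Hypothetical reports: 100 tests in total, 3 failures and 1 error.
sample_reports = [
    '<testsuite name="LoginTest" tests="40" failures="1" errors="0"/>',
    '<testsuite name="CartTest" tests="60" failures="2" errors="1"/>',
]
print(pass_rate(sample_reports))    # 96
print(gate_passes(sample_reports))  # True
```

Integer division rounds down, so a suite at 94.9% is reported as 94% and fails a 95% threshold, which is the conservative direction for a gate.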

Coverage minimum (covered in the next lesson — JaCoCo enforces this via mvn jacoco:check).

SonarQube quality gate:

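- name: SonarQube analysis
  run: mvn sonar:sonar -Dsonar.projectKey=my-project
  env:
    SONAR_TOKEN: ${{ secrets.SONAR_TOKEN }}
    SONAR_HOST_URL: ${{ secrets.SONAR_HOST_URL }}   # URL of your SonarQube server
 
# @master follows the action's default branch; consider pinning to a release tag
- uses: sonarsource/sonarqube-quality-gate-action@master
  timeout-minutes: 5
  with:
    # The Maven scanner writes its report here rather than the action's default path
    scanMetadataReportFile: target/sonar/report-task.txt
  env:
    SONAR_TOKEN: ${{ secrets.SONAR_TOKEN }}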

The SonarQube gate fails the workflow if the SonarQube project's quality gate (configured in the SonarQube server) is not met. The quality gate on the server can check: code coverage, duplicated code ratio, maintainability rating, reliability rating, and security hotspots.
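When a SonarQube gate fails, the useful question is which condition failed. The sketch below interprets a response in the shape returned by the SonarQube Web API endpoint api/qualitygates/project_status. The sample response is hand-written for illustration, not real server output, and the helper names are hypothetical.

```python
def gate_is_green(response):
    """True when the project's quality gate status is OK."""
    return response["projectStatus"]["status"] == "OK"


def failed_conditions(response):
    """Return a readable line for each gate condition that is in ERROR."""
    conditions = response["projectStatus"].get("conditions", [])
    return [
        f'{c["metricKey"]}: actual {c.get("actualValue")} '
        f'(threshold {c.get("errorThreshold")}, comparator {c.get("comparator")})'
        for c in conditions
        if c.get("status") == "ERROR"
    ]


# Hand-written example response: one failing condition, one passing.
sample_response = {
    "projectStatus": {
        "status": "ERROR",
        "conditions": [
            {"status": "ERROR", "metricKey": "new_coverage",
             "comparator": "LT", "errorThreshold": "80", "actualValue": "72.5"},
            {"status": "OK", "metricKey": "new_reliability_rating",
             "comparator": "GT", "errorThreshold": "1", "actualValue": "1"},
        ],
    }
}

if not gate_is_green(sample_response):
    for line in failed_conditions(sample_response):
        print("Gate condition failed:", line)
```

Printing the failing metric in the CI log saves developers a trip to the SonarQube dashboard to find out why the gate is red.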

Handling flaky tests gracefully

A known-flaky test that fails 15% of the time creates a dilemma: make it a required gate and it blocks PRs randomly; leave it ungated and it has no value. The right approaches:

Retry on failure (don't let a single flaky test fail the gate):

# Playwright
- run: npx playwright test --retries=2
 
# Maven Surefire
- run: mvn test -Dsurefire.rerunFailingTestsCount=2 -B

With two retries, a test must fail three consecutive attempts (the initial run plus both retries) before the step fails. Genuinely flaky tests (failure rate < 30%) usually pass on retry. Consistently broken tests fail on all retries and correctly block the PR.
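The effect of retries can be estimated with simple arithmetic. The sketch below assumes each attempt fails independently with the same probability, which is a simplification: real flakiness is often correlated, for example when a shared environment is slow. The function name is hypothetical.

```python
def blocked_probability(failure_rate, retries):
    """Probability that a flaky test fails the first run and every retry."""
    attempts = retries + 1
    return failure_rate ** attempts


for rate in (0.15, 0.20, 0.30):
    without = blocked_probability(rate, retries=0)
    with_two = blocked_probability(rate, retries=2)
    print(f"failure rate {rate:.0%}: blocks {without:.1%} of runs without retries, "
          f"{with_two:.2%} with 2 retries")
```

Under that assumption, a test that fails 15% of the time blocks roughly 0.3% of runs once two retries are allowed, while a test that always fails is still blocked every time.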

Quarantine tag — tag flaky tests with @flaky and exclude them from the required gate job. Run them in a separate optional job so the failures are visible but don't block merging:

jobs:
  required-gate:
    runs-on: ubuntu-latest
    steps:
      - run: npx playwright test --grep-invert @flaky   # excludes flaky tests
 
  flaky-watch:
    runs-on: ubuntu-latest
    continue-on-error: true                              # doesn't block PR
    steps:
      - run: npx playwright test --grep @flaky

The flaky watch job surfaces the failure without blocking the merge. The intent is to fix the flaky test — the separate job provides visibility without friction.
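The split between required and optional jobs comes down to a simple rule, sketched here as a simplified model rather than GitHub's actual implementation: only the checks listed as required are consulted when deciding whether a PR can merge. The function name and result strings are assumptions for illustration.

```python
def merge_allowed(required_checks, job_results):
    """A PR can merge only if every required check reported success.

    A required check with no reported result counts as not passed.
    """
    return all(job_results.get(name) == "success" for name in required_checks)


required = ["required-gate"]

# The quarantined job failed, but it is not a required check.
print(merge_allowed(required, {"required-gate": "success", "flaky-watch": "failure"}))

# The required job failed, so the merge is blocked.
print(merge_allowed(required, {"required-gate": "failure", "flaky-watch": "success"}))
```

The first call allows the merge and the second blocks it, which is the behaviour the quarantine pattern relies on: failures in the optional job stay visible without affecting the decision.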

PR opened: Developer pushes branch, opens PR
CI workflow triggers: Tests, coverage, and security scans run…
Test gate: All tests pass (or pass rate ≥ 95% with…
Coverage gate: Line coverage ≥ 80% (JaCoCo or Istanbul)
Quality gate: SonarQube gate: no new critical issues
Merge allowed: All required checks green — merge button…

Calibrating strictness

The strictest possible gate — 100% test pass rate, 100% coverage, zero lint warnings — sounds ideal until it paralyses the team. Gates that fire on legitimate work teach developers to work around them (force-push to a different branch, get a quick approval, bypass the check). Once the team learns to route around a gate, it provides negative value — false confidence and friction.

The practical calibration: start with one gate (all tests pass on the smoke suite), enforce it consistently for two weeks, and measure both whether it catches real issues and whether it creates false blocks. Expand gates gradually once the team trusts the first one.

⚠️ Common mistakes

Configuring required checks without enforcing "branches must be up to date." A PR can pass all checks on Monday, sit unmerged until Friday, and then merge — ignoring everything that merged to main in between, including a test that would now catch a conflict. Enable "Require branches to be up to date" alongside required checks.
Making flaky tests required gates without retries. A test with a 20% failure rate blocks 1 in 5 PRs for no real quality reason. Either fix the flaky test, add retries, or quarantine it — don't leave it as a required gate in its current state.
Adding gates without owners. A SonarQube gate that fires needs someone to review and resolve it. If nobody is assigned to review SonarQube issues, the gate fails every build, the team learns to ignore it, and it provides no value. Every gate needs an owner and a process for resolution.

🎯 Practice task

Configure a complete quality gate setup — 30 minutes.

Confirm your test workflow exits non-zero on test failure (run a deliberate failing test locally and check the exit code: mvn test; echo $? or npx playwright test; echo $?).
Add branch protection to your test repository: require your test workflow's job as a status check. Try to merge a PR with a failing test — confirm the button is greyed out.
Add --retries=1 (Playwright) or -Dsurefire.rerunFailingTestsCount=1 (Maven) to your test command. Find a flaky test (or deliberately add Math.random() > 0.5 → fail) and confirm it passes on retry without blocking the PR.
Stretch: create a second job in your workflow with continue-on-error: true. Move a flaky or slow test into it. Confirm the PR can merge even when this secondary job fails.

The next lesson adds quantitative measurement to your quality gate: code coverage reporting and the threshold checks that enforce it.