Guided Walkthrough Part 1 — GitHub Actions Pipeline

This lesson builds the pr-checks.yml and nightly-regression.yml workflows for the ShopFast pipeline. Every decision is explained — not just what to write but why, and what the alternative would cost. Work through this with your own project files open alongside.

Step 1: The PR checks workflow — structure first

Start with the skeleton. Trigger, two parallel jobs, no test commands yet:

# .github/workflows/pr-checks.yml
name: PR Checks
 
on:
  pull_request:
    branches: [main]
    paths:
      - 'backend/**'
      - 'frontend/**'
      - '.github/workflows/**'
 
jobs:
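  backend-smoke:
    runs-on: ubuntu-latest
    timeout-minutes: 10
    defaults:
      run:
        working-directory: backend
    steps:                      # placeholder steps so the skeleton runs; real steps arrive in Step 2
      - uses: actions/checkout@v4
      - run: echo "backend-smoke placeholder"
 
  frontend-smoke:
    runs-on: ubuntu-latest
    timeout-minutes: 10
    defaults:
      run:
        working-directory: frontend
    steps:
      - uses: actions/checkout@v4
      - run: echo "frontend-smoke placeholder"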

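paths: prevents the workflow from firing on documentation-only changes; a PR that only updates README.md doesn't need CI. defaults.run.working-directory saves repeating cd backend && in every step. It applies only to run steps; actions invoked with uses: still resolve paths from the repository root, which is why the report paths later in each job keep their backend/ and frontend/ prefixes. timeout-minutes: 10 kills either job if it hangs. Ten minutes is generous for a smoke suite; tighten it once you know the actual runtime.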
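
The paths filter is evaluated against every file the PR changes: if any one matches, the workflow runs. In GitHub's pattern syntax, ** matches across directory levels while * stops at a /. The Python sketch below approximates that rule so you can predict which PRs trigger pr-checks.yml. It handles only * and ** (no negation, ?, or character classes), and the file names in the example are hypothetical.

```python
import re

# Patterns copied from the paths: filter in pr-checks.yml.
PR_CHECK_PATHS = ["backend/**", "frontend/**", ".github/workflows/**"]


def glob_to_regex(pattern: str) -> re.Pattern:
    """Translate a GitHub-style glob into a regex.

    Only two wildcards are handled: '**' matches any characters including '/',
    and '*' matches any characters except '/'. Everything else is literal.
    """
    parts = []
    i = 0
    while i < len(pattern):
        if pattern.startswith("**", i):
            parts.append(".*")
            i += 2
        elif pattern[i] == "*":
            parts.append("[^/]*")
            i += 1
        else:
            parts.append(re.escape(pattern[i]))
            i += 1
    return re.compile("".join(parts) + r"\Z")


def triggers_workflow(changed_files: list[str], patterns: list[str]) -> bool:
    """True if at least one changed file matches at least one paths pattern."""
    regexes = [glob_to_regex(p) for p in patterns]
    return any(rx.match(path) for path in changed_files for rx in regexes)


if __name__ == "__main__":
    # Hypothetical PRs: a docs-only change, then a docs change plus a backend change.
    print(triggers_workflow(["README.md"], PR_CHECK_PATHS))  # False
    print(triggers_workflow(["README.md", "backend/src/main/java/App.java"], PR_CHECK_PATHS))  # True
```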

Step 2: Backend smoke job — Java + Maven

  backend-smoke:
    runs-on: ubuntu-latest
    timeout-minutes: 10
    defaults:
      run:
        working-directory: backend
 
    steps:
      - uses: actions/checkout@v4
 
      - uses: actions/setup-java@v4
        with:
          java-version: '21'
          distribution: 'temurin'
          cache: 'maven'           # caches ~/.m2/repository
 
      - name: Run smoke tests
        run: |
          mvn test \
            -Dgroups=smoke \
            -Dheadless=true \
            -Dsurefire.rerunFailingTestsCount=1 \
            -B
        env:
          BASE_URL: ${{ secrets.STAGING_URL }}
 
      - uses: dorny/test-reporter@v1
        if: always()
        with:
          name: Backend Smoke Results
          path: 'backend/target/surefire-reports/TEST-*.xml'   # JUnit-format files only; TestNG writes its own XML here too
          reporter: java-junit
          fail-on-error: false     # report failures; test step already exits non-zero
 
      - uses: actions/upload-artifact@v4
        if: always()
        with:
          name: backend-smoke-report
          path: backend/target/surefire-reports/
          retention-days: 7

Three deliberate choices here:

-Dgroups=smoke — runs only the TestNG group tagged smoke. If this group doesn't exist yet, create it: add groups = {"smoke"} to the @Test annotations on your 10 most critical test methods.

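-Dsurefire.rerunFailingTestsCount=1: retries each failing test once before counting it as a failure, so a test must fail twice in a row to block the PR. A test that passes on the retry is reported as flaky rather than silently passed, which removes most false blocks from infrastructure flakiness without hiding real failures. Caveat: Surefire implements this flag only for its JUnit 4 and JUnit Platform providers. Under the TestNG provider used here it has no effect, so configure retries with a TestNG IRetryAnalyzer, attached through the retryAnalyzer attribute of @Test, instead.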

fail-on-error: false on the test reporter: the test step already exits non-zero on failures. If the reporter also failed, the job would show two failed steps for a single problem, which makes the log confusing to read. Let the test step own the failure signal; the reporter only needs if: always() so it still runs after that step fails.
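
The rule the retry is meant to enforce fits in a few lines: a test blocks the PR only when every attempt fails, and a pass on a later attempt is recorded as flaky instead of disappearing. The sketch below models that policy in Python. It is a simulation, not Surefire's or TestNG's code, and run_with_reruns and scripted are hypothetical helpers.

```python
from typing import Callable


def run_with_reruns(test: Callable[[], bool], rerun_count: int = 1) -> str:
    """Run a test up to 1 + rerun_count times and classify the result.

    Returns "passed" if the first attempt passes, "flaky" if a later attempt
    passes, and "failed" only when every attempt fails (the case that should
    block the PR).
    """
    for attempt in range(1 + rerun_count):
        if test():
            return "passed" if attempt == 0 else "flaky"
    return "failed"


def scripted(outcomes: list[bool]) -> Callable[[], bool]:
    """Build a fake test that returns the given outcomes in order."""
    remaining = iter(outcomes)
    return lambda: next(remaining)


if __name__ == "__main__":
    print(run_with_reruns(scripted([True])))          # passed
    print(run_with_reruns(scripted([False, True])))   # flaky: infrastructure blip, PR not blocked
    print(run_with_reruns(scripted([False, False])))  # failed: blocks the PR
```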

Step 3: Frontend smoke job — Playwright

  frontend-smoke:
    runs-on: ubuntu-latest
    timeout-minutes: 10
    defaults:
      run:
        working-directory: frontend
 
    steps:
      - uses: actions/checkout@v4
 
      - uses: actions/setup-node@v4
        with:
          node-version: '20'
          cache: 'npm'
          cache-dependency-path: frontend/package-lock.json
 
      - name: Install dependencies
        run: npm ci
 
      - name: Cache Playwright browsers
        uses: actions/cache@v4
        with:
          path: ~/.cache/ms-playwright
          key: playwright-${{ runner.os }}-${{ hashFiles('frontend/package-lock.json') }}
          restore-keys: playwright-${{ runner.os }}-
 
      - name: Install Playwright browsers
        run: npx playwright install --with-deps chromium
 
      - name: Run smoke tests
        run: npx playwright test --grep @smoke --workers=2 --reporter=html,junit
        env:
          BASE_URL: ${{ secrets.STAGING_URL }}
          PLAYWRIGHT_JUNIT_OUTPUT_NAME: results.xml   # junit reporter prints to stdout unless given a file
 
      - uses: dorny/test-reporter@v1
        if: always()
        with:
          name: Frontend Smoke Results
          path: 'frontend/results.xml'
          reporter: java-junit
          fail-on-error: false
 
      - uses: actions/upload-artifact@v4
        if: always()
        with:
          name: frontend-smoke-report
          path: frontend/playwright-report/
          retention-days: 7

cache-dependency-path points setup-node at frontend/package-lock.json. By default the action looks for a lock file in the repository root; in this monorepo the only one lives under frontend/, so without this line the cache step can't find it and fails.

--workers=2 limits Playwright to 2 parallel worker processes. The standard ubuntu-latest runner for private repositories has 2 vCPUs (public repositories get 4), so 4 workers there would cause CPU contention. 2 workers gives real parallelism without overloading the runner.

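--reporter=html,junit generates both formats in one run: JUnit XML for the test reporter action, HTML for the artifact. The junit reporter prints to stdout unless it is given an output file, so the test step needs PLAYWRIGHT_JUNIT_OUTPUT_NAME=results.xml in its environment to produce the frontend/results.xml that the reporter step reads.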
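
Both caches in this job are keyed on a hash of frontend/package-lock.json, so changing dependencies changes the key. On a miss, actions/cache tries each restore-keys entry as a prefix and restores the most recently created match, which is why the Playwright cache step lists playwright-${{ runner.os }}- as a fallback. The sketch below is a simplified Python model of that lookup: it ignores branch scoping and cache versions, and although hashFiles also uses SHA-256, it combines per-file hashes, so real key strings will differ.

```python
import hashlib
from dataclasses import dataclass
from typing import Optional


@dataclass
class CacheEntry:
    key: str
    created_at: int  # larger = more recent; stands in for the cache's creation time


def playwright_cache_key(runner_os: str, lockfile_bytes: bytes) -> str:
    """Simplified stand-in for 'playwright-${{ runner.os }}-${{ hashFiles(...) }}'."""
    return f"playwright-{runner_os}-{hashlib.sha256(lockfile_bytes).hexdigest()}"


def restore(entries: list[CacheEntry], key: str,
            restore_keys: list[str]) -> tuple[Optional[CacheEntry], bool]:
    """Return (entry, exact_hit): exact key first, then each restore-key prefix in order."""
    for entry in entries:
        if entry.key == key:
            return entry, True
    for prefix in restore_keys:
        matches = [e for e in entries if e.key.startswith(prefix)]
        if matches:
            # Several partial matches: the most recently created entry wins.
            return max(matches, key=lambda e: e.created_at), False
    return None, False


if __name__ == "__main__":
    old_key = playwright_cache_key("Linux", b"lockfile contents v1")
    new_key = playwright_cache_key("Linux", b"lockfile contents v2")
    cache = [CacheEntry(old_key, created_at=1)]
    entry, hit = restore(cache, new_key, ["playwright-Linux-"])
    # The lock file changed, so the exact key misses, but the previous browsers
    # come back via the prefix and playwright install only fetches what is missing.
    print(entry is not None and entry.key == old_key, hit)  # True False
```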

Step 4: Branch protection

Go to: Repository → Settings → Branches → Branch protection rules → Add rule.

Configure for main:

✅ Require a pull request before merging
✅ Require status checks to pass before merging
- Add: PR Checks / backend-smoke
- Add: PR Checks / frontend-smoke
✅ Require branches to be up to date before merging
✅ Do not allow bypassing the above settings

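These four settings guarantee: no direct pushes to main, no merging without passing tests, no merging on a stale branch. The last checkbox decides whether administrators are gated too: with "Do not allow bypassing the above settings" enabled, even admins cannot merge past a failing check, so leave it unchecked only if you want to keep an emergency override.

Plan for one interaction with Step 1: because pr-checks.yml has a paths filter, a PR that changes only documentation never runs the workflow. Its required checks then stay pending, and GitHub blocks the merge. Either remove the filter or make sure something always reports backend-smoke and frontend-smoke for those PRs.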
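
If you manage more than one repository, the same rule can be applied through GitHub's REST endpoint PUT /repos/{owner}/{repo}/branches/{branch}/protection instead of the settings page. The Python sketch below builds that request without sending it, with a comment mapping each payload field to a checkbox above. The owner, repository, and token are placeholders, and the check names assume GitHub reports each job under its job ID (backend-smoke, frontend-smoke); confirm them in the Checks list of a real PR first.

```python
import json
import urllib.request

# Assumed check names: GitHub usually names a job's check run after the job ID.
REQUIRED_CHECKS = ["backend-smoke", "frontend-smoke"]


def protection_payload(checks: list[str]) -> dict:
    """Body for PUT /repos/{owner}/{repo}/branches/{branch}/protection."""
    return {
        # "Require status checks to pass" + "Require branches to be up to date" (strict)
        "required_status_checks": {"strict": True, "contexts": checks},
        # "Do not allow bypassing the above settings": admins are gated too
        "enforce_admins": True,
        # "Require a pull request before merging" (0 approvals: PR required, review optional)
        "required_pull_request_reviews": {"required_approving_review_count": 0},
        # No extra restriction on who may push
        "restrictions": None,
    }


def build_protection_request(owner: str, repo: str, branch: str,
                             token: str) -> urllib.request.Request:
    """Build (but do not send) the PUT request that applies the protection rule."""
    url = f"https://api.github.com/repos/{owner}/{repo}/branches/{branch}/protection"
    body = json.dumps(protection_payload(REQUIRED_CHECKS)).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="PUT",
        headers={
            "Accept": "application/vnd.github+json",
            "Authorization": f"Bearer {token}",
            "X-GitHub-Api-Version": "2022-11-28",
            "Content-Type": "application/json",
        },
    )


if __name__ == "__main__":
    # Placeholders: substitute your owner, repo, and a token with admin rights on the repo.
    req = build_protection_request("your-org", "shopfast", "main", "YOUR_TOKEN")
    print(req.get_method(), req.full_url)
    # To apply it for real: urllib.request.urlopen(req)
```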

Step 5: Nightly regression workflow

# .github/workflows/nightly-regression.yml
name: Nightly Regression
 
on:
  schedule:
    - cron: '0 2 * * *'    # 2:00 AM UTC every night
  workflow_dispatch:         # manual trigger for emergency reruns
 
jobs:
  backend-regression:
    runs-on: ubuntu-latest
    timeout-minutes: 25
    strategy:
      fail-fast: false
      matrix:
        shard: [1, 2, 3, 4]
    defaults:
      run:
        working-directory: backend
 
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-java@v4
        with:
          java-version: '21'
          distribution: 'temurin'
          cache: 'maven'
 
      - name: Run backend shard ${{ matrix.shard }} of 4
        run: |
          mvn test \
            -DsuiteFile=testng-shard-${{ matrix.shard }}.xml \
            -Dheadless=true \
            -B
        env:
          BASE_URL: ${{ secrets.STAGING_URL }}
 
      - uses: actions/upload-artifact@v4
        if: always()
        with:
          name: backend-shard-${{ matrix.shard }}
          path: backend/target/surefire-reports/
          retention-days: 3
 
  frontend-regression:
    runs-on: ubuntu-latest
    timeout-minutes: 20
    strategy:
      fail-fast: false
      matrix:
        shard: [1, 2, 3]
    defaults:
      run:
        working-directory: frontend
 
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-node@v4
        with: { node-version: '20', cache: 'npm', cache-dependency-path: frontend/package-lock.json }
      - run: npm ci
      - uses: actions/cache@v4
        with:
          path: ~/.cache/ms-playwright
          key: playwright-${{ runner.os }}-${{ hashFiles('frontend/package-lock.json') }}
      - run: npx playwright install --with-deps chromium
      - run: npx playwright test --shard=${{ matrix.shard }}/3 --reporter=blob
        env:
          BASE_URL: ${{ secrets.STAGING_URL }}
      - uses: actions/upload-artifact@v4
        if: always()
        with:
          name: blob-report-${{ matrix.shard }}
          path: frontend/blob-report/
          retention-days: 1
 
  merge-frontend-reports:
    runs-on: ubuntu-latest
    needs: frontend-regression
    if: always()
    defaults:
      run:
        working-directory: frontend
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-node@v4
        with: { node-version: '20', cache: 'npm', cache-dependency-path: frontend/package-lock.json }
      - run: npm ci
      - uses: actions/download-artifact@v4
        with: { pattern: blob-report-*, path: frontend/all-blobs, merge-multiple: true }
      - run: npx playwright merge-reports --reporter html ./all-blobs
      - uses: actions/upload-artifact@v4
        if: always()
        with:
          name: frontend-regression-report
          path: frontend/playwright-report/
          retention-days: 14
 
  notify-failure:
    runs-on: ubuntu-latest
    needs: [backend-regression, frontend-regression]
    if: failure()
    steps:
      - uses: slackapi/slack-github-action@v1
        with:
          payload: |
            {
              "text": "❌ *Nightly regression failed* — ${{ github.server_url }}/${{ github.repository }}/actions/runs/${{ github.run_id }}",
              "channel": "#qa-builds"
            }
        env:
          SLACK_WEBHOOK_URL: ${{ secrets.SLACK_WEBHOOK_URL }}
          SLACK_WEBHOOK_TYPE: INCOMING_WEBHOOK
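
The backend matrix reads testng-shard-1.xml through testng-shard-4.xml, which this lesson doesn't show, and passes the file name as -DsuiteFile. That is not a property Surefire reads on its own (its user property is surefire.suiteXmlFiles), so the workflow assumes backend/pom.xml forwards ${suiteFile} into the Surefire configuration. One way to generate the four files is to deal test classes round-robin and write a minimal TestNG suite per shard, as in the Python sketch below; the class names are hypothetical.

```python
import tempfile
import xml.etree.ElementTree as ET
from pathlib import Path


def split_round_robin(classes: list[str], shard_count: int) -> list[list[str]]:
    """Deal classes (in sorted order) into shard_count buckets: item i goes to shard i % shard_count."""
    shards: list[list[str]] = [[] for _ in range(shard_count)]
    for i, cls in enumerate(sorted(classes)):
        shards[i % shard_count].append(cls)
    return shards


def suite_xml(shard_number: int, classes: list[str]) -> str:
    """Render a minimal TestNG suite file for one shard (shard_number is 1-based)."""
    suite = ET.Element("suite", name=f"shard-{shard_number}")
    test = ET.SubElement(suite, "test", name=f"shard-{shard_number}")
    class_list = ET.SubElement(test, "classes")
    for cls in classes:
        ET.SubElement(class_list, "class", name=cls)
    doctype = '<!DOCTYPE suite SYSTEM "https://testng.org/testng-1.0.dtd">\n'
    return doctype + ET.tostring(suite, encoding="unicode")


def write_shards(classes: list[str], shard_count: int, out_dir: Path) -> list[Path]:
    """Write testng-shard-1.xml ... testng-shard-N.xml into out_dir and return their paths."""
    paths = []
    for n, shard in enumerate(split_round_robin(classes, shard_count), start=1):
        path = out_dir / f"testng-shard-{n}.xml"
        path.write_text(suite_xml(n, shard), encoding="utf-8")
        paths.append(path)
    return paths


if __name__ == "__main__":
    # Hypothetical class names; in a real project, list the classes under src/test/java.
    demo = [f"com.shopfast.regression.Case{i:02d}Test" for i in range(1, 11)]
    with tempfile.TemporaryDirectory() as tmp:
        for path in write_shards(demo, 4, Path(tmp)):
            print(path.name, path.read_text(encoding="utf-8").count("<class "))
```

Round-robin balances the number of classes per shard, not runtime. If a few classes dominate the nightly duration, split by recorded timings instead so the slowest shard doesn't set the pace for the whole job.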

Five design choices in the nightly workflow worth noting:

workflow_dispatch alongside schedule — lets you rerun the nightly manually after fixing a failure, without waiting until 2 AM.

fail-fast: false on both matrices — a failing shard doesn't cancel the others. You see all failures, not just the first one.

Short retention on shard artifacts: 3 days for the backend shard reports and 1 day for the Playwright blobs, which are only intermediate input to the merged frontend report (kept 14 days). There's no reason to keep raw shard data long.

merge-frontend-reports runs with if: always(), so it runs even when some shards failed. A partial merge still gives you a report for the tests that ran.

notify-failure depends on both regression jobs — Slack fires once when any regression job fails, not once per failing shard.
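
The effect of fail-fast, and why notify-failure fires once, can be modeled in a few lines of Python. This is a simplification: real matrix jobs run concurrently and GitHub cancels whichever jobs are still queued or running when one fails, while the sketch walks shards in order and marks everything after the first failure as cancelled. The shard outcomes are hypothetical.

```python
def matrix_results(outcomes: dict[int, bool], fail_fast: bool) -> dict[int, str]:
    """Map shard number -> 'success' | 'failure' | 'cancelled'.

    outcomes[shard] is True if that shard's tests would pass.
    With fail_fast, every shard after the first failure is reported as cancelled.
    """
    results: dict[int, str] = {}
    stop = False
    for shard in sorted(outcomes):
        if stop:
            results[shard] = "cancelled"
            continue
        results[shard] = "success" if outcomes[shard] else "failure"
        if fail_fast and not outcomes[shard]:
            stop = True
    return results


def notify_count(job_results: dict[str, dict[int, str]]) -> int:
    """Slack messages sent: notify-failure is a single job, so the answer is 0 or 1."""
    any_failed = any("failure" in shards.values() for shards in job_results.values())
    return 1 if any_failed else 0


if __name__ == "__main__":
    backend = {1: True, 2: False, 3: True, 4: False}  # hypothetical outcomes
    print(matrix_results(backend, fail_fast=True))
    # {1: 'success', 2: 'failure', 3: 'cancelled', 4: 'cancelled'}
    print(matrix_results(backend, fail_fast=False))
    # {1: 'success', 2: 'failure', 3: 'success', 4: 'failure'}
```

With fail_fast=True you would learn about shard 2 but never see that shard 4 also fails; with fail_fast=False both failures are visible, yet notify_count still returns 1.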


Where to pause and verify

After each step, confirm the expected outcome before continuing:

After step 1: push the file to a branch and open a PR. The Actions tab should show the workflow running with two jobs.
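After step 2: run the workflow twice, since the first run only populates the cache. In the second run's setup-java log, look for "Cache restored from key" rather than "Cache not found", and compare its setup time with the first run. If it is still around 90 seconds, the cache configuration is wrong.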
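After step 3: push a deliberate test failure. Confirm the test reporter posts an annotation directly on the PR, and confirm the artifact is downloadable. If the reporter step instead fails with "Resource not accessible by integration", the workflow token can't create check runs; grant checks: write in a permissions: block (list contents: read there too, since checkout needs it once permissions are set explicitly).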
After step 4: try to merge the PR with the failing test. The merge button should be greyed out with "Required status checks have not passed."
After step 5: merge nightly-regression.yml to main first, because the Run workflow button for workflow_dispatch only appears once the file is on the default branch. Trigger it manually and confirm 7 parallel shard jobs appear (4 backend, 3 frontend) and that the merge job produces a downloadable HTML report. Then trigger it again with a deliberate failure and confirm the Slack message arrives within 2 minutes.

The next lesson adds the quality gates, coverage reporting, Allure history, and the deploy-staging workflow.