This lesson builds the pr-checks.yml and nightly-regression.yml workflows for the ShopFast pipeline. Every decision is explained — not just what to write but why, and what the alternative would cost. Work through this with your own project files open alongside.
Step 1: The PR checks workflow — structure first
Start with the skeleton. Trigger, two parallel jobs, no test commands yet:
# .github/workflows/pr-checks.yml
name: PR Checks
on:
  pull_request:
    branches: [main]
    paths:
      - 'backend/**'
      - 'frontend/**'
      - '.github/workflows/**'
jobs:
  backend-smoke:
    runs-on: ubuntu-latest
    timeout-minutes: 10
    defaults:
      run:
        working-directory: backend
  frontend-smoke:
    runs-on: ubuntu-latest
    timeout-minutes: 10
    defaults:
      run:
        working-directory: frontend

paths: prevents the workflow from firing on documentation-only changes. A PR that only updates README.md doesn't need CI.
defaults.run.working-directory saves repeating cd backend && in every run step.
timeout-minutes: 10 kills either job if it hangs. 10 minutes is generous for a smoke suite; tighten it once you know the actual runtime.
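If documentation also lives inside the source directories, the same filter can be narrowed further. The fragment below is a minimal sketch, assuming (this is not from the lesson) that Markdown files under backend/ and frontend/ should not trigger CI. Path filters accept `!` negation patterns as long as the list also contains at least one positive pattern, and later patterns override earlier ones.

```yaml
on:
  pull_request:
    branches: [main]
    paths:
      - 'backend/**'
      - 'frontend/**'
      - '.github/workflows/**'
      # Assumption: docs inside the source trees should not trigger CI.
      # A PR that touches both code and Markdown still runs the workflow.
      - '!backend/**/*.md'
      - '!frontend/**/*.md'
```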
Step 2: Backend smoke job — Java + Maven
  backend-smoke:
    runs-on: ubuntu-latest
    timeout-minutes: 10
    defaults:
      run:
        working-directory: backend
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-java@v4
        with:
          java-version: '21'
          distribution: 'temurin'
          cache: 'maven' # caches ~/.m2/repository
      - name: Run smoke tests
        run: |
          mvn test \
            -Dgroups=smoke \
            -Dheadless=true \
            -Dsurefire.rerunFailingTestsCount=1 \
            -B
        env:
          BASE_URL: ${{ secrets.STAGING_URL }}
      - uses: dorny/test-reporter@v1
        if: always()
        with:
          name: Backend Smoke Results
          path: 'backend/target/surefire-reports/*.xml'
          reporter: java-junit
          fail-on-error: false # report failures; test step already exits non-zero
      - uses: actions/upload-artifact@v4
        if: always()
        with:
          name: backend-smoke-report
          path: backend/target/surefire-reports/
          retention-days: 7

Three deliberate choices here:
-Dgroups=smoke: runs only the TestNG group tagged smoke. If this group doesn't exist yet, create it: add groups = {"smoke"} to the @Test annotations on your 10 most critical test methods.
-Dsurefire.rerunFailingTestsCount=1: retries each failing test once before counting it as a failure, so a test must fail twice in a row to block the PR. This removes most false blocks from infrastructure flakiness without hiding real failures. Surefire documents this option for its JUnit providers; because the smoke group here is a TestNG group, check the logs to confirm the rerun actually happens, and fall back to a TestNG retry analyzer if it does not.
fail-on-error: false on the test reporter: the test step already exits non-zero on failures. If the reporter also exits non-zero, the run shows two failed steps for a single cause and the error message is confusing. Let the test step own the failure signal.
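One thing the job above does not show: dorny/test-reporter publishes its results through the Checks API, so the job's GITHUB_TOKEN needs write access to checks. In repositories where the default token is read-only, a permissions block along these lines is usually needed. Treat it as a sketch based on the scopes listed in the action's README, and verify them against the version you pin.

```yaml
jobs:
  backend-smoke:
    runs-on: ubuntu-latest
    permissions:
      contents: read   # needed by actions/checkout
      actions: read    # listed in the reporter's README
      checks: write    # lets the reporter create the check run
```

Pull requests opened from forks receive a read-only token regardless of this block, so the reporter step cannot post results there.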
Step 3: Frontend smoke job — Playwright
  frontend-smoke:
    runs-on: ubuntu-latest
    timeout-minutes: 10
    defaults:
      run:
        working-directory: frontend
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-node@v4
        with:
          node-version: '20'
          cache: 'npm'
          cache-dependency-path: frontend/package-lock.json
      - name: Install dependencies
        run: npm ci
      - name: Cache Playwright browsers
        uses: actions/cache@v4
        with:
          path: ~/.cache/ms-playwright
          key: playwright-${{ runner.os }}-${{ hashFiles('frontend/package-lock.json') }}
          restore-keys: playwright-${{ runner.os }}-
      - name: Install Playwright browsers
        run: npx playwright install --with-deps chromium
      - name: Run smoke tests
        run: npx playwright test --grep @smoke --workers=2 --reporter=html,junit
        env:
          BASE_URL: ${{ secrets.STAGING_URL }}
          # The junit reporter prints to stdout unless an output file is set
          # here or in playwright.config; this writes frontend/results.xml.
          PLAYWRIGHT_JUNIT_OUTPUT_NAME: results.xml
      - uses: dorny/test-reporter@v1
        if: always()
        with:
          name: Frontend Smoke Results
          path: 'frontend/results.xml'
          reporter: java-junit
          fail-on-error: false
      - uses: actions/upload-artifact@v4
        if: always()
        with:
          name: frontend-smoke-report
          path: frontend/playwright-report/
          retention-days: 7

cache-dependency-path scopes the npm cache key to the frontend subdirectory's lock file. This matters in a monorepo where the lock file is not at the repository root.
--workers=2 limits Playwright to 2 parallel browser processes. On a 2-vCPU ubuntu-latest runner, 4 workers would cause CPU contention. 2 workers gives real parallelism without overloading the runner.
--reporter=html,junit generates both formats simultaneously: JUnit XML for the test reporter action, HTML for the artifact.
Step 4: Branch protection
Go to: Repository → Settings → Branches → Branch protection rules → Add rule.
Configure for main:
- ✅ Require a pull request before merging
- ✅ Require status checks to pass before merging
  - Add: PR Checks / backend-smoke
  - Add: PR Checks / frontend-smoke
- ✅ Require branches to be up to date before merging
- ✅ Do not allow bypassing the above settings
These four settings guarantee three things: no direct pushes to main, no merging without passing tests, and no merging from a stale branch. With "Do not allow bypassing the above settings" checked, administrators are gated too; leave it unchecked only if you want admins to keep an override for genuine emergencies.
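One interaction to watch for: when a workflow is skipped by a workflow-level paths filter, its checks stay pending, and a required check that never reports blocks the merge. With the filter from Step 1, a README-only PR would therefore be unmergeable. Jobs skipped by a job-level if condition, by contrast, report success. The sketch below moves the filtering to job level using dorny/paths-filter, a third-party action that the lesson itself does not use; the changes job name and its outputs are illustrative.

```yaml
# Sketch: replaces the workflow-level `paths:` filter from Step 1.
on:
  pull_request:
    branches: [main]

jobs:
  changes:
    runs-on: ubuntu-latest
    permissions:
      pull-requests: read   # the action lists changed files via the API
    outputs:
      backend: ${{ steps.filter.outputs.backend }}
      frontend: ${{ steps.filter.outputs.frontend }}
    steps:
      - uses: dorny/paths-filter@v3   # assumption: not part of the lesson
        id: filter
        with:
          filters: |
            backend:
              - 'backend/**'
              - '.github/workflows/**'
            frontend:
              - 'frontend/**'
              - '.github/workflows/**'

  backend-smoke:
    needs: changes
    # Skipped jobs report success, so the required check is satisfied.
    if: needs.changes.outputs.backend == 'true'
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      # followed by the remaining steps from Step 2
```

The frontend-smoke job would take the same needs and if lines, keyed on the frontend output.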
Step 5: Nightly regression workflow
# .github/workflows/nightly-regression.yml
name: Nightly Regression
on:
  schedule:
    - cron: '0 2 * * *' # 2:00 AM UTC every night
  workflow_dispatch: # manual trigger for emergency reruns
jobs:
  backend-regression:
    runs-on: ubuntu-latest
    timeout-minutes: 25
    strategy:
      fail-fast: false
      matrix:
        shard: [1, 2, 3, 4]
    defaults:
      run:
        working-directory: backend
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-java@v4
        with:
          java-version: '21'
          distribution: 'temurin'
          cache: 'maven'
      - name: Run backend shard ${{ matrix.shard }} of 4
        run: |
          mvn test \
            -DsuiteFile=testng-shard-${{ matrix.shard }}.xml \
            -Dheadless=true \
            -B
        env:
          BASE_URL: ${{ secrets.STAGING_URL }}
      - uses: actions/upload-artifact@v4
        if: always()
        with:
          name: backend-shard-${{ matrix.shard }}
          path: backend/target/surefire-reports/
          retention-days: 3
  frontend-regression:
    runs-on: ubuntu-latest
    timeout-minutes: 20
    strategy:
      fail-fast: false
      matrix:
        shard: [1, 2, 3]
    defaults:
      run:
        working-directory: frontend
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-node@v4
        with: { node-version: '20', cache: 'npm', cache-dependency-path: frontend/package-lock.json }
      - run: npm ci
      - uses: actions/cache@v4
        with:
          path: ~/.cache/ms-playwright
          key: playwright-${{ runner.os }}-${{ hashFiles('frontend/package-lock.json') }}
      - run: npx playwright install --with-deps chromium
      - run: npx playwright test --shard=${{ matrix.shard }}/3 --reporter=blob
        env:
          BASE_URL: ${{ secrets.STAGING_URL }}
      - uses: actions/upload-artifact@v4
        if: always()
        with:
          name: blob-report-${{ matrix.shard }}
          path: frontend/blob-report/
          retention-days: 1
  merge-frontend-reports:
    runs-on: ubuntu-latest
    needs: frontend-regression
    if: always()
    defaults:
      run:
        working-directory: frontend
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-node@v4
        with: { node-version: '20', cache: 'npm', cache-dependency-path: frontend/package-lock.json }
      - run: npm ci
      - uses: actions/download-artifact@v4
        with: { pattern: blob-report-*, path: frontend/all-blobs, merge-multiple: true }
      - run: npx playwright merge-reports --reporter html ./all-blobs
      - uses: actions/upload-artifact@v4
        if: always()
        with:
          name: frontend-regression-report
          path: frontend/playwright-report/
          retention-days: 14
  notify-failure:
    runs-on: ubuntu-latest
    needs: [backend-regression, frontend-regression]
    if: failure()
    steps:
      - uses: slackapi/slack-github-action@v1
        with:
          payload: |
            {
              "text": "❌ *Nightly regression failed* — ${{ github.server_url }}/${{ github.repository }}/actions/runs/${{ github.run_id }}",
              "channel": "#qa-builds"
            }
        env:
          SLACK_WEBHOOK_URL: ${{ secrets.SLACK_WEBHOOK_URL }}
          SLACK_WEBHOOK_TYPE: INCOMING_WEBHOOK

Five design choices in the nightly workflow are worth noting:
workflow_dispatch alongside schedule — lets you rerun the nightly manually after fixing a failure, without waiting until 2 AM.
fail-fast: false on both matrices — a failing shard doesn't cancel the others. You see all failures, not just the first one.
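Short retention on shard artifacts: backend shard reports are kept for 3 days and frontend blob reports for 1 day, because they are intermediate. The merged frontend report is kept for 14 days. There is no reason to keep raw shard data long.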
The merge step runs if: always() — it runs even if some shards failed. A partial merge still gives you a report for the tests that ran.
notify-failure depends on both regression jobs — Slack fires once when any regression job fails, not once per failing shard.
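One refinement to consider for the merge job: always() also runs the job when the whole workflow run has been cancelled, which is rarely what you want for a report merge. GitHub's documentation suggests !cancelled() in that situation. A sketch of the changed condition, with the rest of the job unchanged from the workflow above:

```yaml
jobs:
  merge-frontend-reports:
    runs-on: ubuntu-latest
    needs: frontend-regression
    # Still runs when shards fail, but not after the run is cancelled.
    if: ${{ !cancelled() }}
```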
Where to pause and verify
After each step, confirm the expected outcome before continuing:
- After step 1: push the file to a branch and open a PR. The Actions tab should show the workflow running with two jobs.
- After step 2: check the step logs for "Cache restored from key" vs "Cache not found." Measure the setup time. If it's still 90 seconds, the cache configuration is wrong.
- After step 3: push a deliberate test failure. Confirm the test reporter posts an annotation directly on the PR. Confirm the artifact is downloadable.
- After step 4: try to merge the PR with the failing test. The merge button should be greyed out with "Required status checks have not passed."
- After step 5: trigger the nightly manually with workflow_dispatch. Confirm 7 parallel shard jobs appear. Confirm the merge step runs and produces a downloadable HTML report.
- After step 6: trigger the nightly with a deliberate failure. Confirm the Slack message arrives within 2 minutes.
The next lesson adds the quality gates, coverage reporting, Allure history, and the deploy-staging workflow.