
Agile Testing

How testing actually works inside an Agile team — what QA does in each ceremony, how to size effort per story, what "done" means, and how the practice extends from CI into production.

Agile Testing Principles

Principle	What it looks like in practice
Testing is continuous, not a phase	Tests run every commit; QA pairs with devs all sprint, not at the end
Quality is the whole team's responsibility	Devs write unit + integration tests; product owns acceptance criteria; QA orchestrates and explores
Fast feedback over comprehensive documentation	A bug raised in standup beats a 4-page report a week later
Working software over extensive test plans	Run the feature, even half-built, instead of waiting for a "complete" spec to plan against
Respond to change over following the plan	Re-prioritise tests when scope shifts mid-sprint; the plan serves the work, not the other way around
Prevention over detection	Catch issues at the requirements / design stage — cheaper than catching them in QA, much cheaper than in production
Shift-left — QA involved early	Review acceptance criteria, attend design reviews, review PRs, write tests before code

Agile Testing Quadrants (Brian Marick)

A model for thinking about what kind of testing you're doing and why. Two axes: business vs technology, and supporting the team vs critiquing the product.

                   Business-facing
                          │
   ┌──────────────────────┼──────────────────────┐
   │                      │                      │
   │         Q2           │         Q3           │
   │    Functional        │   Exploratory        │
   │    User stories      │   Usability / UAT    │
   │    Prototypes        │   Alpha / beta       │
   │                      │                      │
   ├──────────────────────┼──────────────────────┤
   │                      │                      │
   │         Q1           │         Q4           │
   │    Unit tests        │   Performance        │
   │    Component tests   │   Security / load    │
   │    Contract tests    │   Soak / chaos       │
   │                      │                      │
   └──────────────────────┼──────────────────────┘
                          │
                  Technology-facing

  ◄── Supporting the team │ Critiquing the product ──►

Quadrant	Test types	Owner	Automation
Q1 — technology-facing, supporting	Unit, component, contract	Devs	Fully automated
Q2 — business-facing, supporting	Functional / story tests, prototypes, examples	Devs + QA	Automated where stable, manual for new behaviour
Q3 — business-facing, critiquing	Exploratory, usability, UAT, alpha/beta	QA + real users	Manual — judgment-driven
Q4 — technology-facing, critiquing	Performance, security, load, soak, chaos	Specialists + QA	Tool-driven, scheduled

A balanced team invests in all four. A common smell: heavy in Q1 + Q2, no Q3 (no exploratory) and no Q4 (no perf/security). Bugs sneak through the gap.

Test Pyramid

The cost-and-coverage shape of an Agile test suite. Most tests at the bottom (cheap, fast); fewest at the top (slow, expensive).

            ╱╲
           ╱  ╲       ~10%   E2E       slowest, most fragile
          ╱────╲
         ╱      ╲     ~20%   Integration
        ╱────────╲
       ╱          ╲   ~70%   Unit       fastest, cheapest
      ╱────────────╲

Layer	Typical share	Speed	Owner	Strengths
Unit	~70%	ms — runs on save	Devs	Logic errors in pure functions, edge cases, regressions in calculations
Integration	~20%	seconds	Devs + QA	Component interactions, DB queries, API contracts, message handling
End-to-end	~10%	tens of seconds	QA	Real user flows, deploy correctness, browser-specific behaviour

Anti-patterns

Anti-pattern	Shape	Why it fails
Ice-cream cone	Tip-heavy E2E layer over a thin base	Slow CI, brittle tests, expensive maintenance, flaky signal
Hourglass	Many unit + many E2E, almost no integration	Big behavioural gaps — pure-logic units pass, full flows pass, but the seams between modules silently break
Cupcake	Each layer owned by a separate silo (devs: unit; automation QA: E2E; manual testers on top)	The same behaviour is re-tested at every layer; manual regression on every release drags release cadence below business needs

The pyramid isn't a law — for some products (libraries, pure-logic services) the right shape is even more bottom-heavy. For others (UI-heavy apps), 60/25/15 is more realistic. The point: be deliberate about the ratio, not accidental.
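A quick way to keep the ratio deliberate is to compute it from test counts on every build. A minimal sketch, assuming you can count tests per layer (for example by directory or tag); `SuiteCounts`, `classify`, and the thresholds are hypothetical and only illustrate the shapes named above.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class SuiteCounts:
    unit: int
    integration: int
    e2e: int


def shares(c: SuiteCounts) -> Dict[str, float]:
    """Fraction of the automated suite in each layer."""
    total = c.unit + c.integration + c.e2e
    if total == 0:
        raise ValueError("suite has no automated tests")
    return {
        "unit": c.unit / total,
        "integration": c.integration / total,
        "e2e": c.e2e / total,
    }


def classify(c: SuiteCounts) -> str:
    """Label the suite's shape; the thresholds below are illustrative assumptions."""
    s = shares(c)
    if s["e2e"] > s["unit"]:
        return "ice-cream cone"   # top-heavy: more E2E than unit tests
    if s["integration"] < 0.10 and s["e2e"] >= 0.20:
        return "hourglass"        # thin middle between two thick ends
    if s["unit"] >= s["integration"] >= s["e2e"]:
        return "pyramid"
    return "irregular"


print(classify(SuiteCounts(unit=600, integration=250, e2e=150)))  # pyramid
```

Tracking the label per sprint shows drift (say, a pyramid sliding towards an hourglass) long before CI time makes it obvious.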

QA in Scrum Ceremonies

Ceremony	What QA brings
Backlog refinement	Review upcoming stories for testability — can we tell when this is done? Flag missing or vague acceptance criteria. Raise risks (data, performance, accessibility) before sizing
Sprint planning	Estimate testing effort per story; identify test approach (manual / automated / both); raise dependencies (test data, third-party stubs, env access); split stories that are too big to test in-sprint
Daily standup	Testing status per story; blockers (broken build, env down, awaiting fix); fresh defects worth flagging early
Sprint review / demo	Demo tested features; show quality metrics (coverage, defect counts, escaped bugs); gather stakeholder feedback that becomes next sprint's input
Sprint retrospective	Process improvements: too much regression, slow CI, flaky environment, test-data setup pain, automation gaps. The retro is where QA practice gets better — don't sit silent

Three Amigos meeting

When a story is unclear, get a developer, a tester, and a product person together: the three amigos. The tester's role is to keep asking "what could go wrong?" and "what are the acceptance criteria for that case?" until the story is concrete enough to estimate.

Story Testing Workflow

The same-sprint flow that healthy Agile teams use. The order matters: testing tasks are spread across the sprint, not stacked at the end.

Story enters sprint
       │
       ▼
QA reviews AC ──── gaps? ──→ raise in standup / Three Amigos
       │
       ▼
QA writes scenarios (shift-left, before dev finishes)
       │
       ▼
Developer builds the feature
       │
       ▼
QA tests on dev branch or feature environment
       │
       ├──── bug found? ──→ communicate immediately (chat / pair > ticket)
       │                    └─ developer fixes ─ QA verifies
       ▼
Regression check (automated suite + targeted manual)
       │
       ▼
Story → Done (DoD met) → demo at review

What gets in the way

Story arrives in code review with no test scenarios. QA wasn't pulled in early — fix at refinement, not at the PR.
All testing happens on the last day of the sprint. Story was too big to ship + test in one sprint. Split it.
"It works on my machine." No shared dev/feature env, or env is broken. Treat env health as a blocker, not a fact of life.
Bugs filed but never fixed in-sprint. Carryover compounds. Cap WIP on bugs the same way you cap stories.

Definition of Done (DoD) — Testing Criteria

A story isn't done until everything below is true. Treat this as a checklist on the story card — paste it into the description if your tracker doesn't surface it natively.

□ All acceptance criteria verified (manual or automated)
□ Unit test coverage meets team threshold (e.g. ≥ 80%)
□ Integration tests passing
□ Regression suite passing
□ No open critical or high severity defects
□ Performance benchmarks met (if perf-sensitive)
□ Accessibility checks passed (WCAG AA)
□ Cross-browser / cross-device tested per support matrix
□ Code reviewed and approved
□ Documentation updated (user-facing, API, runbook)
□ Telemetry / logging in place

Some teams also add: feature flag added (if behind one), translations updated, analytics event wired, security review checked off.

The exact list depends on the team — but every team should have an explicit DoD. "We'll know it when we see it" is how regressions ship.

Acceptance Criteria & BDD

INVEST — what makes a good user story

Letter	Means	Tester's lens
Independent	Can be developed without depending on another story	Can it be tested in isolation?
Negotiable	Detail can shift during refinement	Are the AC firm enough to derive cases, or still TBD?
Valuable	Delivers value to a user or stakeholder	Can you state the business outcome it enables?
Estimable	Team can size the effort	Is testing effort included in the estimate?
Small	Fits in one sprint	Can I test all the AC inside the sprint?
Testable	Acceptance criteria are verifiable	Can I write a pass/fail test for each AC?

If you can't answer the testability question, the story isn't ready. Send it back to refinement.

Given / When / Then format

The standard structure for acceptance criteria in Agile + BDD teams. Each scenario reads as one observable outcome.

Clause	Purpose
`Given`	Pre-existing state — the world as it is before the action
`When`	The action — exactly one event that triggers the behaviour
`Then`	The expected outcome — what must be true after the action
`And` / `But`	Additional `Given`/`When`/`Then` clauses

Worked example

Given I am a logged-in user
And my cart is empty
When I add an item to my cart
Then the cart count should increase by 1
And I should see the item in the cart summary

Read top to bottom: the scenario is concrete, observable, and binary. The Then clauses are what the test will assert.
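As a sketch, the worked example maps onto a plain Python test: the `Given` lines become setup, the single `When` becomes the one action, and the `Then`/`And` lines become assertions. `Cart` is a hypothetical stand-in for the real system, and the logged-in precondition is assumed rather than modelled.

```python
from typing import List


class Cart:
    """Hypothetical stand-in for the real cart service."""

    def __init__(self) -> None:
        self.items: List[str] = []

    @property
    def count(self) -> int:
        return len(self.items)

    def add(self, item: str) -> None:
        self.items.append(item)

    def summary(self) -> List[str]:
        return list(self.items)


def test_add_item_to_empty_cart() -> None:
    # Given I am a logged-in user  (assumed: authentication is out of scope here)
    # And my cart is empty
    cart = Cart()
    assert cart.count == 0
    before = cart.count

    # When I add an item to my cart  (exactly one action)
    cart.add("Mountain Bike")

    # Then the cart count should increase by 1
    assert cart.count == before + 1
    # And I should see the item in the cart summary
    assert "Mountain Bike" in cart.summary()


test_add_item_to_empty_cart()
```

Under pytest the final call can be dropped, since the runner discovers `test_` functions by name.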

Multiple scenarios per story

Most stories need 3–6 scenarios — at minimum, one happy path plus the obvious failure modes.

Scenario: Add an item to an empty cart
  Given I am a logged-in user
  And my cart is empty
  When I add "Mountain Bike" to my cart
  Then the cart count should be 1
  And the cart summary should list "Mountain Bike"

Scenario: Add an out-of-stock item
  Given I am a logged-in user
  And my cart is empty
  When I attempt to add an out-of-stock item to my cart
  Then I should see an "Out of stock" message
  And the cart should remain empty

Scenario: Add an item while logged out
  Given I am not logged in
  When I attempt to add an item to my cart
  Then I should be redirected to the login page
  And the item should be added to the cart after I log in

Converting acceptance criteria to test cases

Each scenario in Given/When/Then form maps directly to a test case. The scenario's scope, not the tooling, determines the level it runs at:

AC scenario level	Where the test runs
Pure logic / domain rule	Unit test
Service interaction	Integration test
End-to-end user flow	E2E (Cypress / Playwright / Selenium)

The same Given/When/Then text can drive a manual test, a Cucumber/SpecFlow scenario, or be paraphrased into a Playwright test() block — pick the level that matches the AC's scope, not always the highest.
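To illustrate picking the lowest level that fits: the out-of-stock scenario is a domain rule, so it can run as a unit test with no browser at all. A minimal sketch, assuming a hypothetical `Cart` that checks a stock dictionary and raises a hypothetical `OutOfStockError`; confirming that the message actually appears on screen would still be an E2E concern.

```python
from typing import Dict, List


class OutOfStockError(Exception):
    """Hypothetical domain error for adding an item with no stock."""


class Cart:
    def __init__(self, stock: Dict[str, int]) -> None:
        self.stock = stock              # hypothetical inventory: item -> units available
        self.items: List[str] = []

    def add(self, item: str) -> None:
        if self.stock.get(item, 0) <= 0:
            raise OutOfStockError(f"Out of stock: {item}")
        self.items.append(item)


def test_add_out_of_stock_item() -> None:
    # Given I am a logged-in user (login plays no part in this domain rule);
    # the cart starts empty
    cart = Cart(stock={"Mountain Bike": 0})

    # When I attempt to add an out-of-stock item to my cart
    try:
        cart.add("Mountain Bike")
    except OutOfStockError as err:
        message = str(err)
    else:
        raise AssertionError("expected OutOfStockError")

    # Then I should see an "Out of stock" message (at unit level: the error text)
    assert "Out of stock" in message
    # And the cart should remain empty
    assert cart.items == []


test_add_out_of_stock_item()
```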

Automation of acceptance tests

When AC are written in Gherkin, automation is mostly glue code: step definitions that bind each Gherkin line to an action or an assertion.

Tool	Language	Native to
Cucumber	Java, JS/TS, Ruby, Python, others	Most ecosystems
SpecFlow	C# / .NET	Visual Studio
Behave	Python	Python projects (its own runner; pytest-bdd is the pytest-based alternative)
Robot Framework	Python (keyword-driven, BDD-like)	Acceptance + RPA
Karate	Java (Gherkin for API testing)	API-first BDD

The benefit isn't speed of writing — it's that the AC become the test artefact. Product, dev, and QA all see the same Given/When/Then; nobody hand-translates between a Word doc and a code file.

The cost: discipline. If step definitions become a tangled mess of generic `When I click {string}` steps, you've lost the readability advantage. Keep step phrasing domain-specific, not technology-specific.
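The glue can be sketched as a tiny step registry in which each regular expression binds one domain-phrased Gherkin line to a Python function. This is a toy written for illustration, not Behave's or Cucumber's API (their decorators and context objects differ); `step`, `run_scenario`, and the step texts are hypothetical.

```python
import re
from typing import Callable, List, Tuple

# Toy registry for illustration only; Behave and Cucumber have their own APIs.
STEPS: List[Tuple[re.Pattern, Callable[..., None]]] = []


def step(pattern: str):
    """Decorator: bind every step line matching `pattern` to the decorated function."""
    def register(fn):
        STEPS.append((re.compile(pattern), fn))
        return fn
    return register


# Domain-specific phrasing: steps say what the user does, not which button is clicked.
@step(r"^my cart is empty$")
def empty_cart(ctx: dict) -> None:
    ctx["cart"] = []


@step(r'^I add "(?P<item>[^"]+)" to my cart$')
def add_item(ctx: dict, item: str) -> None:
    ctx["cart"].append(item)


@step(r"^the cart count should be (?P<n>\d+)$")
def cart_count_is(ctx: dict, n: str) -> None:
    assert len(ctx["cart"]) == int(n), f"cart holds {ctx['cart']}"


KEYWORDS = {"Given", "When", "Then", "And", "But"}


def run_scenario(text: str) -> dict:
    """Strip each line's Gherkin keyword and dispatch the rest to the first matching step."""
    ctx: dict = {}
    for line in text.strip().splitlines():
        keyword, _, body = line.strip().partition(" ")
        if keyword not in KEYWORDS:
            continue  # skips "Scenario:" headers and blank lines
        for pattern, fn in STEPS:
            match = pattern.match(body)
            if match:
                fn(ctx, **match.groupdict())
                break
        else:
            raise LookupError(f"no step definition for: {body!r}")
    return ctx


ctx = run_scenario("""
Scenario: Add an item to an empty cart
  Given my cart is empty
  When I add "Mountain Bike" to my cart
  Then the cart count should be 1
""")
print(ctx["cart"])  # ['Mountain Bike']
```

Because the step text names a domain action, the same `I add "..." to my cart` step keeps working whether the glue drives a browser, an API, or an in-memory object.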

Continuous Testing

The pipeline-driven extension of Agile testing — every commit verified through a layered test suite that gets slower as confidence grows.

Every commit triggers automated tests

The pipeline runs the same tests for every PR and every merge. CI, not a local "works on my machine", is the source of truth.

The standard pipeline shape

commit
   │
   ▼
┌──────────┐  fail-fast — runs in seconds
│   lint   │
└────┬─────┘
     ▼
┌──────────┐  fast — isolated, no I/O
│   unit   │
└────┬─────┘
     ▼
┌──────────────┐  medium — DB, message bus, HTTP mocks
│ integration  │
└────┬─────────┘
     ▼
┌──────────┐    slow — full browser, real services
│   E2E    │
└────┬─────┘
     ▼
┌──────────────┐  optional / scheduled — load, soak
│ performance  │
└──────────────┘

Each stage gates the next: a unit-test failure aborts the pipeline before E2E even starts. Cheap, fast stages run first, so most failures surface early, at the cheapest stage.
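The gating rule can be sketched in a few lines, assuming each stage is a callable that returns `True` on success. In a real CI system each stage would be a job and the gating would live in the pipeline config; `run_pipeline` and the stand-in lambdas are hypothetical.

```python
from typing import Callable, List, Optional, Tuple

Stage = Tuple[str, Callable[[], bool]]  # (stage name, run it -> True if it passed)


def run_pipeline(stages: List[Stage]) -> Tuple[List[str], Optional[str]]:
    """Run stages in order and stop at the first failure.

    Returns the stages that actually ran and the name of the failing stage
    (None if everything passed). Later, slower stages never start after a failure.
    """
    ran: List[str] = []
    for name, run in stages:
        ran.append(name)
        if not run():
            return ran, name
    return ran, None


# Hypothetical stand-ins: in CI each callable would be a job running the real tool.
pipeline = [
    ("lint", lambda: True),
    ("unit", lambda: False),          # a failing unit test
    ("integration", lambda: True),
    ("e2e", lambda: True),
]
print(run_pipeline(pipeline))  # (['lint', 'unit'], 'unit')
```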

Fast-feedback budget

Stage	Target time	What this means in practice
Lint	< 30 s	Pre-commit hook catches it before CI fires at all
Unit	< 2 min	Devs trust the green light enough to keep flowing
Integration	< 5 min	Acceptable to wait on a PR
E2E	< 15 min total	Sharded across runners; each shard < 5 min
Performance	scheduled / nightly	Not blocking PRs, but visible to the team

If the unit stage takes 20 minutes, devs stop running it locally. If E2E takes 90 minutes, devs stop reading the failures. Slow tests get bypassed, so speed is part of correctness.
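One way to keep the budget honest is to check stage timings after every run. A minimal sketch, assuming the pipeline can export per-stage durations in seconds under these stage names; the limits are the targets from the table, and `over_budget` is a hypothetical helper.

```python
from typing import Dict, List, Sequence

# Targets from the fast-feedback budget table, in seconds.
# Performance runs on a schedule, so it has no PR budget here.
BUDGET_SECONDS: Dict[str, int] = {
    "lint": 30,
    "unit": 2 * 60,
    "integration": 5 * 60,
    "e2e": 15 * 60,
}
E2E_SHARD_BUDGET_SECONDS = 5 * 60


def over_budget(durations: Dict[str, float], e2e_shards: Sequence[float] = ()) -> List[str]:
    """Return a warning for each stage (and each E2E shard) at or over its budget."""
    warnings: List[str] = []
    for stage, limit in BUDGET_SECONDS.items():
        took = durations.get(stage)
        if took is not None and took >= limit:  # the table's targets are strict "<"
            warnings.append(f"{stage} took {took:.0f}s (budget < {limit}s)")
    for i, took in enumerate(e2e_shards):
        if took >= E2E_SHARD_BUDGET_SECONDS:
            warnings.append(f"e2e shard {i} took {took:.0f}s (budget < {E2E_SHARD_BUDGET_SECONDS}s)")
    return warnings


print(over_budget({"lint": 12, "unit": 150, "integration": 200}, e2e_shards=[240, 310]))
# ['unit took 150s (budget < 120s)', 'e2e shard 1 took 310s (budget < 300s)']
```

Posting these warnings on the PR makes creeping test time visible while it is still cheap to fix.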

Shift-right — testing extends into production

Continuous testing doesn't stop at deploy. The complement of shift-left is shift-right: learn from production.

Practice	What it is	What it catches
Synthetic monitoring	Automated probes hit production from outside (Pingdom, Datadog, Checkly, Grafana k6 cloud)	Outages, latency regressions, broken third-party integrations, cert expiry
Real-user monitoring (RUM)	Browser SDK reports real-user load times, errors, click flows	Browser-specific bugs, slow flows for real users on real networks
Canary deployments	Roll new version to 1% → 10% → 50% → 100% over hours/days	Regressions visible at low blast radius before wide rollout
Feature flags	Ship dark, enable for a small cohort, then everyone	Test in production safely; instant rollback without redeploy
Error tracking	Sentry / Rollbar / Bugsnag capture exceptions with stack + breadcrumbs	Bugs that don't reproduce locally; regressions that escape pre-prod tests
Chaos engineering	Deliberate failure injection — kill instances, drop traffic, slow networks	Resilience gaps; recovery timing assumptions

Feature flags — ship dark, then test

Decouples deploy from release. Code reaches production behind a flag; the flag stays off until tested. Switch on for QA, then internal users, then real users.

deploy (flag off, no behaviour change)
        │
        ▼
flag-on for QA-only cohort   ──────────┐
        │                              │
        ▼                              │
flag-on for 1% of real users           ├──── monitor production
        │                              │
        ▼                              │
flag-on for 100%                       │
        │                              │
        ▼                              │
remove flag from code  ◄───────────────┘

If anything goes wrong at any step, flip the flag off: no code rollback, no redeploy.
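The cohort steps in the diagram can be sketched as a flag with a kill switch, an allow-list for the QA cohort, and a percentage rollout. Hashing the flag name together with the user ID keeps each user in a stable bucket, so raising the percentage only ever adds users. `Flag` is a hypothetical in-house class; real flag services expose their own APIs.

```python
import hashlib
from dataclasses import dataclass, field
from typing import Set


@dataclass
class Flag:
    """Hypothetical in-house feature flag (not any vendor's API)."""
    name: str
    enabled: bool = False                               # kill switch: False = off for everyone
    allow_users: Set[str] = field(default_factory=set)  # e.g. the QA-only cohort
    rollout_percent: int = 0                            # share of remaining users, 0..100

    def bucket(self, user_id: str) -> int:
        """Stable 0..99 bucket per (flag, user), so raising the % only adds users."""
        digest = hashlib.sha256(f"{self.name}:{user_id}".encode()).digest()
        return int.from_bytes(digest[:8], "big") % 100

    def is_on(self, user_id: str) -> bool:
        if not self.enabled:
            return False
        if user_id in self.allow_users:
            return True
        return self.bucket(user_id) < self.rollout_percent


flag = Flag("new-checkout", enabled=True, allow_users={"qa-alice"})
print(flag.is_on("qa-alice"), flag.is_on("customer-42"))  # True False (rollout still 0%)
flag.rollout_percent = 1    # then 10, 50, 100 while monitoring production
flag.enabled = False        # something broke: off for everyone, no redeploy
print(flag.is_on("qa-alice"))  # False
```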

A/B testing — validate with real users

Show the current version (A) to one cohort of real users and the new version (B) to another, then compare outcomes:

What you measure	Example
Conversion	% completing the funnel
Engagement	Time on page, click-through
Errors	Crash rate, validation failure rate
Performance	LCP, INP, time-to-interactive

The QA role isn't to pick the winner — it's to make sure the experiment is measurable (instrumentation present, metrics defined, sample size adequate) and that both arms are equally tested before launch.
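Once both arms have data, a common check for a conversion metric is a two-proportion z-test. A minimal sketch, assuming simple conversion counts per arm, independent users, and samples large enough for the normal approximation; `two_proportion_z_test` is a hypothetical helper, and many teams rely on their experimentation platform's statistics instead.

```python
import math
from typing import Tuple


def two_proportion_z_test(conv_a: int, n_a: int, conv_b: int, n_b: int) -> Tuple[float, float]:
    """Compare conversion rates of arm A and arm B.

    Returns (z, two-sided p-value) using the pooled-proportion standard error.
    """
    p_a, p_b = conv_a / n_a, conv_b / n_b
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    if se == 0:
        raise ValueError("no variation in outcomes; cannot compare arms")
    z = (p_b - p_a) / se
    p_value = math.erfc(abs(z) / math.sqrt(2))  # equals 2 * (1 - Phi(|z|))
    return z, p_value


z, p = two_proportion_z_test(conv_a=200, n_a=2000, conv_b=250, n_b=2000)
print(f"z={z:.2f} p={p:.3f}")
```

With 10% against 12.5% conversion on 2,000 users per arm, the difference is unlikely to be chance at a conventional 5% level. Fix the sample size before launch rather than stopping the first time p dips below the threshold.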

Agile Testing Metrics

Metrics in Agile aren't management report fodder — they're feedback for the team. Pick a small set and watch the trend, not the absolute number.

Metric	Definition	Target	Smell when
Defect density	Defects ÷ stories (or ÷ KLOC)	Trends down sprint over sprint	Spikes — usually a story too big or AC too thin
Escaped defects	Bugs found in production that pre-prod tests missed	As close to 0 as the team can sustain	Trending up — coverage gaps; review post-mortems
Defect resolution time	Mean time from "reported" → "fixed and verified"	< 2 days for high-severity	Bugs piling up — WIP-cap them
Reopened defect rate	% of defects re-opened after marked "fixed"	< 5%	Fix verifications too shallow; missing regression coverage
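The four definitions can be computed from a tracker export. A sketch, assuming a hypothetical `Defect` record with these fields (real trackers name and store them differently); a defect counts towards the reopened rate once it has been marked fixed at least once.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Dict, List, Optional


@dataclass
class Defect:
    """Hypothetical tracker export; field names are assumptions."""
    severity: str                           # "critical" | "high" | "medium" | "low"
    found_in_production: bool               # True = escaped pre-prod testing
    reported: datetime
    verified_fixed: Optional[datetime] = None
    reopen_count: int = 0                   # times re-opened after being marked fixed


def defect_metrics(defects: List[Defect], stories_done: int) -> Dict[str, Optional[float]]:
    fixed = [d for d in defects if d.verified_fixed is not None]
    hours = [(d.verified_fixed - d.reported).total_seconds() / 3600 for d in fixed]
    # Anything re-opened was marked fixed at least once, even if it is open again now.
    ever_fixed = [d for d in defects if d.verified_fixed is not None or d.reopen_count > 0]
    return {
        "defect_density": len(defects) / stories_done,   # defects per story
        "escaped_defects": sum(d.found_in_production for d in defects),
        "mean_resolution_hours": sum(hours) / len(hours) if hours else None,
        "reopened_rate": (sum(d.reopen_count > 0 for d in ever_fixed) / len(ever_fixed)
                          if ever_fixed else None),
    }
```

Run it per sprint and compare the results sprint over sprint; a single sprint's numbers say little on their own.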

Coverage metrics

Metric	Definition	What it actually tells you
Acceptance test coverage	% of acceptance criteria with at least one automated test	Confidence the AC won't regress silently
Code coverage	% of source lines / branches executed by tests	Useful when trending; useless as an absolute target, since coverage shows code was executed, not that its behaviour was asserted
Requirements coverage	% of user stories with at least one test case	Higher level than code coverage — better signal for product completeness

Code coverage as a target is gameable; as a trend, it's a sensible early warning.
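Watching the trend rather than the number can be automated with a small check over per-sprint coverage figures. A sketch; `coverage_trend_warnings`, the 2-point drop threshold, and the three-sprint window are illustrative assumptions, not recommended values.

```python
from typing import List


def coverage_trend_warnings(history: List[float], max_drop: float = 2.0, window: int = 3) -> List[str]:
    """Flag worrying movement in per-sprint coverage percentages (oldest first).

    Both thresholds are illustrative; tune them to the team's normal noise level.
    """
    warnings: List[str] = []
    for i in range(1, len(history)):
        drop = history[i - 1] - history[i]
        if drop > max_drop:
            warnings.append(
                f"sprint {i + 1}: coverage fell {drop:.1f} points "
                f"({history[i - 1]:.1f}% -> {history[i]:.1f}%)"
            )
    # A slow, steady slide can hide under the per-sprint threshold.
    recent = range(len(history) - window, len(history))
    if len(history) > window and all(history[j] < history[j - 1] for j in recent):
        warnings.append(f"coverage has fallen {window} sprints in a row")
    return warnings


print(coverage_trend_warnings([78.0, 79.5, 75.0, 74.5, 74.0]))
```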

Velocity & process metrics

Metric	Definition	Tester's read
Velocity impact	How testing effort affects team velocity per sprint	If velocity drops every time a sprint includes UI testing, the test debt is real
Sprint burndown — testing tasks	Testing work as part of the sprint burndown chart	Testing should burn down alongside dev — not stack at the end
Stories rolled over	Stories that couldn't be marked "Done" because testing wasn't complete	Persistent rollover means testing capacity is short of dev capacity
Cycle time	Time from "in progress" → "done" per story	Includes testing — long cycle times often mean late testing

Automation metrics

Metric	Definition	Healthy range
Automation ratio	Automated tests ÷ total tests	Trending up; the absolute % depends on the product
Automation coverage of regression suite	% of regression test cases automated	High — manual regression is the slowest path to release
Test execution time	Wall-clock time of the full automated suite	Stable or shrinking; growth past the "fast feedback budget" needs sharding or pruning
Flakiness rate	% of automated tests that both pass and fail on the same code (typically: fail, then pass on retry with no change)	< 1% per test, < 5% suite-wide. Above that, devs stop trusting CI
Test maintenance ratio	Time spent fixing tests ÷ time writing new tests	If fixes dominate, the suite is over-coupled to UI internals — refactor
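Flakiness is measured per test against the same code. A sketch, assuming CI history is available as hypothetical `(test, commit, passed)` tuples, retries included: a test is flaky if it both passed and failed on one commit, while a test that fails only on a later commit is treated as a possible real regression. The returned rate is the suite-wide share of flaky tests; the per-test target would also need run counts per test.

```python
from collections import defaultdict
from typing import DefaultDict, Iterable, Set, Tuple


def flakiness(runs: Iterable[Tuple[str, str, bool]]) -> Tuple[Set[str], float]:
    """Return (flaky test names, % of distinct tests that are flaky).

    `runs` holds (test name, commit SHA, passed) for every CI execution, retries included.
    """
    outcomes: DefaultDict[Tuple[str, str], Set[bool]] = defaultdict(set)
    tests: Set[str] = set()
    for test, commit, passed in runs:
        outcomes[(test, commit)].add(passed)
        tests.add(test)
    # Both True and False seen for one commit = same code, different result.
    flaky = {test for (test, _commit), seen in outcomes.items() if len(seen) == 2}
    rate = 100 * len(flaky) / len(tests) if tests else 0.0
    return flaky, rate


history = [
    ("test_login", "a1b2c3", False), ("test_login", "a1b2c3", True),  # passed on retry
    ("test_cart", "a1b2c3", True), ("test_cart", "d4e5f6", False),     # new commit broke it
    ("test_search", "a1b2c3", True),
]
print(flakiness(history))
```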

Pick the smallest set that drives action

Reporting 12 metrics nobody acts on is a bigger problem than reporting 3 you do. A practical starter dashboard:

Escaped defects this release — the only one product cares about.
CI build time — fast-feedback budget; team productivity.
Flakiness rate — trust in the suite; if it climbs, fix it that sprint.
Stories rolled over due to testing — capacity signal.

Add more only when you have a question those four don't answer.