Definition of Done From a Testing Perspective

The Definition of Done (DoD) is the team's shared checklist of conditions that must be true before any story can be called complete. It is one of the most important quality gates in agile, because it converts "is this story done?" from a judgement call into a check. A team without an explicit DoD relies on each person's private definition of "done", and those definitions rarely align. This lesson shows what belongs in a useful DoD, why testers should have a strong voice in defining it, and how to evolve it without bloating it.

Why testers should help write the DoD

Most of the items that should appear in a DoD are testing items: tests passed, regression checked, no critical bugs open, accessibility verified, cross-browser smoke complete. If the tester does not contribute, the DoD becomes a list of developer concerns ("code reviewed, unit tests written") with quality as an afterthought. A DoD that includes "all acceptance criteria verified" but nothing about residual bugs, regression, or cross-browser behaviour is not really a quality gate; it is a code-completion gate dressed up as one.

A useful rule: any condition you find yourself wishing the team had checked before declaring stories done is exactly the kind of item that belongs in the DoD.

Definition of Done vs acceptance criteria

These are easy to confuse. A short distinction:

  • Acceptance criteria are story-specific. They describe what this particular story must do, for example "given a logged-in user, when they apply code SPRING10, then the order total decreases by 10%." They differ for every story.
  • Definition of Done is team-wide. It applies to every story the team ships, for example "no open P1/P2 bugs, regression on impacted areas green, accessibility checks passed."

A story is "done" when it satisfies both its specific acceptance criteria and the team's blanket DoD. Skipping either leaves a coverage gap.
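The "both must hold" rule can be sketched in a few lines of Python. The Story record and its field names are illustrative assumptions, not the schema of any real tracker; the sketch only shows that "done" is the logical AND of two separate checks.

```python
from dataclasses import dataclass, field


@dataclass
class Story:
    """Hypothetical story record; field names are assumptions for illustration."""
    title: str
    # Story-specific acceptance criteria, mapped to whether each was verified.
    acceptance_criteria: dict[str, bool] = field(default_factory=dict)
    # Team-wide DoD items, mapped to whether each is met for this story.
    dod_items: dict[str, bool] = field(default_factory=dict)


def is_done(story: Story) -> bool:
    """Done means every acceptance criterion AND every DoD item is satisfied."""
    # Assumption: a story with no ACs or no DoD items recorded is not done,
    # because there is nothing to check it against.
    if not story.acceptance_criteria or not story.dod_items:
        return False
    return all(story.acceptance_criteria.values()) and all(story.dod_items.values())


def coverage_gaps(story: Story) -> list[str]:
    """List every unmet item, labelled by which of the two checks it belongs to."""
    unmet_acs = [f"AC: {name}" for name, ok in story.acceptance_criteria.items() if not ok]
    unmet_dod = [f"DoD: {name}" for name, ok in story.dod_items.items() if not ok]
    return unmet_acs + unmet_dod


# Example story: the acceptance criterion passes, but one DoD item does not.
story = Story(
    title="Apply discount code",
    acceptance_criteria={"SPRING10 reduces order total by 10%": True},
    dod_items={
        "No open P1/P2 bugs": True,
        "Regression on impacted areas green": False,
        "Accessibility checks passed": True,
    },
)
print(is_done(story))        # False
print(coverage_gaps(story))  # ['DoD: Regression on impacted areas green']
```

Here the story meets its acceptance criterion but is still not done, because one team-wide item is unmet.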

What belongs in a useful DoD

A DoD with five sharp items is much better than one with twenty vague ones. The items should be checkable — two reasonable people should agree on whether each item is met.

Common testing-related items that earn their place:

  • All acceptance criteria verified — the story-specific scope has been tested and passes.
  • Functional testing complete on the changed area, with results recorded.
  • No open P1/P2 bugs related to this story.
  • Regression smoke run on impacted adjacent areas.
  • Cross-browser smoke on the browsers in the team's policy.
  • Accessibility basics — keyboard, contrast, labels, headings — passed.
  • Test cases updated in the test management tool.
  • Demo path verified on the demo environment.

Engineering items typically include code review approved, CI green, documentation updated, no new lint warnings, and merged to the integration branch.

A common smell: items like "tested." That is not checkable. "Functional, regression, and cross-browser tests all passed with results attached" is.
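One way to see what "checkable" means is to write each item as a yes/no function over recorded evidence. The sketch below assumes a hypothetical StoryEvidence record; its fields are invented for illustration and would map onto whatever your bug tracker and test reports actually expose. An item like "tested" cannot be written this way, which is exactly the smell.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class StoryEvidence:
    """Hypothetical evidence for one story; every field name is an assumption."""
    open_bug_priorities: list[str]  # priorities of bugs still open, e.g. ["P3"]
    functional_passed: bool
    regression_passed: bool
    cross_browser_passed: bool
    results_attached: bool


def no_open_p1_p2(e: StoryEvidence) -> bool:
    """True when no bug still open against the story is P1 or P2."""
    return not any(p in ("P1", "P2") for p in e.open_bug_priorities)


def all_suites_passed_with_results(e: StoryEvidence) -> bool:
    """True when all three test passes succeeded and results are attached."""
    return (e.functional_passed and e.regression_passed
            and e.cross_browser_passed and e.results_attached)


# Each DoD item pairs its wording with a check that returns a plain yes or no.
DOD_CHECKS: dict[str, Callable[[StoryEvidence], bool]] = {
    "No open P1/P2 bugs related to this story": no_open_p1_p2,
    "Functional, regression, and cross-browser tests passed with results attached":
        all_suites_passed_with_results,
}


def evaluate_dod(evidence: StoryEvidence) -> dict[str, bool]:
    """Run every check and return a per-item pass/fail report."""
    return {item: check(evidence) for item, check in DOD_CHECKS.items()}


# Example: one P3 bug is open (allowed), but test results were never attached.
evidence = StoryEvidence(
    open_bug_priorities=["P3"],
    functional_passed=True,
    regression_passed=True,
    cross_browser_passed=True,
    results_attached=False,
)
report = evaluate_dod(evidence)
for item, met in report.items():
    print("PASS" if met else "FAIL", "-", item)
```

The second item fails only because results were not attached, which is the kind of detail a bare "tested" would hide.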

Try it: classify the candidates

Below is a list of candidate DoD items for a mid-size web product. Some are essential at any maturity level; some are valuable enhancements; some are common but low-value. Classify each one based on what would actually earn its keep on a working DoD.

Building a working Definition of Done

Pick the right tier for each candidate item.

  • All acceptance criteria verified
  • No open P1/P2 bugs related to this story
  • Code review approved by a peer
  • Regression smoke on impacted adjacent flows
  • Cross-browser smoke on the team's supported browsers
  • Accessibility basics — keyboard, labels, contrast — passed
  • Story has at least 12 detailed test cases written down
  • Story manually tested by 3 different testers

 

The two "low value" items, a minimum of twelve written test cases and manual testing by three different testers, are common DoD bloat: they sound rigorous but rarely improve outcomes. Twelve test cases is an arbitrary count; three testers manually verifying every story is overkill outside safety-critical domains. Both add cost without a proportional gain in safety. Trim them, and the DoD that remains is short, sharp, and actually used.

Evolving the DoD over time

A useful DoD grows with the team's maturity:

  • At launch (5 items): acceptance criteria verified, code reviewed, CI green, no open P1/P2, smoke tests pass.
  • After three months (8 items): add regression on impacted areas, cross-browser smoke, accessibility basics.
  • After six months (12 items): add explicit performance benchmarks for changed flows, demo path verified, test cases updated, documentation reviewed.

Items earn their place by catching real problems. Each retro that surfaces a recurring miss should produce a DoD update, for example "we shipped a Safari-broken release three sprints in a row, so cross-browser smoke is now part of the DoD." Items also earn the right to be removed when they stop adding value or get covered automatically by tooling. A DoD that only grows is a DoD that bloats.

Vague DoD vs sharp DoD

The single best test of a DoD: read each item out and ask "could two reasonable people disagree on whether this is met?" If yes, the item is too vague. ❌ "Testing done" is useless, because "testing" could mean almost anything. ✅ "Functional tests passed; no open P1/P2 bugs; regression smoke on login and checkout green" is checkable in two minutes.

⚠️ Common mistakes

  • Letting the DoD become a 30-item wishlist. Items get added in retros and never removed. After a year the DoD is so heavy nobody checks it. Cap the list and prune anything that has not caught a real issue in the last quarter.
  • Treating the DoD as the developer's responsibility alone. A good DoD has roughly equal contributions from dev and test concerns. Without tester input, quality items are missing or vague.
  • Confusing acceptance criteria with the DoD. They are complementary. Story-specific ACs define what this story must do; the DoD defines the quality bar every story must clear.
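
The pruning rule from the first mistake above ("has not caught a real issue in the last quarter") can be applied mechanically if the team notes when each item last caught something. The sketch below is a minimal illustration: the DodItem record, the 90-day approximation of a quarter, and the example dates are all made up for this example.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import Optional


@dataclass
class DodItem:
    """Hypothetical DoD entry with the date it last caught a real issue."""
    text: str
    last_caught_issue: Optional[date]  # None means it has never caught one


def prune_candidates(items: list[DodItem], today: date, window_days: int = 90) -> list[str]:
    """Return the wording of items with no catch inside the window.

    Assumption: a quarter is approximated as 90 days.
    """
    cutoff = today - timedelta(days=window_days)
    return [
        item.text
        for item in items
        if item.last_caught_issue is None or item.last_caught_issue < cutoff
    ]


# Illustrative data only; the dates are invented for the example.
today = date(2024, 6, 30)
items = [
    DodItem("Cross-browser smoke on supported browsers", date(2024, 6, 3)),
    DodItem("Story has at least 12 detailed test cases", None),
    DodItem("Documentation reviewed", date(2024, 1, 15)),
]
print(prune_candidates(items, today))
# ['Story has at least 12 detailed test cases', 'Documentation reviewed']
```

The output is a list of candidates for discussion at the retro, not an automatic deletion list: a recently added item will also show up here simply because it has not had time to catch anything yet.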

🎯 Practice task

Spend 25 minutes auditing or drafting your team's Definition of Done.

  1. Find your team's current DoD (or write one from scratch if none exists). List all the items.
  2. For each item, write pass / vague / low-value: pass if it is sharp and earning its place, vague if two reasonable people could disagree on whether it is met, low-value if it has not caught a real issue in months.
  3. Rewrite the vague items so they are checkable. "Tested" becomes "functional and regression smoke passed; no P1/P2 open." "Reviewed" becomes "code review approved by a peer."
  4. Identify one missing item that would have caught a recent regression — and draft the wording of the new DoD entry that would have prevented it.
  5. Bring the proposed changes to your next retro. A specific, evidence-backed DoD change is one of the highest-leverage things a tester can drive in agile, and it permanently raises the quality floor of every future sprint.

The next chapter shifts from process to artefacts — how to write the test cases, bug reports, and summary documents that make all of this collaboration durable.
