Test Design Techniques
The formal techniques for picking which test cases to write — so your suite catches the bugs that matter without exploding into combinatorial chaos.
Equivalence Partitioning
Inputs that should behave the same way go in the same partition. Test one value from each partition: if it fails, the other values in that partition usually fail too.
Rules
- Identify every distinct valid partition.
- Identify every distinct invalid partition (different ways an input can be rejected each get their own).
- Pick one representative value per partition. One is enough; more is waste.
- A test case can cover multiple partitions if their behaviour is independent.
Worked example — Age field accepts 18–65
| Partition | Sample value | Expected | Notes |
|---|---|---|---|
| Valid: 18–65 | 30 | accepted | The happy path |
| Invalid: < 18 | 10 | rejected (under-age) | Too young |
| Invalid: > 65 | 80 | rejected (over-age) | Too old |
| Invalid: non-numeric | "abc" | rejected (type error) | Different rejection branch |
| Invalid: empty / null | "" / null | rejected (required field) | Different rejection branch |
| Invalid: float | 25.5 | rejected (type error) | If field requires int |
| Invalid: negative | -5 | rejected (under-age + sign) | Often the same branch as < 18, but worth confirming |
That's seven test cases instead of every possible value 0–999. Each one represents a class of inputs the system should handle the same way.
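The partition table maps naturally onto a table-driven test. The sketch below assumes a hypothetical `validate_age` function that returns either `"accepted"` or a rejection reason; the function name and the reason strings are illustrative and not part of any real API.

```python
def validate_age(raw):
    """Hypothetical validator: returns 'accepted' or a rejection reason."""
    if raw is None or raw == "":
        return "required field"
    # bool is a subclass of int in Python, so exclude it explicitly
    if isinstance(raw, bool) or not isinstance(raw, int):
        return "type error"
    if raw < 18:
        return "under-age"
    if raw > 65:
        return "over-age"
    return "accepted"


# One representative value per partition, taken from the table above.
# The empty/null row is split into two concrete values.
PARTITION_CASES = [
    (30, "accepted"),
    (10, "under-age"),
    (80, "over-age"),
    ("abc", "type error"),
    ("", "required field"),
    (None, "required field"),
    (25.5, "type error"),
    (-5, "under-age"),
]


def run_partition_cases():
    """Return the cases whose actual result differs from the expected one."""
    return [
        (value, expected, validate_age(value))
        for value, expected in PARTITION_CASES
        if validate_age(value) != expected
    ]
```

An empty list from `run_partition_cases()` means every partition behaved as specified. Any entry it returns names the partition whose code path is wrong.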
When to apply it
| Surface | Partitions usually look like |
|---|---|
| Form fields | valid / too short / too long / wrong type / empty |
| API parameters | valid range / out of range / wrong type / missing / null |
| Dropdowns | each enum value + missing/invalid option |
| File uploads | accepted type / wrong type / oversize / corrupt / empty |
| Date inputs | valid range / before min / after max / wrong format / non-existent date |
Common mistakes
- Treating all "invalid" inputs as one partition. A user who types letters into a number field hits a different code path than a user who submits an empty form. They're separate partitions.
- Picking edge values as the "representative". Boundary values are a separate technique — combine them deliberately, don't conflate.
- Skipping the empty/null partition. Many production bugs live there.
Boundary Value Analysis
Errors cluster at the edges of equivalence partitions. Off-by-one, < vs <=, inclusive vs exclusive — these are the most common bugs in any range check.
Two-value BVA (standard)
For each boundary, test the boundary itself and one step outside.
  invalid      valid    invalid
────────|─────|─────|─────|────────
      min-1  min   max  max+1
Three-value BVA (robust)
Adds one step inside the boundary too.
invalid  valid               valid  invalid
───|────|─────|─────...─────|─────|────|───
 min-1 min  min+1         max-1  max max+1
Worth the extra two cases when off-by-one bugs would be expensive (auth, billing, safety-critical fields).
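Both variants are mechanical enough to generate. A minimal sketch, assuming an inclusive integer range; the helper name `boundary_values` and its parameters are made up for illustration.

```python
def boundary_values(minimum, maximum, step=1, three_value=False):
    """Return the sorted boundary test values for an inclusive range.

    Two-value BVA: each boundary plus one step outside it.
    Three-value BVA: additionally one step inside each boundary.
    """
    values = {minimum - step, minimum, maximum, maximum + step}
    if three_value:
        values |= {minimum + step, maximum - step}
    return sorted(values)


two_value = boundary_values(8, 20)                      # [7, 8, 20, 21]
three_value = boundary_values(8, 20, three_value=True)  # [7, 8, 9, 19, 20, 21]
```

The `step` parameter is there for ranges that are not counted in ones, for example a price field in steps of 0.01. Binary floating-point steps can introduce rounding noise, so `decimal.Decimal` may be the safer choice there.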
Worked example — Password field accepts 8–20 characters
| Test | Length | Expected | Notes |
|---|---|---|---|
| Below min | 7 chars | invalid | one short |
| At min | 8 chars | valid | exact lower edge |
| Just above min | 9 chars | valid | inside |
| Just below max | 19 chars | valid | inside |
| At max | 20 chars | valid | exact upper edge |
| Above max | 21 chars | invalid | one over |
Six test cases, all targeting code that does a length check. If < should have been <=, exactly one of these will catch it.
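To see the six cases catch an off-by-one, the sketch below compares a correct length check with a deliberately broken one. Both function names are hypothetical, and the bug is planted for the demonstration.

```python
def is_valid_length(password):
    """Correct check: 8 to 20 characters inclusive."""
    return 8 <= len(password) <= 20


def is_valid_length_buggy(password):
    """Deliberate off-by-one: '<' where '<=' was intended at the upper edge."""
    return 8 <= len(password) < 20


# (length, expected validity) for the six rows of the table above
BVA_CASES = [(7, False), (8, True), (9, True), (19, True), (20, True), (21, False)]


def failing_lengths(check):
    """Return the lengths where the check disagrees with the expected result."""
    return [n for n, expected in BVA_CASES if check("x" * n) != expected]


print(failing_lengths(is_valid_length))        # []
print(failing_lengths(is_valid_length_buggy))  # [20]
```

Only the "At max" case fails for the buggy version. A suite that tested a single mid-range value such as 12 characters would have passed both implementations.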
Combining with equivalence partitioning
For each partition, take a typical mid-value (EP) plus the values at and around the boundaries (BVA). For the password example:
| Partition | Sample value | Why |
|---|---|---|
| Empty | "" | type/required failure |
| Below boundary | "abc" (3 chars) | well inside the invalid partition |
| At boundary | "12345678" (8 chars) | min — the most error-prone case |
| Mid-valid | "abcdefgh1234" (12 chars) | confirms typical case |
| At boundary | "12345678901234567890" (20 chars) | max |
| Above boundary | 21 chars | one over |
Apply it to anything with a range
| Range | Boundaries |
|---|---|
| Numeric (min..max) | min-1, min, min+1, max-1, max, max+1 |
| String length (minLen..maxLen) | same, in characters |
| Date range | day before, first day, day after; same for end of range |
| File size | 0 bytes, 1 byte, just below limit, exactly at limit, 1 byte over |
| Pagination | page=0, page=1, page=lastPage, page=lastPage+1, very large page |
| Decimal precision | values just over the rounding edge in both directions |
Decision Table Testing
When an outcome depends on multiple conditions combining, list every combination and the action it produces. Decision tables guarantee you don't forget a rule.
Components
- Conditions — input flags (rows, top half).
- Actions — outcomes (rows, bottom half).
- Rules — columns. Each column is one combination of condition values.
Process
- List every condition.
- List every possible action.
- Build a column for every combination of condition values (2^n for n boolean conditions).
- Fill in the action(s) for each rule.
- Reduce: collapse columns where some conditions don't matter (mark them —); drop impossible combinations.
Worked example — Shipping cost calculator
Conditions: order over £50, premium member.
Full table (4 rules from 2 booleans):
| Condition | R1 | R2 | R3 | R4 |
|---|---|---|---|---|
| Order > £50 | Y | Y | N | N |
| Premium member | Y | N | Y | N |
| Action | | | | |
| Free shipping | ✓ | | ✓ | |
| Standard rate (£3) | | ✓ | | |
| Full rate (£8) | | | | ✓ |
Collapsed — premium membership alone unlocks free shipping, so when "Premium member = Y" we don't care about order size:
| Condition | R1 | R2 | R3 |
|---|---|---|---|
| Order > £50 | — | Y | N |
| Premium member | Y | N | N |
| Action | | | |
| Free shipping | ✓ | | |
| Standard rate (£3) | | ✓ | |
| Full rate (£8) | | | ✓ |
That's three test cases instead of four — but only because we proved R1 and R3 (in the original) produce the same action.
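A decision table can be written down as data and checked against the implementation rule by rule. The sketch below assumes the rules read as follows: premium members always ship free, non-premium orders over £50 pay the standard £3, and everything else pays the full £8. The function `shipping_cost` and the representative order totals are hypothetical.

```python
from itertools import product


def shipping_cost(order_total, is_premium):
    """Shipping cost in pounds under the assumed rules."""
    if is_premium:
        return 0
    return 3 if order_total > 50 else 8


# Full decision table: one entry per rule, keyed by (order > £50, premium member)
FULL_TABLE = {
    (True, True): 0,    # R1
    (True, False): 3,   # R2
    (False, True): 0,   # R3
    (False, False): 8,  # R4
}


def check_full_table():
    """Exercise every combination of the two boolean conditions."""
    mismatches = []
    for over_50, premium in product([True, False], repeat=2):
        total = 60 if over_50 else 40  # representative order totals
        if shipping_cost(total, premium) != FULL_TABLE[(over_50, premium)]:
            mismatches.append((over_50, premium))
    return mismatches
```

`product([True, False], repeat=n)` enumerates all 2^n rules, which makes it hard to forget a column. Collapsing R1 and R3 is only safe once the full table shows they map to the same action, as `FULL_TABLE[(True, True)] == FULL_TABLE[(False, True)]` does here.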
A larger worked example — Login authorisation
Conditions: valid email format, password matches, account active, account locked, 2FA enabled.
That's 2^5 = 32 combinations, but most collapse out:
| Condition | Valid login | Wrong password | Locked | Inactive | 2FA needed | Bad email |
|---|---|---|---|---|---|---|
| Valid email format | Y | Y | Y | Y | Y | N |
| Password matches | Y | N | — | — | Y | — |
| Account active | Y | — | — | N | Y | — |
| Account locked | N | — | Y | — | N | — |
| 2FA enabled | N | — | — | — | Y | — |
| Action | | | | | | |
| Grant access | ✓ | | | | | |
| Show password error | | ✓ | | | | |
| Show locked banner | | | ✓ | | | |
| Show inactive notice | | | | ✓ | | |
| Prompt for 2FA code | | | | | ✓ | |
| Show format error | | | | | | ✓ |
Six rules → six test cases that cover every meaningful combination.
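Collapsed rules can overlap (a locked account with a wrong password matches two columns), so the implementation has to choose an evaluation order. The sketch below assumes one plausible order: format, then locked, then inactive, then password, then 2FA. That order, the function name, and the outcome strings are all assumptions, since the table itself does not fix a priority.

```python
def login_outcome(valid_email, password_ok, active, locked, two_fa):
    """Evaluate the rules in an assumed priority order."""
    if not valid_email:
        return "show format error"
    if locked:
        return "show locked banner"
    if not active:
        return "show inactive notice"
    if not password_ok:
        return "show password error"
    if two_fa:
        return "prompt for 2FA code"
    return "grant access"


# One concrete test per column. Don't-care cells are filled with values
# that do not trigger any other rule.
# Argument order: valid_email, password_ok, active, locked, two_fa
RULE_CASES = {
    "Valid login":    ((True, True, True, False, False), "grant access"),
    "Wrong password": ((True, False, True, False, False), "show password error"),
    "Locked":         ((True, True, True, True, False), "show locked banner"),
    "Inactive":       ((True, True, False, False, False), "show inactive notice"),
    "2FA needed":     ((True, True, True, False, True), "prompt for 2FA code"),
    "Bad email":      ((False, True, True, False, False), "show format error"),
}


def failing_rules():
    """Return the names of rules whose outcome differs from the table."""
    return [name for name, (args, expected) in RULE_CASES.items()
            if login_outcome(*args) != expected]
```

If the priority between overlapping rules matters to the business, for example whether a locked account should reveal that the password was wrong, that pair deserves its own explicit column and test.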
When to use decision tables
- Business rules with multiple conditions (pricing, eligibility, approval flows).
- Complex form validation with conditional fields.
- Authorisation matrices (role × resource × action).
- Insurance / loan / tax calculations.
- Any spec that uses the words "if … and … but only when …".
State Transition Testing
For features with discrete states, model the state machine and pick tests systematically — at the right depth.
Components
- States — distinct conditions the system can be in.
- Events — inputs that may cause a transition.
- Transitions — (state, event) → next state.
- Guards — conditions that gate a transition (if balance > 0).
- Actions — side effects performed during a transition (send email, debit balance).
State transition table
Logged Out ──login (valid)──→ Logged In
Logged Out ──login (bad)────→ Logged Out (action: show error)
Logged In ──logout─────────→ Logged Out
Logged In ──timeout────────→ Logged Out (action: redirect)
| From | Event | Guard | Action | To |
|---|---|---|---|---|
| Logged Out | Login | valid credentials | show dashboard | Logged In |
| Logged Out | Login | invalid credentials | show error | Logged Out |
| Logged In | Logout | — | clear session | Logged Out |
| Logged In | Timeout | session > 30 min | redirect | Logged Out |
Coverage levels
| Level | What it covers | When to use |
|---|---|---|
| All states | Visit every state at least once | Smoke testing |
| 0-switch (all transitions) | Cover every valid transition (every row of the table) | Default for most features |
| 1-switch | Cover every pair of consecutive transitions | Critical workflows; chains where intermediate state matters |
| All-paths | Cover every full path from start to end | Short, finite state machines (wizards, finite workflows) |
Worked example — Order status
Draft ──submit──→ Submitted ──approve──→ Approved ──ship──→ Shipped ──deliver──→ Delivered
                  │                      │                  │
                  └─reject→ Rejected     └─cancel→ Cancelled└─return→ Returned
0-switch coverage — every valid transition:
| # | From | Event | To |
|---|---|---|---|
| 1 | Draft | submit | Submitted |
| 2 | Submitted | approve | Approved |
| 3 | Submitted | reject | Rejected |
| 4 | Approved | ship | Shipped |
| 5 | Approved | cancel | Cancelled |
| 6 | Shipped | deliver | Delivered |
| 7 | Shipped | return | Returned |
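Keeping the transition table as data means the test list can be derived rather than hand-maintained. A sketch, assuming the order workflow above; the names `TRANSITIONS`, `apply_event`, and `transition_pairs` are illustrative.

```python
# (current state, event) -> next state, one entry per row of the table above
TRANSITIONS = {
    ("Draft", "submit"): "Submitted",
    ("Submitted", "approve"): "Approved",
    ("Submitted", "reject"): "Rejected",
    ("Approved", "ship"): "Shipped",
    ("Approved", "cancel"): "Cancelled",
    ("Shipped", "deliver"): "Delivered",
    ("Shipped", "return"): "Returned",
}


def apply_event(state, event):
    """Return the next state, or raise ValueError for an invalid transition."""
    try:
        return TRANSITIONS[(state, event)]
    except KeyError:
        raise ValueError(f"cannot {event} from {state}") from None


def transition_pairs():
    """Every pair of consecutive transitions: the second starts where the first ends."""
    return [
        ((s1, e1), (s2, e2))
        for (s1, e1), target in TRANSITIONS.items()
        for (s2, e2) in TRANSITIONS
        if s2 == target
    ]
```

Iterating over `TRANSITIONS` gives the seven single-transition tests. `transition_pairs()` gives the sequences needed for the next coverage level up, six pairs for this workflow, such as submit followed by approve.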
Negative testing — invalid transitions
Equally important: the system rejects transitions that aren't on the diagram. For each state, attempt every event that shouldn't work.
| From | Event | Expected |
|---|---|---|
| Delivered | submit | rejected — terminal state |
| Cancelled | ship | rejected — already cancelled |
| Draft | ship | rejected — must approve first |
| Rejected | approve | rejected — already rejected |
Negative transitions catch state-machine bugs that valid-only testing misses (e.g. a webhook racing the state and re-shipping a cancelled order).
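The negative cases can be enumerated rather than guessed: take every (state, event) combination and subtract the valid ones. A sketch assuming the order workflow above, with illustrative names.

```python
from itertools import product

# Valid (state, event) pairs from the order workflow
VALID = {
    ("Draft", "submit"), ("Submitted", "approve"), ("Submitted", "reject"),
    ("Approved", "ship"), ("Approved", "cancel"),
    ("Shipped", "deliver"), ("Shipped", "return"),
}
STATES = ["Draft", "Submitted", "Approved", "Rejected", "Cancelled",
          "Shipped", "Delivered", "Returned"]
EVENTS = ["submit", "approve", "reject", "ship", "cancel", "deliver", "return"]


def invalid_transitions():
    """Every (state, event) pair that is not on the diagram."""
    return [(s, e) for s, e in product(STATES, EVENTS) if (s, e) not in VALID]
```

For this workflow that is 8 × 7 − 7 = 49 invalid pairs. Running all of them is cheap at the unit level; for slower end-to-end tests, a handful of the riskiest ones (like the four in the table above) is usually the practical choice.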
Edge conditions worth probing
- Mid-transition crash — kill the process between "auth captured" and "order updated". What state is the order in?
- Concurrent transitions — two admins click "Ship" within 100ms. One wins? Both succeed and double-ship?
- Replayed event — payment webhook delivered twice. Does the order go to Paid once?
- Timer-driven transitions — abandoned cart, expired session. Does the timer fire when the user is idle? When they're active in another tab?
Pairwise / Combinatorial Testing
Most defects are triggered by a single parameter or by the interaction of two parameters, rarely three or more. Pairwise testing covers all pairs of parameter values without testing every combination.
Why it works
Empirical studies (Kuhn et al., NIST) found ≈ 70 % of failures come from a single faulty input or a pair of inputs. By covering every pair, you catch the vast majority of bugs at a fraction of the cost.
Combinatorial explosion
3 parameters × 3 values each = 27 full combinations.
4 × 4 × 4 × 4 = 256 full.
6 × 6 × 6 × 6 × 6 = 7,776 full.
Pairwise replaces these with ~9 / ~16 / ~36. Most projects can't afford full combinatorial; pairwise is what you can actually run.
Worked example — Browser compatibility
Parameters:
- Browser: Chrome, Firefox, Safari
- OS: Windows, macOS, Linux
- Language: EN, FR, ES
Full combinatorial = 27. Pairwise — 9 tests cover every (browser × OS), (browser × lang), and (OS × lang) pair:
| # | Browser | OS | Language |
|---|---|---|---|
| 1 | Chrome | Windows | EN |
| 2 | Chrome | macOS | FR |
| 3 | Chrome | Linux | ES |
| 4 | Firefox | Windows | FR |
| 5 | Firefox | macOS | ES |
| 6 | Firefox | Linux | EN |
| 7 | Safari | Windows | ES |
| 8 | Safari | macOS | EN |
| 9 | Safari | Linux | FR |
Every browser appears with every OS at least once. Every browser appears with every language at least once. Every OS appears with every language at least once. All pairs covered, 9 tests instead of 27.
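The "all pairs covered" claim is easy to verify mechanically, which is worth doing whenever a pairwise set is edited by hand. A sketch using only the standard library; the helper name `uncovered_pairs` is made up for illustration.

```python
from itertools import combinations, product

PARAMETERS = {
    "browser": ["Chrome", "Firefox", "Safari"],
    "os": ["Windows", "macOS", "Linux"],
    "language": ["EN", "FR", "ES"],
}

# The nine tests from the table above
SUITE = [
    ("Chrome", "Windows", "EN"), ("Chrome", "macOS", "FR"), ("Chrome", "Linux", "ES"),
    ("Firefox", "Windows", "FR"), ("Firefox", "macOS", "ES"), ("Firefox", "Linux", "EN"),
    ("Safari", "Windows", "ES"), ("Safari", "macOS", "EN"), ("Safari", "Linux", "FR"),
]


def uncovered_pairs(parameters, suite):
    """Return every value pair (across two parameters) that no test exercises."""
    names = list(parameters)
    required = set()
    for (i, a), (j, b) in combinations(enumerate(names), 2):
        for va, vb in product(parameters[a], parameters[b]):
            required.add((i, va, j, vb))
    covered = {
        (i, row[i], j, row[j])
        for row in suite
        for i, j in combinations(range(len(names)), 2)
    }
    return required - covered
```

`uncovered_pairs(PARAMETERS, SUITE)` returns an empty set for the nine tests above. Dropping any one test leaves three pairs uncovered, so this set has no redundancy.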
Tools that generate pairwise sets
| Tool | How |
|---|---|
| Microsoft PICT | CLI, txt config — pict params.txt outputs the test set |
| AllPairs (Python) | pip install allpairspy — programmatic generator |
| pairwise.org | Web UI for one-off generation |
| Hexawise | Commercial, supports constraints and seeding |
| PICTMaster | Excel-based generator |
PICT input file:
Browser: Chrome, Firefox, Safari
OS: Windows, macOS, Linux
Language: EN, FR, ES
pict params.txt
Constraints
Real systems have impossible combinations — Safari on Linux doesn't ship. Tools support constraints:
IF [Browser] = "Safari" THEN [OS] <> "Linux";
The generator skips infeasible combinations while still covering all valid pairs.
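To make the effect of a constraint concrete, here is a naive greedy generator in plain Python. It is an illustrative sketch and not how PICT or any other tool works internally; `is_feasible` mirrors the Safari/Linux constraint, and all names are hypothetical.

```python
from itertools import combinations, product

PARAMETERS = {
    "browser": ["Chrome", "Firefox", "Safari"],
    "os": ["Windows", "macOS", "Linux"],
    "language": ["EN", "FR", "ES"],
}


def is_feasible(row):
    """Mirror of the constraint: Safari never runs on Linux."""
    browser, os_name, _ = row
    return not (browser == "Safari" and os_name == "Linux")


def row_pairs(row):
    """All (index, value, index, value) pairs exercised by one test row."""
    return {(i, row[i], j, row[j]) for i, j in combinations(range(len(row)), 2)}


def greedy_pairwise(parameters, feasible):
    """Repeatedly pick the feasible row that covers the most uncovered pairs."""
    candidates = [row for row in product(*parameters.values()) if feasible(row)]
    # Only pairs that appear in at least one feasible row need covering
    uncovered = set().union(*(row_pairs(row) for row in candidates))
    suite = []
    while uncovered:
        best = max(candidates, key=lambda row: len(row_pairs(row) & uncovered))
        suite.append(best)
        uncovered -= row_pairs(best)
    return suite
```

The infeasible pair (Safari, Linux) never enters `uncovered`, so the loop does not try to cover it. Every other pair still ends up in at least one row. A greedy pass like this is not guaranteed to produce the smallest possible set; dedicated tools generally do better on large models.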
When pairwise fits
- Configuration testing — browsers × OSes × screen sizes × locales.
- Form fields with many independent dropdowns / toggles.
- API parameters — many optional query params with several valid values.
- Feature flags matrix — handful of flags, each on/off.
- Compatibility — versions of dependencies, plugins, integrations.
When not to use it
- Two parameters strongly interact in known ways → enumerate explicitly.
- The state machine has dependencies between values → use state transition testing.
- The combinations encode business rules → use a decision table.
Error Guessing & Experience-Based Testing
Formal techniques cover the specifiable test cases. Error guessing fills the gap with judgment — the test cases that come from "I bet this is going to break."
Common error categories
- Empty / null / undefined — a very common source of production bugs.
- Whitespace — leading/trailing, all-whitespace, tab characters in name fields.
- Special characters — quotes, angle brackets, semicolons, emoji, RTL text, zero-width spaces.
- Numeric edges — 0, -1, INT_MAX, INT_MAX + 1, NaN, Infinity, 0.1 + 0.2.
- Boundary timing — DST transitions, leap years, leap seconds, month/year rollover, midnight UTC vs midnight local.
- Concurrency — two clicks within 100ms, page refresh during a long action, concurrent edits.
- Network — offline submit, slow 3G, request abort, mid-upload disconnect, DNS failure.
- Auth edges — expired token mid-action, revoked permission while session is active, role downgrade.
- Storage limits — quota exceeded, IndexedDB unavailable, cookie disabled, browser private mode.
- Internationalisation — multi-byte characters in strings, language affecting numeric format (1,234.56 vs 1.234,56), RTL text.
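Several of the numeric and whitespace edges are surprising enough to pin down in a test. The sketch below shows how a few of them behave in Python; other languages differ in the details (integer overflow in particular), so treat it as an illustration and not a universal rule. The function name is made up.

```python
import math


def edge_case_facts():
    """Collect a few edge-case behaviours worth asserting in tests."""
    nan = float("nan")
    return {
        # Binary floating point cannot represent 0.1 or 0.2 exactly
        "float_sum_exact": 0.1 + 0.2 == 0.3,
        # Compare floats with a tolerance instead
        "float_sum_close": math.isclose(0.1 + 0.2, 0.3),
        # NaN is never equal to anything, including itself
        "nan_equals_itself": nan == nan,
        # Infinity compares greater than any finite number
        "inf_exceeds_large_int": math.inf > 10**308,
        # An all-whitespace string strips down to empty
        "whitespace_strips_to_empty": " \t ".strip() == "",
    }
```

Here `float_sum_exact` and `nan_equals_itself` are `False` and the other three are `True`. Any validation or comparison code that assumes otherwise is a candidate for an error-guessing test.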
Maintain a personal checklist
Every team grows a "what bites us most" list. Capture yours:
□ Empty + whitespace inputs
□ Leading/trailing whitespace stripped where it shouldn't be
□ Pagination off-by-one (page 0 vs page 1)
□ Timezone mismatch between client and server
□ Stale browser cache after deploy
□ Optimistic UI inconsistent with server state on failure
□ Email verification race after change-of-email
□ Soft-deleted user re-registering
The list grows over years and pays for itself every release.
Combining with formal techniques
Run formal techniques first (EP, BVA, decision tables, state transitions, pairwise) — they give you systematic coverage. Then run error guessing on top — it catches what spec-driven design can't see.
Use Case Testing
Derive tests directly from how a real user accomplishes a goal. Each use case yields one main success scenario plus alternative and exception flows.
Structure of a use case
Title: Place an order
Actor: Authenticated customer
Preconditions: Cart contains at least one in-stock item; payment method on file
Main success scenario:
1. Customer reviews cart
2. Customer proceeds to checkout
3. System validates stock and pricing
4. Customer confirms shipping address
5. System charges payment method
6. System creates order and sends confirmation
7. System shows confirmation page
Alternative flows:
4a. Customer applies a coupon
→ System recalculates total, returns to step 5
4b. Customer changes shipping method
→ System recalculates shipping, returns to step 5
6a. Customer requests gift wrapping
→ System adds line item, returns to step 5
Exception flows:
3a. Item out of stock
→ System shows alert, removes item, customer continues with rest
5a. Payment declined
→ System shows error, customer enters new method, returns to step 5
5b. Network error mid-payment
→ System retries; on second failure, shows recovery instructions
→ Customer's cart is preserved
*. Session timeout at any step
→ System asks customer to re-authenticate, returns to current step
Coverage
| Layer | Tests |
|---|---|
| Main success scenario | 1 (the happy path) |
| Each alternative flow | 1 each (4a, 4b, 6a → 3 cases) |
| Each exception flow | 1 each (3a, 5a, 5b, * → 4 cases) |
| Cross-cutting variations | logged-out user; mobile vs desktop; saved card vs new card |
The main flow plus all alternatives and exceptions is usually 5–15 tests per use case. Worth tracking against the use-case document for traceability — every alternative and exception flow should map to at least one test case.
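The traceability check can be automated with a simple mapping from tests to the flows they exercise. A sketch, assuming flow IDs as they appear in the use case above; the test names are hypothetical.

```python
# Flow IDs from the use case: main scenario, alternatives, exceptions
FLOWS = ["main", "4a", "4b", "6a", "3a", "5a", "5b", "*"]

# Hypothetical test names mapped to the flows they exercise
TEST_TO_FLOWS = {
    "test_place_order_happy_path": ["main"],
    "test_apply_coupon": ["4a"],
    "test_change_shipping_method": ["4b"],
    "test_gift_wrapping": ["6a"],
    "test_item_out_of_stock": ["3a"],
    "test_payment_declined": ["5a"],
    "test_network_error_mid_payment": ["5b"],
    "test_session_timeout_during_checkout": ["*"],
}


def untested_flows(flows, test_to_flows):
    """Return the flows that no test claims to cover, in document order."""
    covered = {flow for flow_ids in test_to_flows.values() for flow in flow_ids}
    return [flow for flow in flows if flow not in covered]
```

`untested_flows(FLOWS, TEST_TO_FLOWS)` returns an empty list when every flow maps to at least one test. Run as part of CI, a check like this flags a newly added exception flow that nobody wrote a test for.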
When to favour use case testing
- Workflows where the order of steps matters (checkout, onboarding, KYC, multi-step forms).
- Scenarios driven by persona / role — admin onboarding flow vs end-user.
- Acceptance testing — UAT scripts written from use cases read naturally.
State transitions, decision tables, and pairwise complement use cases — once you've identified the flows, those techniques tell you which inputs to drive at each step.