
Exploratory Testing Heuristics

Heuristics, tours, and prompts for finding bugs that scripted tests miss. Pull one off the shelf when you're staring at a feature and don't know where to start.

SFDPOT — San Francisco Depot

A Bach-and-Bolton coverage heuristic, remembered as "San Francisco Depot": Structure, Function, Data, Platform, Operations, Time. For each dimension, ask the listed questions to surface test ideas you would otherwise miss.

Structure — what is it made of?

Ask	Examples
What components make up this feature?	Front-end form, API, validation rules, DB tables, queue, worker
What files / modules does it touch?	`auth.ts`, `User.java`, migration `0042_add_phone.sql`
What database tables / collections?	`users`, `sessions`, `audit_log`
What configuration files?	`app.yml`, feature flags, environment variables
What third-party libraries?	Stripe SDK, Auth0, AWS S3

Function — what does it do?

Ask	Examples
What can a user accomplish?	Create account, reset password, upload avatar
What can the system do automatically?	Send digest email, expire stale sessions, archive old data
What functions exist for admins / power users?	Impersonate, export, bulk-delete
What's missing that should be here?	"Forgot password" link, undo, confirmation dialog

Data — what does it process?

Ask	Examples
What inputs flow in?	Form fields, query params, headers, file uploads
What outputs flow out?	API responses, emails, downloads, audit log entries
What is stored, where, and for how long?	Session token in cookie 30d, password hash in DB, image in S3
What states can the data be in?	Draft, submitted, approved, rejected, archived
What types and ranges?	Strings, ints, dates, JSON, files

Platform — what does it depend on?

Ask	Examples
Which browsers / OSes / devices?	Chrome, Safari iOS, Edge, screen readers
Which downstream APIs?	Payments, identity, search, email
Which infrastructure?	Postgres, Redis, S3, queue, CDN
Network conditions?	Offline, slow 3G, high-latency, IPv6, captive portal

Operations — how is it used in practice?

Ask	Examples
Install / first-run experience?	Onboarding wizard, default settings, migration from old version
Updates and rollback?	Zero-downtime deploys, schema migrations, feature flag rollback
Backups and recovery?	Point-in-time restore, export/import, audit log retention
Monitoring and alerting?	Error rates, latency SLOs, on-call playbook

Time — how does it behave over time?

Ask	Examples
Timeouts and waits?	Session expiry, request timeout, retry-after, cron schedule
Aging data?	Stale tokens, expired coupons, year-end rollovers, leap years
Concurrency?	Two users editing the same record, race in checkout, double-click
Order of operations?	Out-of-order webhooks, replayed events, late deliveries

RCRCRC — Regression Testing

Six lenses for choosing what to retest after a change. Useful when you can't run everything.

Letter	Focus	What to look at
Recent	Recently changed	Areas touched in the last sprint or PR
Core	Critical functions	Login, checkout, billing, search — the things users do every day
Risky	High-complexity / past-bug heavy	Multi-step flows, integrations, payments, race conditions
Configuration	Settings + environment-specific	Different roles, locales, plans, OS, browser, device
Repaired	Recently fixed	Verify the fix and check for fresh regressions around it
Chronic	Persistently flaky / buggy	Areas that bite every release no matter what
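
One way to turn the six lenses into a retest order is to tag each product area with the lenses that apply and sort by how many it collects. The sketch below assumes every lens counts equally, which is a simplification of my own rather than part of the mnemonic; the area names and tags are hypothetical.

```python
LENSES = ("recent", "core", "risky", "configuration", "repaired", "chronic")


def rank_areas(tags: dict[str, set[str]]) -> list[tuple[str, int]]:
    """Order areas by how many RCRCRC lenses flag them, most-flagged first."""
    for area, lenses in tags.items():
        unknown = lenses - set(LENSES)
        if unknown:
            raise ValueError(f"{area}: unknown lens {sorted(unknown)}")
    # Ties are broken alphabetically so the ordering is stable between runs.
    return sorted(
        ((area, len(lenses)) for area, lenses in tags.items()),
        key=lambda pair: (-pair[1], pair[0]),
    )


# Hypothetical release: these areas and tags are illustrative only.
release_tags = {
    "checkout": {"recent", "core", "risky"},
    "login": {"core"},
    "csv-import": {"repaired", "chronic"},
    "profile-page": set(),
}

for area, score in rank_areas(release_tags):
    print(f"{score}  {area}")
```

If some lenses matter more in your context (for example, Core over Configuration), replace the plain count with a weighted sum.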

FEW HICCUPPS — Consistency Heuristics

Bach and Bolton's consistency oracles. The product should be consistent with each of these; call out any divergence as a bug, observation, or question.

Letter	Consistency with…	Probe
Familiar	Familiar problems (inverted)	Does it repeat a bug pattern you have seen before in this or similar products? This is the one row where you want inconsistency.
Explainability	A clear explanation	Could you explain this behaviour to a customer with a straight face?
World	Reality	Does it model real-world physics, money, dates, geography correctly?
History	Past versions	Has long-standing behaviour silently changed?
Image	Brand and reputation	Does this match the polish customers expect from us?
Comparable	Comparable products	How does the closest competitor handle this case?
Claims	Marketing and docs	Do the screenshots, docs, and copy still match the product?
User expectations	What users expect	Is the surprise factor low for the typical user?
Purpose	Stated purpose	Does this support the feature's reason for existing?
Product	Itself	Are similar things in the app handled the same way internally?
Standards	Applicable standards	RFCs, W3C, WCAG, ISO — does it comply where required?

Touring Heuristics (James Whittaker)

Run a tour when you need a focused mode of exploration. Each tour has a different goal and yields different bugs.

Guidebook Tour — Walk through the user-facing documentation step by step. Verify every example works as written; flag wording that's misleading or stale.
Money Tour — Test the features the company makes money on (checkout, subscription upgrade, paid-only flows). Bugs here have the highest blast radius.
Landmark Tour — List the 5–10 features users care about most, then visit each. Useful as a smoke-test charter.
Intellectual Tour — Find the hardest features to test (state machines, concurrent editing, billing). Spend a session on the one you're most afraid of.
FedEx Tour — Follow data through the system end to end: input → validation → processing → storage → output → notification. Look for places it can be lost or transformed wrong.
Garbage Collector Tour — Feed the system input it doesn't want: invalid types, oversize files, malformed JSON, unexpected encodings. Check how cleanly errors propagate.
Bad Neighborhood Tour — Spend the session in areas where bugs cluster historically. Look at recent bug reports and test their neighbourhood.
Museum Tour — Test legacy features no one has touched in years. They often quietly broke during refactors.
Back Alley Tour — Explore the least-used features (admin tools, edge-case settings, obscure shortcuts). Expect more bugs per minute than in the main flows.
All-Nighter Tour — Leave the app running overnight. Memory leaks, expired sessions, day-rollover bugs, scheduled-job failures often only show up after hours.
Supermodel Tour — Test only the UI: layout, alignment, colour, contrast, hover/focus states. Don't click anything that submits.
Couch Potato Tour — Do the bare minimum. Accept defaults. Click "OK" without reading. Surfaces what happens to users who don't pay attention.

Session-Based Test Management (SBTM)

A way to make exploratory testing accountable without scripting it.

The structure

Concept	Definition
Session	An uninterrupted block of testing time, usually 60–90 minutes
Charter	A single mission for the session, written before it starts
Debrief	A short conversation after the session — surface findings and refine the next charter

Example charters

Explore the login flow with invalid credentials across Chrome, Safari, and Firefox.
Probe the new bulk-import feature with the Garbage Collector tour for 60 minutes.
Verify the password-reset email survives quoted-printable encoding in Outlook.
Run the Money Tour on the upgrade flow with a previously-cancelled subscription.

Session sheet template

Charter:        Explore the bulk-import feature using the Garbage Collector tour
Tester:         Vimal
Date / time:    2026-05-03 / 10:00–11:00
Duration:       60 minutes (45 testing, 15 setup/notes)

Areas covered:  CSV upload UI, /api/imports, validation messages, error reporting

Bugs found:
  #1234  Empty CSV is accepted, creates 0-row import (silent)
  #1235  CSV with BOM at start strips first column header
  #1236  100 MB file kills the upload progress bar (frozen at 0%)

Issues / questions:
  - Spec doesn't say whether duplicate rows should error or merge
  - 415 vs 422 inconsistency between front-end and API for "wrong file type"

Notes:
  - Drag-and-drop works on Chrome but not Safari iOS
  - Progress bar resets on tab switch — confirm if intentional

Useful metrics (don't overdo them)

Sessions completed vs planned
Charter completion % — did you finish the mission, or get distracted?
Bugs / session — track trend, not absolute (rises with familiarity, then falls)
Coverage map — which areas have been visited recently, which are stale
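
The first three metrics fall out of a few fields per session. This is a minimal sketch, assuming a hypothetical `Session` record that stores whether the charter was completed and how many bugs were filed; the field names are not from any SBTM tool.

```python
from dataclasses import dataclass


@dataclass
class Session:
    charter: str
    minutes: int
    charter_completed: bool
    bugs_found: int


def summarise(sessions: list[Session], planned: int) -> dict[str, float]:
    """Compute sessions completed vs planned, charter completion %, and bugs per session."""
    done = len(sessions)
    completed = sum(1 for s in sessions if s.charter_completed)
    bugs = sum(s.bugs_found for s in sessions)
    return {
        "sessions_completed": done,
        "sessions_planned": planned,
        "charter_completion_pct": round(100 * completed / done, 1) if done else 0.0,
        "bugs_per_session": round(bugs / done, 2) if done else 0.0,
    }


# Hypothetical week of sessions.
week = [
    Session("Bulk import, Garbage Collector tour", 60, True, 3),
    Session("Login with invalid credentials", 90, True, 1),
    Session("Money Tour on upgrade flow", 60, False, 0),
]
print(summarise(week, planned=4))
```

Compare the summary week over week rather than reading any single value as a score.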

Input Variation Techniques

What to throw at every input field, every API parameter, every config value.

Boundary values

For any numeric or length-bounded input, test six points:

min - 1   ← invalid
min       ← valid (lower edge)
min + 1   ← valid (just inside)
max - 1   ← valid (just inside)
max       ← valid (upper edge)
max + 1   ← invalid
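
The six points can be generated rather than typed by hand. The sketch below assumes an inclusive integer range; `boundary_values` is a hypothetical helper name. It drops duplicates when the range is so narrow that the points overlap, and it recomputes validity from the range so an overlapping point is never mislabelled.

```python
def boundary_values(minimum: int, maximum: int) -> list[tuple[int, bool]]:
    """Return the boundary probes as (value, expected_valid) pairs."""
    if minimum > maximum:
        raise ValueError("minimum must not exceed maximum")
    candidates = [
        minimum - 1, minimum, minimum + 1,
        maximum - 1, maximum, maximum + 1,
    ]
    probes: list[tuple[int, bool]] = []
    seen: set[int] = set()
    for value in candidates:
        if value in seen:
            continue  # narrow ranges make some points coincide
        seen.add(value)
        probes.append((value, minimum <= value <= maximum))
    return probes


for value, expected_valid in boundary_values(0, 120):
    print(f"{value:>4}  {'valid' if expected_valid else 'invalid'}")
```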

Equivalence partitions

Pick one representative from each class instead of testing every value.

Field	Valid class	Invalid classes
Age	0–120	-1, 121, "abc", 1.5
Email	well-formed	missing `@`, missing TLD, double `@`, leading space
Phone (US)	10 digits	9 digits, 11 digits, letters, punctuation only (`() -`)
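
As a sketch of the Age row, the snippet below picks one representative per class and checks it against a validator. `is_valid_age` is a hypothetical stand-in for whatever validation the product actually performs; the class labels are mine.

```python
def is_valid_age(raw: object) -> bool:
    """Hypothetical validator: whole-number ages from 0 to 120 inclusive."""
    # bool is a subclass of int in Python, so exclude it explicitly.
    if isinstance(raw, bool) or not isinstance(raw, int):
        return False
    return 0 <= raw <= 120


# One representative value per equivalence class, with the expected verdict.
PARTITIONS = {
    "valid: 0-120": (35, True),
    "invalid: below range": (-1, False),
    "invalid: above range": (121, False),
    "invalid: not a number": ("abc", False),
    "invalid: not a whole number": (1.5, False),
}


def run_partitions() -> dict[str, bool]:
    """Map each class name to True when the validator gave the expected verdict."""
    return {
        name: is_valid_age(value) == expected
        for name, (value, expected) in PARTITIONS.items()
    }


for name, passed in run_partitions().items():
    print(f"{'PASS' if passed else 'FAIL'}  {name}")
```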

Special characters

Always have a test that includes these:

<script>alert(1)</script>      HTML / XSS
'  "  --                       SQL quote breakers
' OR 1=1 --                    classic SQLi
${jndi:ldap://x}               Log4Shell-style
\x00 \r \n \t                  control characters
% _ ? *                        SQL / shell wildcards
🚀 漢字 ñ ø                      Unicode + emoji
${var}  {{var}}                template-injection markers
../../etc/passwd               path traversal
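
A cheap first check for these strings is a round trip: store each one and confirm it comes back byte-for-byte identical. The sketch below uses Python's built-in `sqlite3` with parameterised queries as a stand-in for the real storage layer; the `notes` table is hypothetical, and the NUL character is left out because driver handling of it varies.

```python
import sqlite3

HOSTILE_STRINGS = [
    "<script>alert(1)</script>",
    "'  \"  --",
    "' OR 1=1 --",
    "${jndi:ldap://x}",
    "\r \n \t",
    "% _ ? *",
    "🚀 漢字 ñ ø",
    "${var}  {{var}}",
    "../../etc/passwd",
]


def round_trip_failures(strings: list[str] = HOSTILE_STRINGS) -> list[str]:
    """Return the strings that did not survive an insert-then-select unchanged."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE notes (id INTEGER PRIMARY KEY, body TEXT)")
        failures = []
        for text in strings:
            # The ? placeholder keeps the value out of the SQL text entirely.
            cursor = conn.execute("INSERT INTO notes (body) VALUES (?)", (text,))
            row = conn.execute(
                "SELECT body FROM notes WHERE id = ?", (cursor.lastrowid,)
            ).fetchone()
            if row is None or row[0] != text:
                failures.append(text)
        return failures
    finally:
        conn.close()


print(round_trip_failures())
```

An empty list only shows that storage is safe; rendering, logging, and shell or template paths each need their own probe.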

Empty / blank / null

Empty string ""
Whitespace only " "
Null / undefined
Field omitted from the request entirely
Field present but with value null
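
For a JSON API, "omitted" and "null" are different requests and often take different code paths. A minimal sketch that derives the variants from one known-good payload; `blank_variants` and the sample payload are hypothetical.

```python
import json

_OMIT = object()  # sentinel meaning "leave the field out of the body"


def blank_variants(payload: dict, field: str) -> dict[str, str]:
    """Build one JSON request body per empty/blank/null case for a field."""
    cases = {
        "empty string": "",
        "whitespace only": " ",
        "explicit null": None,
        "field omitted": _OMIT,
    }
    bodies = {}
    for label, value in cases.items():
        body = dict(payload)  # copy so the original payload is untouched
        if value is _OMIT:
            body.pop(field, None)
        else:
            body[field] = value
        bodies[label] = json.dumps(body)
    return bodies


good = {"name": "Ada", "email": "ada@example.com"}
for label, body in blank_variants(good, "email").items():
    print(f"{label:<16} {body}")
```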

Length

0 chars
1 char
255 chars                      legacy varchar boundary
256 chars
1000+ chars                    paragraph / paste from doc
1 MB                           paste from a log file
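
When generating these, remember that a limit may be counted in characters or in bytes. The sketch below builds each probe from a single fill character and shows how a 255-character string can occupy twice as many bytes in UTF-8; `length_probes` is a hypothetical helper, and 1 MB is taken as 1024 * 1024 characters.

```python
LENGTHS = [0, 1, 255, 256, 1000, 1024 * 1024]


def length_probes(fill: str = "a") -> dict[int, str]:
    """Return one string per target length, built from a single fill character."""
    if len(fill) != 1:
        raise ValueError("fill must be exactly one character")
    return {length: fill * length for length in LENGTHS}


ascii_probes = length_probes("a")
accented_probes = length_probes("\u00e9")  # e with acute accent, 2 bytes in UTF-8

for length in (255, 256):
    text = accented_probes[length]
    print(length, "chars =", len(text.encode("utf-8")), "bytes")
```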

Numbers

0
-0                             different from 0 in some systems
-1
1
INT_MAX (2147483647), INT_MAX + 1
INT_MIN, INT_MIN - 1
3.14159, 1e308, 1e-308
NaN, Infinity, -Infinity
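
Several of these values behave in ways that are easy to forget. The sketch below demonstrates three of them in Python: the 32-bit signed range, negative zero, and the fact that strict JSON has no literal for NaN or Infinity. The helper names are hypothetical.

```python
import json
import math
import struct


def fits_int32(value: int) -> bool:
    """True when the value can be packed into a signed 32-bit integer."""
    try:
        struct.pack("<i", value)
    except struct.error:
        return False
    return True


def is_negative_zero(value: float) -> bool:
    """-0.0 compares equal to 0.0, so inspect the sign bit instead."""
    return value == 0 and math.copysign(1.0, value) < 0


def strict_json(value: float):
    """Serialise under strict JSON rules; return None if the value has no literal."""
    try:
        return json.dumps(value, allow_nan=False)
    except ValueError:
        return None


print(fits_int32(2147483647), fits_int32(2147483648))
print(-0.0 == 0.0, is_negative_zero(-0.0))
print(strict_json(float("nan")), strict_json(3.14159))
```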

Dates and times

1970-01-01T00:00:00Z           Unix epoch
1969-12-31T23:59:59Z           pre-epoch (often breaks)
2000-02-29                     leap day in a century year (1900-02-29 does not exist)
2024-02-29 vs 2025-02-29       leap year vs non-leap
2038-01-19T03:14:07Z           last second before 32-bit Unix overflow (03:14:08Z wraps)
DST forward / backward day     timezone math
Dec 31 → Jan 1                 year rollover
Feb 28 → Mar 1                 month rollover
local time in non-UTC zone     server vs client time mismatch
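
A few of these edges can be confirmed with the standard library before you aim them at the product. This sketch covers the epoch, the pre-epoch second, the 32-bit limit, and leap days; DST cases are omitted because they depend on time zone data available on the machine. The helper names are hypothetical.

```python
from datetime import datetime, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)
INT32_MAX = 2**31 - 1


def to_unix_seconds(moment: datetime) -> int:
    """Seconds since the Unix epoch; negative for moments before it."""
    return int((moment - EPOCH).total_seconds())


def fits_signed_32bit(moment: datetime) -> bool:
    """True when the timestamp is representable as a signed 32-bit integer."""
    return -(2**31) <= to_unix_seconds(moment) <= INT32_MAX


def date_exists(year: int, month: int, day: int) -> bool:
    """True when the calendar date is real (for example, Feb 29 in leap years only)."""
    try:
        datetime(year, month, day)
    except ValueError:
        return False
    return True


last_ok = datetime(2038, 1, 19, 3, 14, 7, tzinfo=timezone.utc)
first_bad = datetime(2038, 1, 19, 3, 14, 8, tzinfo=timezone.utc)
print(to_unix_seconds(last_ok), fits_signed_32bit(first_bad))
print(date_exists(2024, 2, 29), date_exists(2025, 2, 29))
```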

File uploads

Probe	What it tests
Wrong extension (`.exe` renamed to `.png`)	MIME-type validation vs extension check
Empty file (0 bytes)	"did the upload succeed?" branch
Huge file (1 GB+)	Server limits, progress, timeout, memory
No extension	Default handling
Double extension (`avatar.png.exe`)	Sanitisation
Path-traversal name (`../../evil.png`)	Filename sanitisation
Same filename twice	Conflict / overwrite policy
Upload interrupted by a network drop	Resume / retry
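
To judge what the server should be doing for the extension and filename probes, it helps to have a reference for the checks involved. This is a minimal sketch, not a complete upload validator: it compares content against the PNG signature, strips directory parts from a client-supplied name, and lists every extension the name declares. The function names are hypothetical.

```python
from pathlib import PureWindowsPath

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"  # first eight bytes of every PNG file


def content_is_png(data: bytes) -> bool:
    """Check the file's leading bytes instead of trusting its extension."""
    return data.startswith(PNG_SIGNATURE)


def safe_filename(client_name: str) -> str:
    """Keep only the final path component of a client-supplied filename."""
    # PureWindowsPath treats both / and \ as separators, so it covers both styles.
    name = PureWindowsPath(client_name).name
    if name in ("", ".", ".."):
        raise ValueError(f"unusable filename: {client_name!r}")
    return name


def declared_extensions(client_name: str) -> list[str]:
    """Return every extension in the name, lowercased, in order."""
    return [suffix.lower() for suffix in PureWindowsPath(client_name).suffixes]


print(content_is_png(b"MZ\x90\x00"))           # bytes that start like a Windows executable
print(safe_filename("../../evil.png"))
print(declared_extensions("avatar.png.exe"))
```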

State Transition Testing

For any feature with discrete states (orders, accounts, subscriptions, document workflow), explicitly test the transitions.

A four-step recipe

Enumerate every state the system can be in.
Map every transition the spec says is valid.
Test each valid transition with the action that triggers it.
Attempt every invalid transition — expect a clean refusal, not a crash.

Example — order state machine

PLACED → PAID → SHIPPED → DELIVERED
   ↓       ↓        ↓
CANCELLED  REFUNDED  RETURNED

From	Action	Expected new state
PLACED	Pay	PAID
PLACED	Cancel	CANCELLED
PAID	Ship	SHIPPED
PAID	Refund	REFUNDED
SHIPPED	Cancel	rejected — order already shipped
DELIVERED	Pay again	rejected — already paid
CANCELLED	Pay	rejected — terminal
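
The table translates directly into a lookup, and the lookup makes the fourth step of the recipe mechanical: every (state, action) pair that is missing from it is an invalid transition to attempt. In this sketch the `deliver` and `return` actions are assumptions read off the diagram, since the table does not list them.

```python
from enum import Enum


class OrderState(Enum):
    PLACED = "PLACED"
    PAID = "PAID"
    SHIPPED = "SHIPPED"
    DELIVERED = "DELIVERED"
    CANCELLED = "CANCELLED"
    REFUNDED = "REFUNDED"
    RETURNED = "RETURNED"


class InvalidTransition(Exception):
    """Raised when an action is not allowed from the current state."""


TRANSITIONS = {
    (OrderState.PLACED, "pay"): OrderState.PAID,
    (OrderState.PLACED, "cancel"): OrderState.CANCELLED,
    (OrderState.PAID, "ship"): OrderState.SHIPPED,
    (OrderState.PAID, "refund"): OrderState.REFUNDED,
    (OrderState.SHIPPED, "deliver"): OrderState.DELIVERED,  # assumed from the diagram
    (OrderState.SHIPPED, "return"): OrderState.RETURNED,    # assumed from the diagram
}


def apply_action(state: OrderState, action: str) -> OrderState:
    """Return the new state, or refuse cleanly if the transition is not valid."""
    try:
        return TRANSITIONS[(state, action)]
    except KeyError:
        raise InvalidTransition(f"cannot {action} from {state.value}") from None


def invalid_pairs() -> list[tuple[OrderState, str]]:
    """Every (state, action) pair the table does not allow."""
    actions = sorted({action for _, action in TRANSITIONS})
    return [
        (state, action)
        for state in OrderState
        for action in actions
        if (state, action) not in TRANSITIONS
    ]


print(apply_action(OrderState.PLACED, "pay").value)
print(len(invalid_pairs()), "invalid transitions to attempt")
```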

Conditions worth probing

Interruption mid-transition — kill the network at the moment payment auth completes but before the order updates. What state is the order in?
Concurrent transitions — two admins click "Ship" within 100ms. Both succeed? One wins? Idempotent?
Replayed event — payment webhook arrives twice. Does the order go to PAID once or charge twice?
State after timeout — user abandons the checkout for an hour. Cart cleared, reserved stock released, session expired?

Bug Reporting During Exploration

You will lose details to memory faster than you think. Capture as you go.

Capture-as-you-go habits

Screen-record everything — keep a continuous recording during the session and clip the relevant 30 seconds when you spot something.
Console + network tab open at all times. Save the HAR if a bug looks API-related.
Write notes in a single scratch file with timestamps. Real notes beat reconstructed ones.
Keep a "questions" list separate from the bug list. Not every weirdness is a bug — some are spec gaps.

Before you file

Can you reproduce it? Try at least twice.
What is the shortest path? (Strip the steps that don't matter.)
Does it reproduce in another browser / role / environment?
Did the same thing exist before the recent change? (Avoid blaming the wrong PR.)
What's the right severity? (User-facing data loss = P0; cosmetic glitch on rarely-used screen = P3.)

Quick classification

Type	Smell
Functional	Wrong result, missing behaviour, broken flow
UI	Layout, alignment, contrast, focus state, copy
Performance	Slow response, memory growth, CPU spike, pagination degradation
Edge case	Specific input or state combination
Integration	Boundary between two systems (frontend/backend, app/payment, app/email)
Security	Auth bypass, injection, IDOR, data leakage, missing rate limit
Accessibility	Keyboard trap, missing label, contrast, screen-reader gap
Question	Spec is ambiguous — file as a question, not a bug, until product confirms