Exploratory Testing Heuristics

Heuristics, tours, and prompts for finding bugs that scripted tests miss. Pull one off the shelf when you're staring at a feature and don't know where to start.

SFDPOT — San Francisco Depot

A product-coverage mnemonic from James Bach's Heuristic Test Strategy Model, popularised by Bach and Bolton. For each dimension, ask the listed questions to surface test ideas you would otherwise miss.

Structure — what is it made of?

| Ask | Examples |
|---|---|
| What components make up this feature? | Front-end form, API, validation rules, DB tables, queue, worker |
| What files / modules does it touch? | auth.ts, User.java, migration 0042_add_phone.sql |
| What database tables / collections? | users, sessions, audit_log |
| What configuration files? | app.yml, feature flags, environment variables |
| What third-party libraries? | Stripe SDK, Auth0, AWS S3 |

Function — what does it do?

| Ask | Examples |
|---|---|
| What can a user accomplish? | Create account, reset password, upload avatar |
| What can the system do automatically? | Send digest email, expire stale sessions, archive old data |
| What functions exist for admins / power users? | Impersonate, export, bulk-delete |
| What's missing that should be here? | "Forgot password" link, undo, confirmation dialog |

Data — what does it process?

| Ask | Examples |
|---|---|
| What inputs flow in? | Form fields, query params, headers, file uploads |
| What outputs flow out? | API responses, emails, downloads, audit log entries |
| What is stored, where, and for how long? | Session token in cookie 30d, password hash in DB, image in S3 |
| What states can the data be in? | Draft, submitted, approved, rejected, archived |
| What types and ranges? | Strings, ints, dates, JSON, files |

Platform — what does it depend on?

| Ask | Examples |
|---|---|
| Which browsers / OSes / devices? | Chrome, Safari iOS, Edge, screen readers |
| Which downstream APIs? | Payments, identity, search, email |
| Which infrastructure? | Postgres, Redis, S3, queue, CDN |
| Network conditions? | Offline, slow 3G, high-latency, IPv6, captive portal |

Operations — how is it used in practice?

| Ask | Examples |
|---|---|
| Install / first-run experience? | Onboarding wizard, default settings, migration from old version |
| Updates and rollback? | Zero-downtime deploys, schema migrations, feature flag rollback |
| Backups and recovery? | Point-in-time restore, export/import, audit log retention |
| Monitoring and alerting? | Error rates, latency SLOs, on-call playbook |

Time — how does it behave over time?

| Ask | Examples |
|---|---|
| Timeouts and waits? | Session expiry, request timeout, retry-after, cron schedule |
| Aging data? | Stale tokens, expired coupons, year-end rollovers, leap years |
| Concurrency? | Two users editing the same record, race in checkout, double-click |
| Order of operations? | Out-of-order webhooks, replayed events, late deliveries |

RCRCRC — Regression Testing

Karen N. Johnson's six lenses for choosing what to retest after a change. Useful when you can't run everything.

| Letter | Focus | What to look at |
|---|---|---|
| Recent | Recently changed | Areas touched in the last sprint or PR |
| Core | Critical functions | Login, checkout, billing, search: the things users do every day |
| Risky | High-complexity / past-bug heavy | Multi-step flows, integrations, payments, race conditions |
| Configuration | Settings + environment-specific | Different roles, locales, plans, OS, browser, device |
| Repaired | Recently fixed | Verify the fix and check for fresh regressions around it |
| Chronic | Persistently flaky / buggy | Areas that bite every release no matter what |

FEW HICCUPPS — Consistency Heuristics

Consistency oracles from James Bach's HICCUPPS, extended to FEW HICCUPPS by Michael Bolton. The product should be consistent with each of these; call out any divergence as a bug, observation, or question.

| Letter | Consistency with… | Probe |
|---|---|---|
| Familiarity | Familiar problems | Does it repeat a pattern of bugs you have seen before? This is the one oracle where consistency is the warning sign. |
| Explainability | A clear explanation | Could you explain this behaviour to a customer with a straight face? |
| World | Reality | Does it model real-world physics, money, dates, geography correctly? |
| History | Past versions | Has long-standing behaviour silently changed? |
| Image | Brand and reputation | Does this match the polish customers expect from us? |
| Comparable | Comparable products | How does the closest competitor handle this case? |
| Claims | Marketing and docs | Do the screenshots, docs, and copy still match the product? |
| User expectations | What users expect | Is the surprise factor low for the typical user? |
| Purpose | Stated purpose | Does this support the feature's reason for existing? |
| Product | Itself | Are similar things in the app handled the same way internally? |
| Standards | Applicable standards | RFCs, W3C, WCAG, ISO: does it comply where required? |

Touring Heuristics (James Whittaker)

Run a tour when you need a focused mode of exploration. Each tour has a different goal and yields different bugs.

  • Guidebook Tour — Walk through the user-facing documentation step by step. Verify every example works as written; flag wording that's misleading or stale.
  • Money Tour — Test the features the company makes money on (checkout, subscription upgrade, paid-only flows). Bugs here have the highest blast radius.
  • Landmark Tour — List the 5–10 features users care about most, then visit each. Useful as a smoke-test charter.
  • Intellectual Tour — Find the hardest features to test (state machines, concurrent editing, billing). Spend a session on the one you're most afraid of.
  • FedEx Tour — Follow data through the system end to end: input → validation → processing → storage → output → notification. Look for places it can be lost or transformed wrong.
  • Garbage Collector Tour — Feed the system input it doesn't want: invalid types, oversize files, malformed JSON, unexpected encodings. Check how cleanly errors propagate.
  • Bad Neighborhood Tour — Spend the session in areas where bugs cluster historically. Look at recent bug reports and test their neighbourhood.
  • Museum Tour — Test legacy features no one has touched in years. They often quietly broke during refactors.
  • Back Alley Tour — Explore the least-used features (admin tools, edge-case settings, obscure shortcuts). Expect more bugs per minute than in the main flows.
  • All-Nighter Tour — Leave the app running overnight. Memory leaks, expired sessions, day-rollover bugs, scheduled-job failures often only show up after hours.
  • Supermodel Tour — Test only the UI: layout, alignment, colour, contrast, hover/focus states. Don't click anything that submits.
  • Couch Potato Tour — Do the bare minimum. Accept defaults. Click "OK" without reading. Surfaces what happens to users who don't pay attention.

Session-Based Test Management (SBTM)

A way to make exploratory testing accountable without scripting it.

The structure

| Concept | Definition |
|---|---|
| Session | An uninterrupted block of testing time, usually 60–90 minutes |
| Charter | A single mission for the session, written before it starts |
| Debrief | A short conversation after the session to surface findings and refine the next charter |

Example charters

  • Explore the login flow with invalid credentials across Chrome, Safari, and Firefox.
  • Probe the new bulk-import feature with the Garbage Collector tour for 60 minutes.
  • Verify the password-reset email survives quoted-printable encoding in Outlook.
  • Run the Money Tour on the upgrade flow with a previously-cancelled subscription.

Session sheet template

Charter:        Explore the bulk-import feature using the Garbage Collector tour
Tester:         Vimal
Date / time:    2026-05-03 / 10:00–11:00
Duration:       60 minutes (45 testing, 15 setup/notes)

Areas covered:  CSV upload UI, /api/imports, validation messages, error reporting

Bugs found:
  #1234  Empty CSV is accepted, creates 0-row import (silent)
  #1235  CSV with BOM at start strips first column header
  #1236  100 MB file kills the upload progress bar (frozen at 0%)

Issues / questions:
  - Spec doesn't say whether duplicate rows should error or merge
  - 415 vs 422 inconsistency between front-end and API for "wrong file type"

Notes:
  - Drag-and-drop works on Chrome but not Safari iOS
  - Progress bar resets on tab switch — confirm if intentional

Useful metrics (don't overdo them)

  • Sessions completed vs planned
  • Charter completion % — did you finish the mission, or get distracted?
  • Bugs / session — track trend, not absolute (rises with familiarity, then falls)
  • Coverage map — which areas have been visited recently, which are stale
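
The first three metrics are simple ratios over a session log. A minimal Python sketch, assuming each session is recorded with a completion flag and a bug count; `SessionRecord` and `summarize` are hypothetical names, and the second record is made up for illustration:

```python
from dataclasses import dataclass


@dataclass
class SessionRecord:
    charter: str
    charter_completed: bool  # did the tester finish the mission?
    bugs_found: int


def summarize(sessions: list[SessionRecord], planned: int) -> dict[str, float]:
    """Compute sessions completed vs planned, charter completion %, bugs/session."""
    done = len(sessions)
    completed = sum(1 for s in sessions if s.charter_completed)
    bugs = sum(s.bugs_found for s in sessions)
    return {
        "sessions_completed": done,
        "sessions_planned": planned,
        "charter_completion_pct": 100.0 * completed / done if done else 0.0,
        "bugs_per_session": bugs / done if done else 0.0,
    }


# Illustrative records only.
log = [
    SessionRecord("Bulk import, Garbage Collector tour", True, 3),
    SessionRecord("Login with invalid credentials", False, 1),
]
print(summarize(log, planned=4))
```

Compare the numbers across weeks rather than reading any single value; the coverage map needs a list of product areas and is left out of this sketch.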

Input Variation Techniques

What to throw at every input field, every API parameter, every config value.

Boundary values

For any numeric or length-bounded input, test six points:

min - 1   ← invalid
min       ← valid (lower edge)
min + 1   ← valid (just inside)
max - 1   ← valid (just inside)
max       ← valid (upper edge)
max + 1   ← invalid
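
The six points can be generated mechanically. A minimal Python sketch, assuming an integer input with inclusive bounds; `boundary_values` is a hypothetical helper, not part of any library:

```python
def boundary_values(minimum: int, maximum: int) -> list[tuple[int, bool]]:
    """Return the six boundary probes as (value, expected_valid) pairs."""
    if minimum > maximum:
        raise ValueError("minimum must not exceed maximum")
    candidates = [
        minimum - 1, minimum, minimum + 1,
        maximum - 1, maximum, maximum + 1,
    ]
    return [(value, minimum <= value <= maximum) for value in candidates]


# Example: an age field that accepts 0..120 inclusive.
for value, expected_valid in boundary_values(0, 120):
    print(f"{value:>4}  {'valid' if expected_valid else 'invalid'}")
```

For very narrow ranges (max - min below 2) some of the six points coincide, so expect duplicates in the list.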

Equivalence partitions

Pick one representative from each class instead of testing every value.

| Field | Valid class | Invalid classes |
|---|---|---|
| Age | 0–120 | -1, 121, "abc", 1.5 |
| Email | well-formed | missing @, missing TLD, double @, leading space |
| Phone (US) | 10 digits | 9 digits, 11 digits, letters, formatting only |
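
One way to keep partitions honest is to name each class and check that every representative lands in the class you expect. A sketch for the Age row, in Python; the class names and `classify_age` are assumptions made for this example, not a standard API:

```python
import math


def classify_age(raw: object) -> str:
    """Name the equivalence class a raw age input falls into."""
    if isinstance(raw, bool) or not isinstance(raw, (int, float)):
        return "not-a-number"
    if isinstance(raw, float) and (not math.isfinite(raw) or not raw.is_integer()):
        return "not-an-integer"
    if raw < 0:
        return "below-range"
    if raw > 120:
        return "above-range"
    return "valid"


# One representative per class, taken from the Age row.
REPRESENTATIVES = {
    "valid": 35,
    "below-range": -1,
    "above-range": 121,
    "not-a-number": "abc",
    "not-an-integer": 1.5,
}

for expected_class, sample in REPRESENTATIVES.items():
    print(f"{sample!r:>6} -> {classify_age(sample)} (expected {expected_class})")
```

If two representatives fall into the same class, one of them is redundant; if a class has no representative, you have a coverage gap.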

Special characters

Always have a test that includes these:

<script>alert(1)</script>      HTML / XSS
'  "  --                       SQL quote breakers
' OR 1=1 --                    classic SQLi
${jndi:ldap://x}               Log4Shell-style
\x00 \r \n \t                  control characters
% _ ? *                        SQL / shell wildcards
🚀 漢字 ñ ø                      Unicode + emoji
${var}  {{var}}                template-injection markers
../../etc/passwd               path traversal
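
Keeping the probes in a list makes it cheap to run all of them against any text-handling function. A minimal Python sketch; `render_comment` is a stand-in for the code under test (here it escapes with the standard library's `html.escape`), and `leaks_markup` is a hypothetical check that only looks for surviving angle brackets:

```python
import html

PROBES = [
    "<script>alert(1)</script>",
    "'  \"  --",
    "' OR 1=1 --",
    "${jndi:ldap://x}",
    "\x00 \r \n \t",
    "% _ ? *",
    "🚀 漢字 ñ ø",
    "${var}  {{var}}",
    "../../etc/passwd",
]


def render_comment(text: str) -> str:
    """Stand-in for the code under test: wraps user text in a paragraph."""
    return f"<p>{html.escape(text)}</p>"


def leaks_markup(rendered: str) -> bool:
    """True if user-supplied angle brackets survived into the paragraph body."""
    body = rendered[len("<p>"):-len("</p>")]
    return "<" in body or ">" in body


failures = [probe for probe in PROBES if leaks_markup(render_comment(probe))]
print(f"{len(failures)} of {len(PROBES)} probes leaked markup")
```

This only covers the HTML case. SQL, template, and path probes need their own checks against the real query layer, template engine, and file system.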

Empty / blank / null

  • Empty string ""
  • Whitespace only " "
  • Null / undefined
  • Field omitted from the request entirely
  • Field present but with value null
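
For JSON APIs the cases above map to distinct request bodies. A minimal Python sketch that builds one body per case; `empty_variants` and the sample payload are hypothetical. JSON has no `undefined`, so omitting the field is the closest equivalent:

```python
import json


def empty_variants(base: dict, field: str) -> dict[str, str]:
    """Build one JSON body per empty / blank / null case for `field`."""
    variants = {
        "empty string": {**base, field: ""},
        "whitespace only": {**base, field: " "},
        "explicit null": {**base, field: None},
        "field omitted": {k: v for k, v in base.items() if k != field},
    }
    return {name: json.dumps(body) for name, body in variants.items()}


bodies = empty_variants({"name": "Ada", "email": "ada@example.com"}, "email")
for name, body in bodies.items():
    print(f"{name:<16} {body}")
```

Servers often treat "explicit null" and "field omitted" differently (for example, clear the value versus leave it unchanged), so both are worth sending.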

Length

0 chars
1 char
255 chars                      legacy varchar boundary
256 chars
1000+ chars                    paragraph / paste from doc
1 MB                           paste from a log file
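
These strings are easy to generate rather than paste. A short Python sketch, with `length_probes` as a hypothetical helper; it also shows why it matters whether a limit counts characters or bytes:

```python
LENGTHS = [0, 1, 255, 256, 1000, 1024 * 1024]


def length_probes(fill: str = "a") -> dict[int, str]:
    """Map each target length (in characters) to a string of that length."""
    if len(fill) != 1:
        raise ValueError("fill must be a single character")
    return {n: fill * n for n in LENGTHS}


ascii_probes = length_probes("a")
accented_probes = length_probes("\u00e9")  # é, two bytes in UTF-8

# Same character count, different byte count.
chars = len(accented_probes[255])
utf8_bytes = len(accented_probes[255].encode("utf-8"))
print(chars, utf8_bytes)  # prints "255 510"
```

A field that accepts 255 ASCII characters may still reject or truncate 255 accented ones if the limit is enforced in bytes.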

Numbers

0
-0                             different from 0 in some systems
-1
1
INT_MAX (2147483647), INT_MAX + 1      32-bit signed upper edge and overflow
INT_MIN (-2147483648), INT_MIN - 1     32-bit signed lower edge and underflow
3.14159, 1e308, 1e-308
NaN, Infinity, -Infinity
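
Several of these values behave in ways that are easy to forget. The Python sketch below demonstrates a few using only the standard library; `fits_int32` is a hypothetical helper that borrows `struct.pack` as a 32-bit range check:

```python
import json
import math
import struct
import sys

INT32_MAX = 2**31 - 1   # 2147483647
INT32_MIN = -(2**31)    # -2147483648


def fits_int32(value: int) -> bool:
    """True if value can be packed as a signed 32-bit integer."""
    try:
        struct.pack("<i", value)
        return True
    except struct.error:
        return False


checks = {
    "INT_MAX fits in 32 bits": fits_int32(INT32_MAX),
    "INT_MAX + 1 fits in 32 bits": fits_int32(INT32_MAX + 1),
    "INT_MIN - 1 fits in 32 bits": fits_int32(INT32_MIN - 1),
    "-0.0 compares equal to 0.0": -0.0 == 0.0,
    "-0.0 keeps its sign bit": math.copysign(1.0, -0.0) == -1.0,
    "NaN equals itself": math.nan == math.nan,
    "1e308 * 10 overflows to infinity": math.isinf(1e308 * 10),
    "1e-308 is below the smallest normal double": 0.0 < 1e-308 < sys.float_info.min,
    "json.dumps(NaN) emits a bare NaN token": json.dumps(math.nan) == "NaN",
}
for label, outcome in checks.items():
    print(f"{outcome!s:<5}  {label}")
```

The last line is worth a test of its own: a bare `NaN` token is not valid JSON, so a strict parser on the receiving side may reject the whole payload.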

Dates and times

1970-01-01T00:00:00Z           Unix epoch
1969-12-31T23:59:59Z           pre-epoch (often breaks)
2000-02-29                     leap day in a century year (divisible by 400)
2024-02-29 vs 2025-02-29       real leap day vs a date that does not exist
2038-01-19T03:14:07Z           last second of signed 32-bit Unix time; one more overflows
DST forward / backward day     timezone math
Dec 31 → Jan 1                 year rollover
Feb 28 → Mar 1                 month rollover
local time in non-UTC zone     server vs client time mismatch
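
Most of these dates can be derived instead of memorised. A Python sketch using only `datetime` and `calendar`; `is_real_date` is a hypothetical helper, and DST cases are left out because they depend on a time zone database being installed:

```python
import calendar
from datetime import date, datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def is_real_date(year: int, month: int, day: int) -> bool:
    """True if the calendar date exists."""
    try:
        date(year, month, day)
        return True
    except ValueError:
        return False


# Largest value a signed 32-bit seconds counter can hold.
last_int32_second = EPOCH + timedelta(seconds=2**31 - 1)
pre_epoch = EPOCH - timedelta(seconds=1)

print(last_int32_second.isoformat())   # 2038-01-19T03:14:07+00:00
print(pre_epoch.timestamp())           # -1.0
print(is_real_date(2024, 2, 29), is_real_date(2025, 2, 29))  # True False
print(calendar.isleap(2000), calendar.isleap(1900))          # True False
print(date(2025, 2, 28) + timedelta(days=1))                 # 2025-03-01
print(date(2025, 12, 31) + timedelta(days=1))                # 2026-01-01
```

Negative timestamps such as the pre-epoch value are exactly what unsigned or naively validated time fields tend to mishandle.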

File uploads

| Probe | What it tests |
|---|---|
| Wrong extension (.exe renamed to .png) | MIME-type validation vs extension check |
| Empty file (0 bytes) | "did the upload succeed?" branch |
| Huge file (1 GB+) | Server limits, progress, timeout, memory |
| No extension | Default handling |
| Double extension (avatar.png.exe) | Sanitisation |
| Path-traversal name (../../evil.png) | Filename sanitisation |
| Same filename twice | Conflict / overwrite policy |
| File during network drop | Resume / retry |
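
The static probes can be kept as (filename, bytes) pairs and replayed against any validator. A Python sketch under stated assumptions: `naive_accepts` and `stricter_accepts` are hypothetical validators written for this example, and only PNG is considered. The huge-file, duplicate-name, and network-drop probes need a live upload endpoint and are not modelled here:

```python
from pathlib import PurePosixPath

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"  # first eight bytes of every PNG file

UPLOAD_PROBES: list[tuple[str, bytes]] = [
    ("renamed.png", b"MZ" + bytes(62)),   # executable header, image extension
    ("empty.png", b""),                   # 0 bytes
    ("noextension", PNG_SIGNATURE),       # no extension
    ("avatar.png.exe", PNG_SIGNATURE),    # double extension
    ("../../evil.png", PNG_SIGNATURE),    # path-traversal name
]


def naive_accepts(name: str, data: bytes) -> bool:
    """Extension-only check, the kind these probes are meant to defeat."""
    return name.lower().endswith(".png")


def stricter_accepts(name: str, data: bytes) -> bool:
    """Also require a bare filename and PNG magic bytes."""
    is_bare = PurePosixPath(name).name == name
    return is_bare and name.lower().endswith(".png") and data.startswith(PNG_SIGNATURE)


for name, data in UPLOAD_PROBES:
    print(f"{name:<16} naive={naive_accepts(name, data)!s:<5} "
          f"strict={stricter_accepts(name, data)}")
```

Every probe where the two validators disagree is a candidate bug if the real system behaves like the naive one.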

State Transition Testing

For any feature with discrete states (orders, accounts, subscriptions, document workflow), explicitly test the transitions.

A four-step recipe

  1. Enumerate every state the system can be in.
  2. Map every transition the spec says is valid.
  3. Test each valid transition with the action that triggers it.
  4. Attempt every invalid transition — expect a clean refusal, not a crash.

Example — order state machine

PLACED      →  PAID        →  SHIPPED     →  DELIVERED
  ↓              ↓              ↓
CANCELLED      REFUNDED       RETURNED

| From | Action | Expected new state |
|---|---|---|
| PLACED | Pay | PAID |
| PLACED | Cancel | CANCELLED |
| PAID | Ship | SHIPPED |
| PAID | Refund | REFUNDED |
| SHIPPED | Cancel | rejected (order already shipped) |
| DELIVERED | Pay again | rejected (already paid) |
| CANCELLED | Pay | rejected (terminal) |
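
Writing the valid transitions down as a lookup table gives you the invalid ones for free: every other (state, action) pair must be refused. A minimal Python sketch of the order machine; the action names `deliver` and `return` are assumptions read off the diagram, and `apply` / `refused_pairs` are hypothetical helpers:

```python
from enum import Enum
from itertools import product


class State(Enum):
    PLACED = "PLACED"
    PAID = "PAID"
    SHIPPED = "SHIPPED"
    DELIVERED = "DELIVERED"
    CANCELLED = "CANCELLED"
    REFUNDED = "REFUNDED"
    RETURNED = "RETURNED"


ACTIONS = ["pay", "cancel", "ship", "refund", "deliver", "return"]

VALID = {
    (State.PLACED, "pay"): State.PAID,
    (State.PLACED, "cancel"): State.CANCELLED,
    (State.PAID, "ship"): State.SHIPPED,
    (State.PAID, "refund"): State.REFUNDED,
    (State.SHIPPED, "deliver"): State.DELIVERED,  # assumed action name
    (State.SHIPPED, "return"): State.RETURNED,    # assumed action name
}


class InvalidTransition(Exception):
    """Raised when an action is not allowed from the current state."""


def apply(state: State, action: str) -> State:
    """Return the new state, or raise InvalidTransition as a clean refusal."""
    try:
        return VALID[(state, action)]
    except KeyError:
        raise InvalidTransition(f"{action!r} not allowed from {state.name}") from None


def refused_pairs() -> list[tuple[State, str]]:
    """Every (state, action) pair that must be rejected."""
    return [(s, a) for s, a in product(State, ACTIONS) if (s, a) not in VALID]


print(len(VALID), "valid transitions,", len(refused_pairs()), "to refuse")
```

With seven states and six actions there are 42 pairs, so the 6 valid transitions leave 36 invalid attempts for step 4 of the recipe.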

Conditions worth probing

  • Interruption mid-transition — kill the network at the moment payment auth completes but before the order updates. What state is the order in?
  • Concurrent transitions — two admins click "Ship" within 100ms. Both succeed? One wins? Idempotent?
  • Replayed event — payment webhook arrives twice. Does the order go to PAID once or charge twice?
  • State after timeout — user abandons the checkout for an hour. Cart cleared, reserved stock released, session expired?

Bug Reporting During Exploration

You will forget details faster than you think. Capture as you go.

Capture-as-you-go habits

  • Screen-record everything — keep a continuous recording during the session and clip the relevant 30 seconds when you spot something.
  • Console + network tab open at all times. Save the HAR if a bug looks API-related.
  • Write notes in a single scratch file with timestamps. Real notes beat reconstructed ones.
  • Keep a "questions" list separate from the bug list. Not every weirdness is a bug — some are spec gaps.

Before you file

  • Can you reproduce it? Try at least twice.
  • What is the shortest path? (Strip the steps that don't matter.)
  • Does it reproduce in another browser / role / environment?
  • Did the same thing exist before the recent change? (Avoid blaming the wrong PR.)
  • What's the right severity? (User-facing data loss = P0; cosmetic glitch on rarely-used screen = P3.)

Quick classification

| Type | Smell |
|---|---|
| Functional | Wrong result, missing behaviour, broken flow |
| UI | Layout, alignment, contrast, focus state, copy |
| Performance | Slow response, memory growth, CPU spike, pagination degradation |
| Edge case | Specific input or state combination |
| Integration | Boundary between two systems (frontend/backend, app/payment, app/email) |
| Security | Auth bypass, injection, IDOR, data leakage, missing rate limit |
| Accessibility | Keyboard trap, missing label, contrast, screen-reader gap |
| Question | Spec is ambiguous; file as a question, not a bug, until product confirms |