Mocking and Virtualisation

8 min read

Real services are slow, sometimes flaky, and occasionally inaccessible. A test suite that hits Stripe's sandbox on every run shares Stripe's downtime, Stripe's rate limits, and Stripe's evening maintenance window. Mocks let you replace those external dependencies with fast, deterministic fakes — the same shape on the wire, completely under your control. Get mocking right and your suite becomes faster and more reliable; get it wrong and your tests pass against a fantasy that diverges from production. This lesson covers when to mock, how to do it well, and the safeguards that prevent mock drift from biting you in production.

What "mocking" actually means

A mock is a fake API that returns canned responses to programmed requests. It runs locally, in CI, or as a sidecar — wherever your tests need it. Configuration looks like "when you see GET /users/42, respond with status 200 and this body."

The variations:

  • Static mocks — return the same response every time for a given request.
  • Stateful mocks — track state across calls (POST a user, then GET it back).
  • Service virtualisation — full simulation including state, latency, error injection, and configurable behaviour over time. Used for complex integrations (mainframes, partner systems).
  • Contract-driven mocks — generated from a Pact file or OpenAPI spec, so the mock can't drift from the agreed contract.

For most teams, contract-driven mocks plus a few hand-written stateful ones cover the large majority of use cases.

When to mock

Three rules of thumb:

  • You own it → don't mock. If your team writes and deploys the service, just call it. Mocking your own code adds nothing and risks drift.
  • You don't own it → mock by default. Third-party APIs are slow, rate-limited, and can change underneath you. Mock for unit/functional tests; reserve real calls for explicit integration runs.
  • You're testing an error path → a mock is usually the only practical way. "What happens when Stripe returns a 503?" is not something you can provoke on demand against the real Stripe. Mocks let you assert on every error path your code claims to handle.

Mock server tools

A few tools you'll meet in real teams:

  • WireMock (JVM) — fully featured, stateful, used heavily in Java/Kotlin shops.
  • MSW / Mock Service Worker (JavaScript) — intercepts requests at the network level, using a Service Worker in browsers and request interception in Node.
  • responses and respx (Python) — patch the requests and httpx libraries respectively.
  • Postman Mock Server — hosted mocks driven from a Postman collection.
  • Prism (Stoplight) — generates a mock server from an OpenAPI spec.
  • Hoverfly / Mountebank — full service-virtualisation tools with state and proxying.

Pick one per language. Mixing tools across the team's tests creates operational complexity for marginal gain.

A typical mock setup

The conceptual shape, regardless of tool:

mock.when(method="GET", path="/api/payments/123")
    .respond(status=200, body={"id": "123", "status": "completed", "amount": 99.99})

# In your test, the code under test calls https://payment-service.local/api/payments/123
# The mock intercepts and returns the canned response.
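
To make that shape concrete, here is a minimal in-memory sketch in Python. It is not the API of any real tool: StaticMock, when, send, and payment_status are hypothetical names, and the code under test is assumed to receive its HTTP transport as a parameter so the test can swap in the mock.

```python
from typing import Callable, Dict, Tuple

Response = Tuple[int, dict]  # (status code, JSON body)


class StaticMock:
    """Hypothetical in-memory mock: maps (method, path) to one canned response."""

    def __init__(self) -> None:
        self._routes: Dict[Tuple[str, str], Response] = {}

    def when(self, method: str, path: str, status: int, body: dict) -> None:
        # Register the canned response for this exact method and path.
        self._routes[(method, path)] = (status, body)

    def send(self, method: str, path: str) -> Response:
        # Same request in, same response out: that is all "static" means.
        return self._routes[(method, path)]


def payment_status(send: Callable[[str, str], Response], payment_id: str) -> str:
    """Stand-in for the code under test; `send` is its injected transport."""
    status, body = send("GET", f"/api/payments/{payment_id}")
    if status != 200:
        raise RuntimeError(f"unexpected status {status}")
    return body["status"]


mock = StaticMock()
mock.when("GET", "/api/payments/123", status=200,
          body={"id": "123", "status": "completed", "amount": 99.99})
```

A test then calls payment_status(mock.send, "123") and asserts on the result. Libraries such as responses or WireMock do the same job at the HTTP layer, so the code under test does not need an injected transport.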

For a test of "what happens when payment fails":

mock.when(method="POST", path="/api/charges")
    .respond(status=402, body={"error": "card_declined", "code": "insufficient_funds"})

You can then assert that your application code correctly handles a 402 response — surfaces the right error to the user, logs the right metric, retries (or doesn't).
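
A sketch of what those assertions might look like, using only the standard library's unittest.mock. The charge function is a made-up stand-in for your application code, and its retry policy (no retry on 402, one retry on 5xx) is an assumption chosen for illustration.

```python
from unittest.mock import Mock


def charge(send, amount_cents):
    """Hypothetical application code: returns a user-facing message."""
    status, body = send("POST", "/api/charges", {"amount": amount_cents})
    if status == 402:
        # A declined card is a final answer, so do not retry.
        return f"Payment declined: {body['code']}"
    if status >= 500:
        # Treat server errors as transient and retry exactly once.
        status, body = send("POST", "/api/charges", {"amount": amount_cents})
    return "Payment accepted" if status == 200 else "Payment failed, try again later"


# Error path 1: the mock always answers 402.
declined = Mock(return_value=(402, {"error": "card_declined",
                                    "code": "insufficient_funds"}))
declined_message = charge(declined, 9999)

# Error path 2: the mock answers 503 once, then 200.
flaky = Mock(side_effect=[(503, {}), (200, {"id": "ch_1"})])
flaky_message = charge(flaky, 9999)
```

Checking declined.call_count and flaky.call_count afterwards tells you whether the code retried when it should have, and only when it should have.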

Stateful mocks

For flows that span multiple calls — create then read, login then act — a stateful mock tracks what previous calls did:

# WireMock-like pseudocode
mock.scenario("user-creation")
    .step("create").when("POST /users").willReturn({id: 9001})
    .nextStep("create-then-read")
    .step("read").when("GET /users/9001").willReturn({id: 9001, name: "Alice"})

The mock now responds to the GET only after the POST has fired. Useful for testing CRUD flows where the second call depends on the first.
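
The same idea as a small runnable Python sketch. StatefulUserMock is a hypothetical class, not WireMock's API, and the 201 and 404 status codes are assumptions about how the real service would behave.

```python
import itertools


class StatefulUserMock:
    """Hypothetical stateful mock: remembers users created by earlier calls."""

    def __init__(self, first_id=9001):
        self._users = {}
        self._ids = itertools.count(first_id)

    def send(self, method, path, body=None):
        if method == "POST" and path == "/users":
            # Creating a user changes the mock's state.
            user_id = next(self._ids)
            self._users[user_id] = {"id": user_id, **(body or {})}
            return 201, self._users[user_id]
        if method == "GET" and path.startswith("/users/"):
            # Reads only succeed for users a previous POST created.
            suffix = path[len("/users/"):]
            user = self._users.get(int(suffix)) if suffix.isdigit() else None
            return (200, user) if user else (404, {"error": "not_found"})
        raise AssertionError(f"Unmatched request: {method} {path}")
```

A GET for /users/9001 returns 404 on a fresh mock and 200 once a POST to /users has run, which is exactly the ordering a create-then-read test depends on.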

The drift problem

The single biggest risk: your mock and the real service can disagree, and your tests will happily pass against the mock while production breaks against reality.

Mitigations, in increasing strength:

  • Periodic real-service smoke tests. A small suite that hits the real downstream service nightly. If it fails and your mocked tests don't, you've found drift.
  • Mock-as-code, kept next to real-service interaction code. When the real call site changes, the diff makes the mock-update obvious.
  • Mocks generated from contracts. Pact contracts and OpenAPI specs both produce mocks that can't drift from the contract. Strongly preferred for any non-trivial integration.

The combination — contract-driven mocks plus a thin real-service smoke layer — is the gold standard.
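
One way a smoke layer can surface drift is to compare the shape of a mocked response with the shape of a real one, ignoring the values. The sketch below assumes both bodies are already parsed JSON; shape and drift are hypothetical helper names, and the two sample bodies are invented for illustration.

```python
def shape(value):
    """Reduce a JSON value to its structure: keys and type names, no data."""
    if isinstance(value, dict):
        return {key: shape(item) for key, item in value.items()}
    if isinstance(value, list):
        # Only the first element is inspected; assumes homogeneous lists.
        return [shape(value[0])] if value else []
    return type(value).__name__


def _diff(a, b, path):
    if isinstance(a, dict) and isinstance(b, dict):
        problems = []
        for key in sorted(set(a) | set(b)):
            if key not in a:
                problems.append(f"{path}.{key}: missing from mock")
            elif key not in b:
                problems.append(f"{path}.{key}: missing from real service")
            else:
                problems.extend(_diff(a[key], b[key], f"{path}.{key}"))
        return problems
    if isinstance(a, list) and isinstance(b, list):
        return _diff(a[0], b[0], f"{path}[0]") if a and b else []
    return [] if a == b else [f"{path}: mock has {a}, real service has {b}"]


def drift(mock_body, real_body):
    """List the fields whose presence or type differs between the two bodies."""
    return _diff(shape(mock_body), shape(real_body), "$")


mock_body = {"id": "123", "status": "completed", "amount": 99.99}
real_body = {"id": "123", "status": "completed", "amount": "99.99", "currency": "USD"}
report = drift(mock_body, real_body)
```

In a nightly run, real_body would come from the real downstream service, and a non-empty report would fail the job. This only catches structural drift; behavioural changes still need real assertions.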

When NOT to mock

The opposite mistake: mocking when you should be using the real thing.

  • Testing the integration itself. "Does our service correctly call Stripe?" can't be answered against a Stripe mock. The integration test has to use the real sandbox.
  • Testing the contract is honoured. Schema validation against a mock proves nothing about the real service.
  • Performance characteristics. Mocks return in microseconds; real services in milliseconds or seconds. Mocked perf numbers are fiction.
  • Authentication-flow tests. Real auth servers have rate limits, expiry semantics, and cryptographic checks that mocks can't reproduce faithfully.

A useful framing: mock when the dependency is incidental to the test; use the real thing when the dependency is the test.

Recording and replay

A pragmatic middle ground is to record real responses once and replay them in tests:

  • Run the test against the real service, capture the response (VCR libraries do this).
  • Subsequent runs replay from disk, no real call.
  • When the cassette is stale, re-record and review the diff.

VCR-style recording (Python's vcrpy, Ruby's vcr, JS's nock with recording mode) gives you mock-like speed with real-shaped responses. The trade-off is that recordings must be refreshed periodically, and the diff between old and new recordings is itself a useful signal of upstream change.
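
The mechanism behind those libraries can be sketched in a few lines. This is not vcrpy's API: Cassette is a hypothetical class, the "real" transport is a local stand-in function, and the cassette is stored as plain JSON so that re-recording produces a reviewable diff.

```python
import json
import tempfile
from pathlib import Path


class Cassette:
    """Hypothetical record/replay wrapper around a transport function."""

    def __init__(self, path, real_send):
        self._path = Path(path)
        self._real_send = real_send
        self._tape = json.loads(self._path.read_text()) if self._path.exists() else {}

    def send(self, method, url):
        key = f"{method} {url}"
        if key not in self._tape:
            # Not recorded yet: make the real call once and persist the result.
            status, body = self._real_send(method, url)
            self._tape[key] = {"status": status, "body": body}
            self._path.write_text(json.dumps(self._tape, indent=2, sort_keys=True))
        entry = self._tape[key]
        return entry["status"], entry["body"]


real_calls = []


def stand_in_real_service(method, url):
    # Stand-in for the real network call, so the sketch runs offline.
    real_calls.append((method, url))
    return 200, {"id": "123", "status": "completed"}


with tempfile.TemporaryDirectory() as tmp:
    tape = Path(tmp) / "payments.json"
    first = Cassette(tape, stand_in_real_service).send("GET", "/api/payments/123")
    second = Cassette(tape, stand_in_real_service).send("GET", "/api/payments/123")
```

The second Cassette reads the file the first one wrote, so the stand-in service is only called once. Deleting the file is the re-record step.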

Mock observability

Worth checking that your mock layer surfaces useful failures:

  • When a request comes in that the mock doesn't know about, does it 404, fail loud, or silently default? A "default 200 with empty body" mock will hide tons of bugs.
  • When the mock is the wrong version of the contract, do tests fail with a useful message?
  • Can you log every interception and inspect after a failure?

Pick a tool that fails loud by default. "Unmatched request: GET /api/users/999" is the error you want; "[mock returned default response]" is the one you don't.
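
If you do end up hand-rolling a mock, the fail-loud behaviour is cheap to build in. A minimal sketch, with StrictMock and UnmatchedRequest as hypothetical names:

```python
class UnmatchedRequest(AssertionError):
    """Raised when the code under test makes a call nobody stubbed."""


class StrictMock:
    def __init__(self, routes):
        self._routes = dict(routes)
        self.calls = []  # every interception, kept for inspection after a failure

    def send(self, method, path):
        self.calls.append((method, path))
        try:
            return self._routes[(method, path)]
        except KeyError:
            # Fail loud instead of inventing a default response.
            raise UnmatchedRequest(f"Unmatched request: {method} {path}") from None


strict = StrictMock({("GET", "/api/users/42"): (200, {"id": 42})})
```

Calling strict.send("GET", "/api/users/999") raises UnmatchedRequest with the request in the message, and strict.calls still records the attempt so you can see what the code under test actually sent.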

⚠️ Common mistakes

  • Mocking everything, integration-testing nothing. A suite of green mock-based tests can mask a fundamentally broken integration. Keep at least one real-service smoke layer.
  • Hand-written mocks that don't match the real contract. Prefer contract-driven mocks (Pact, OpenAPI-derived) wherever a contract exists, and keep the hand-written ones few. Hand-written mocks start drifting the day they're written.
  • Mocking your own code. It's almost always a smell. If your service is too slow to test directly, fix the slowness or split the test layer; don't paper over it with a mock.

🎯 Practice task

Set up a mock for a real test. 30-40 minutes.

  1. Pick a test in your project that hits a real downstream service. Note the request shape and response shape.
  2. Install a mocking library for your language: responses (Python), MSW (JS), WireMock (Java).
  3. Replace the real call in the test with a mock that returns the same shape. Run the test — confirm it still passes.
  4. Add a second test that exercises an error path: 500, 429, or a specific business error. With the mock, this is just "respond with 500" — try it.
  5. Compare the test runtime before and after. Mocks should be substantially faster.
  6. Stretch: if your team has an OpenAPI spec, install Prism (npm install -g @stoplight/prism-cli) and run prism mock openapi.yaml. Hit the resulting mock with curl. That's a contract-driven mock — observe how it stays in sync with the spec automatically.

The final lesson of this chapter pulls it all together — running everything you've built inside a real CI/CD pipeline.
