Microservice Integration Test Plan

Plan integration testing across three ShopFlow microservices for the v2.3 release — map service dependencies, define critical workflows, scope the test effort, design mock/stub strategies, and document environment, test data, failure scenario, and observability needs.

Role

SDET

Difficulty

Intermediate

Time limit

2 hr

Scenario

ShopFlow v2.3 introduces coordinated changes across three microservices: User Profile (adds a phoneNumber field and a new GET /users/{id}/activity endpoint), Orders (adds a new POST /orders/bulk endpoint for B2B bulk ordering), and Notifications (adds phone-based SMS notifications triggered by order confirmation events). Each service has its own team and its own CI pipeline, but they share a staging environment and a single Kafka event bus. Your task is to produce a complete integration test plan covering the interaction points between these services, so the QA team and engineering leads can sign off before the release enters the staging freeze window.

Requirements

1. Produce a service dependency map describing which service calls which, over what protocol (HTTP, Kafka event, or both), and in which direction data flows for each critical workflow
2. Identify at least five critical workflows that span at least two services each. Express each as a sequence (Service A → Service B → Service C) with the trigger, the happy-path outcome, and the key integration point being tested
3. Define the integration test scope: list what is in scope and what is explicitly out of scope, with a one-sentence justification for each out-of-scope item
4. Describe the environment strategy: what must be deployed and ready in the staging environment before integration testing begins, and which entry criteria must be met
5. Design a test data strategy: what data must be seeded, who creates it, whether it is shared or isolated per test run, and how it is cleaned up after the run
6. Define a mock and stub strategy: which dependencies should be mocked (with justification), which should be live, and how mocks will be kept in sync with real service behaviour as services evolve
7. Document at least five failure scenarios, each describing an integration point that could fail, the expected observable behaviour (error code, event not published, downstream side-effect), and how the scenario would be detected during testing
8. List the observability and logging requirements needed to diagnose failures during the integration test run (e.g. correlated trace IDs, Kafka consumer lag monitoring, specific log fields)

Starter data

›Service dependency overview:
  - User Profile service: HTTP REST, no Kafka dependency; persists to Postgres users DB
  - Orders service: HTTP REST + Kafka producer; calls User Profile HTTP on order creation to verify user role; publishes ORDER_CONFIRMED event to Kafka topic 'order-events'
  - Notifications service: Kafka consumer on 'order-events'; calls User Profile HTTP to get phoneNumber for SMS; calls an external SMS gateway (Twilio) to send SMS; calls an internal email service for email notifications
  - ShopFlow frontend: calls Orders HTTP for order placement, calls User Profile HTTP for profile display
›v2.3 changes per service:
  - User Profile: +phoneNumber field on GET /users/{id}; +GET /users/{id}/activity endpoint
  - Orders: +POST /orders/bulk endpoint (accepts array of order items for B2B customers)
  - Notifications: +SMS notification path (triggered by ORDER_CONFIRMED Kafka event; requires phoneNumber from User Profile)
›Staging environment: single shared staging cluster; all three services deploy independently; Kafka is a shared staging broker with isolated topic namespaces per environment (staging.order-events); Twilio is accessible in test mode (test credentials suppress real SMS delivery)
›Downstream dependencies outside the three services: Twilio SMS gateway (external, test mode available); internal email service (separate team, stable API, low change rate); Postgres databases (one per service, shared staging instance)
›Known constraints: the Notifications service has a Kafka consumer that batches events every 30 seconds in staging — integration tests must account for this latency; Twilio test mode silently drops SMS but returns a 200 — evidence of SMS dispatch must come from Twilio's test message log, not a real delivery check
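
Before drawing a diagram, the starter data above can be captured as a plain edge list, which makes it easy to check that no integration point has been left out. The following is a minimal sketch in Python. The `Edge` type and the helper names are illustrative assumptions, not part of ShopFlow; the HTTP protocol for the Twilio and email calls follows the sample solution outline, and the per-service Postgres databases are left out for brevity.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Edge:
    source: str
    target: str
    protocol: str  # "HTTP" or "Kafka"
    purpose: str


# Edges transcribed from the starter data; node names are shortened labels.
EDGES = [
    Edge("Frontend", "Orders", "HTTP", "order placement"),
    Edge("Frontend", "User Profile", "HTTP", "profile display"),
    Edge("Orders", "User Profile", "HTTP", "verify user role on order creation"),
    Edge("Orders", "order-events", "Kafka", "publish ORDER_CONFIRMED"),
    Edge("order-events", "Notifications", "Kafka", "consume ORDER_CONFIRMED"),
    Edge("Notifications", "User Profile", "HTTP", "read phoneNumber for SMS"),
    Edge("Notifications", "Twilio", "HTTP", "send SMS (test mode)"),
    Edge("Notifications", "Email Service", "HTTP", "send email notification"),
]


def edges_by_protocol(protocol):
    """Return all edges that use the given protocol."""
    return [edge for edge in EDGES if edge.protocol == protocol]


def downstream_of(node):
    """Return every node reachable from `node` by following edges forward."""
    seen, stack = set(), [node]
    while stack:
        current = stack.pop()
        for edge in EDGES:
            if edge.source == current and edge.target not in seen:
                seen.add(edge.target)
                stack.append(edge.target)
    return seen
```

Calling `downstream_of("Orders")` lists everything a change to Orders could affect, including the asynchronous path through the Kafka topic, which is the part an HTTP-only map would miss.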

Expected deliverables

✓Service dependency map: a diagram or structured description showing all three services, their protocols (HTTP / Kafka), data flow direction, and external dependencies (Twilio, email service)
✓Critical workflows list: at least five workflows, each with a trigger, step-by-step service sequence, happy-path outcome, and the specific integration point under test
✓Scope table: in-scope items and out-of-scope items, each with a one-sentence justification
✓Environment entry criteria: a numbered checklist of conditions that must be true before integration testing begins (services deployed, test data seeded, Kafka topics available, etc.)
✓Test data strategy: a table or description covering each data entity required (users, orders, product catalogue), who creates it, whether it is shared or per-run, and how it is cleaned up
✓Mock and stub strategy: a table listing each dependency, the decision (live / mock / stub), the justification, and the sync mechanism (contract test, snapshot, or manual update)
✓Failure scenarios: at least five entries, each with the integration point, the failure trigger, the expected observable behaviour, and the detection method
✓Observability requirements: a list of log fields, trace IDs, metrics, or monitoring signals needed to diagnose failures during the test run

Evaluation rubric

Dimension	What reviewers look for
Dependency map completeness	Does the dependency map capture all integration points, including the Kafka event path (Orders → Kafka → Notifications), the two HTTP calls Notifications makes (to User Profile for phoneNumber and to Twilio for SMS), and the frontend-to-services calls? A map that only shows HTTP connections and omits the Kafka event bus misses the most architecturally interesting integration point in this release — the async event-driven path is where the hardest-to-diagnose failures will occur.
Critical workflow selection	Are the chosen workflows genuinely cross-service and tied to the v2.3 changes? At least one workflow must cover the full SMS path (frontend places order → Orders publishes ORDER_CONFIRMED → Notifications consumes event → Notifications calls User Profile for phoneNumber → Notifications sends SMS). Workflows that test only a single service interaction (e.g. 'GET /users/{id} returns phoneNumber') are not integration workflows — they are API tests.
Scope definition clarity	Are out-of-scope items justified by a principled reason — not just 'too hard' or 'not my team'? Valid out-of-scope items: E2E UI testing (separate test phase), performance testing (separate load test plan), unit-level business logic within a single service (unit test responsibility), Twilio's internal SMS delivery (external dependency in test mode, not verifiable in staging). The justification must explain what test phase or team owns the excluded item.
Test data and environment strategy	Does the test data strategy account for: shared vs isolated data (a shared staging Kafka broker means events from other test runs can interfere), cleanup after async operations (Notifications may consume a Kafka event up to 30 seconds after it is published, so cleanup scripts must wait for this), and seeding requirements for the bulk order endpoint (requires B2B customer accounts with vendor role)? A strategy that just says 'seed test users' without addressing async cleanup or cross-run interference shows incomplete thinking.
Mock and stub strategy reasoning	Is the decision to mock or use live justified for each dependency? Twilio should be in test mode rather than fully mocked (test mode provides evidence of SMS dispatch intent without sending real SMS); the internal email service should be live if it is stable and low-change (mocking it would hide real integration failures); the Postgres databases should be live in staging (using mocked data stores prevents discovering real query-level integration issues). A strategy that mocks everything for speed misses integration failures; one that uses everything live creates dependencies on external stability.
Failure scenario quality	Do the failure scenarios test integration-specific failure modes, not just individual service errors? Good integration failure scenarios: (1) Notifications cannot find a phoneNumber for the user (User Profile returns null): does it fall back to email-only or silently drop the notification? (2) Orders publishes ORDER_CONFIRMED but Kafka consumer lag causes Notifications to process the event 45 seconds later: does the test assert eventual delivery, and what is the timeout? A failure scenario that only tests 'User Profile returns 500' is a unit-level failure, not an integration-specific one.
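
The eventual-delivery question in the last rubric row implies a polling assertion with an explicit timeout rather than a fixed sleep. The following is a minimal sketch in Python; the function name `wait_until` and its parameters are assumptions made for illustration, not an existing ShopFlow utility.

```python
import time


def wait_until(check, timeout_s=60.0, interval_s=2.0):
    """Call `check` repeatedly until it returns a truthy value.

    Returns that value, or raises TimeoutError once `timeout_s` has elapsed.
    `check` is always called at least once.
    """
    deadline = time.monotonic() + timeout_s
    while True:
        result = check()
        if result:
            return result
        if time.monotonic() >= deadline:
            raise TimeoutError(f"condition not met within {timeout_s} s")
        time.sleep(interval_s)
```

A test for the SMS path might call `wait_until(lambda: find_test_message(order_id), timeout_s=60)`, where `find_test_message` is a hypothetical lookup against Twilio's test message log. The timeout should exceed the 30-second consumer batching interval. A negative assertion such as "no SMS was sent" cannot return early in the same way: it has to wait out the full window before concluding that nothing arrived.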

Sample solution outline

›Dependency map summary: Frontend → Orders (HTTP POST /orders, POST /orders/bulk); Frontend → User Profile (HTTP GET /users/{id}); Orders → User Profile (HTTP GET /users/{id} on order creation, reads role); Orders → Kafka [producer] → order-events topic → Notifications [consumer]; Notifications → User Profile (HTTP GET /users/{id}, reads email + phoneNumber); Notifications → Twilio (HTTP, test mode); Notifications → Email Service (HTTP). External: Twilio (test mode), Email Service (internal, separate team).
›Critical workflows: (1) Standard order placement with SMS: user places order (frontend) → Orders verifies role (User Profile) → ORDER_CONFIRMED published → Notifications reads phoneNumber (User Profile) → SMS dispatched via Twilio; (2) Bulk order placement: B2B vendor places bulk order → Orders processes array → ORDER_CONFIRMED published per item → downstream as above; (3) Order placement for user without phoneNumber: same path but User Profile returns null phoneNumber → Notifications falls back to email-only; (4) Order placement for admin role user: Orders calls User Profile, role='admin' → Orders returns 403, no event published, no notification sent; (5) User Profile activity tracking: user places order → GET /users/{id}/activity includes order event in activity log (cross-service data consistency check)
›Scope: IN SCOPE — cross-service HTTP integration points, Kafka event publish/consume path, error propagation between services, phoneNumber null-handling in Notifications. OUT OF SCOPE — UI rendering (E2E test phase), load/throughput testing (separate plan), single-service unit logic, real SMS delivery verification (Twilio test mode only), Email Service internal logic (stable, tested by email team).
›Environment entry criteria: (1) v2.3 of all three services deployed to staging; (2) phoneNumber column present in User Profile staging DB; (3) Kafka staging topic 'staging.order-events' confirmed accessible from Orders (producer) and Notifications (consumer); (4) Twilio test credentials configured in Notifications service; (5) At least 10 test users seeded (mix of customer, vendor, admin roles; at least 3 with phoneNumber, 2 without); (6) At least 5 B2B vendor accounts seeded for bulk order tests; (7) Email service staging endpoint confirmed reachable
›Test data strategy: user accounts — seeded once per test run via a setup script, isolated by email prefix (test-{runId}@shopflow-test.com) to avoid cross-run collisions; product catalogue — shared read-only staging catalogue, no cleanup needed; orders — created during test runs, cleaned up 60 seconds after test completion (to allow Kafka consumer to process events before deletion); Kafka offset — reset to latest before each test run to ignore pre-existing events
›Mock/stub strategy: Twilio — live in test mode (test mode returns 200 and logs to Twilio dashboard; provides real evidence of dispatch intent without sending SMS); Email Service — live (stable service, low change rate; mocking would hide real integration failures); Kafka — live staging broker (shared but namespace-isolated; no mock needed); User Profile from Orders perspective — live (this is the primary integration under test; mocking it would defeat the purpose); User Profile from Notifications perspective — live
›Failure scenarios: (1) Notifications receives ORDER_CONFIRMED but GET /users/{id} returns 404. Expected: Notifications logs the error and sends no SMS; an email fallback is only possible if the email address is available without the User Profile lookup, since the dependency map has Notifications reading email from User Profile. Detection: check Notifications error log for a 'user not found' entry and confirm the Twilio test message log shows no message; (2) Notifications receives ORDER_CONFIRMED but phoneNumber is null. Expected: Notifications falls back to email-only. Detection: Twilio dashboard shows no test message, email service log shows delivery; (3) ORDER_CONFIRMED published by Orders but Notifications Kafka consumer is down. Expected: event persists in Kafka topic; after consumer restart, event is processed. Detection: Kafka consumer lag metric > 0; (4) User Profile returns 503 when Orders calls it during order creation. Expected: Orders returns 503 to frontend, no event published. Detection: Orders error log shows the upstream 503, Kafka topic has no new event; (5) Bulk order POST /orders/bulk with a mix of valid and invalid items. Expected: all-or-nothing transaction. Detection: verify either all items create ORDER_CONFIRMED events or none do
›Observability requirements: correlation trace ID propagated in X-Trace-Id header across all HTTP calls (Orders → User Profile, Notifications → User Profile) and in Kafka message headers; Kafka consumer lag metric exported to staging monitoring dashboard (alert threshold > 100 messages); structured logs in Notifications service including userId, orderId, hasPhoneNumber (boolean), notificationChannel (sms/email/both) for each notification attempt; Twilio test message log accessible to QA team for SMS dispatch evidence
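
The observability requirements above can be illustrated with a small sketch of trace ID propagation and a structured notification log line. This is Python written under stated assumptions: the helper names and the `traceId` log key are hypothetical, while the `X-Trace-Id` header and the `userId`, `orderId`, `hasPhoneNumber`, and `notificationChannel` fields come from the outline.

```python
import json
import uuid

TRACE_HEADER = "X-Trace-Id"
ALLOWED_CHANNELS = {"sms", "email", "both"}


def ensure_trace_id(headers):
    """Return a copy of `headers` that carries a trace ID.

    An incoming ID is reused so the same value follows the request across
    services; a new one is generated only when none is present.
    Assumes the header name is spelled exactly as TRACE_HEADER.
    """
    out = dict(headers)
    out.setdefault(TRACE_HEADER, uuid.uuid4().hex)
    return out


def notification_log_line(trace_id, user_id, order_id, phone_number, channel):
    """Build one JSON log line for a notification attempt.

    Only a boolean is logged for the phone number, not the number itself.
    """
    if channel not in ALLOWED_CHANNELS:
        raise ValueError(f"unknown notification channel: {channel}")
    record = {
        "traceId": trace_id,  # assumed key name
        "userId": user_id,
        "orderId": order_id,
        "hasPhoneNumber": phone_number is not None,
        "notificationChannel": channel,
    }
    return json.dumps(record, sort_keys=True)
```

With the same trace ID in HTTP headers, Kafka message headers, and log lines, a failed test can be diagnosed by searching every service's logs for one value instead of correlating timestamps by hand.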

Common mistakes

Drawing a dependency map that only shows HTTP connections and omits the Kafka event bus — the async path from Orders to Notifications is the highest-risk integration point in this release because failures are silent (the order succeeds but the notification is never sent) and harder to detect without explicit event-lag monitoring
Choosing workflows that only span a single service boundary — 'test that GET /users/{id} returns phoneNumber' is an API test, not an integration workflow; integration workflows must span at least two services and test a business-meaningful sequence
Designing a test data strategy that uses shared accounts without per-run isolation — shared test users in a concurrent test environment cause race conditions where one test's order event is consumed by Notifications in another test's assertion window
Mocking the User Profile service from the Notifications service's perspective in integration tests — mocking the primary integration partner defeats the purpose of integration testing; contract tests verify the mock is accurate, but integration tests must use a real live dependency
Ignoring the 30-second Kafka consumer batching delay in failure scenario timing — a test that asserts 'no SMS was sent' immediately after order placement will produce a false pass if the consumer hasn't yet processed the event; assertions on async side-effects need a polling mechanism with a reasonable timeout
Writing failure scenarios that only test single-service HTTP errors (e.g. 'User Profile returns 500') without describing the end-to-end propagation — the integration-specific question is: when User Profile returns 500 during Notifications processing, does Notifications retry, fall back, or silently drop the notification?
Omitting observability requirements from the test plan — without correlated trace IDs and structured logging, diagnosing why a test fails in staging requires manual cross-service log correlation, which significantly increases the mean time to diagnose an integration failure
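
The per-run isolation point above can be made concrete with a small seeding sketch. This is Python under stated assumptions: the function names are hypothetical, the email pattern extends the sample outline's `test-{runId}@shopflow-test.com` with an index so that one run can own several users, and the phone numbers are placeholders rather than values known to work with Twilio test mode.

```python
import uuid

EMAIL_DOMAIN = "shopflow-test.com"


def new_run_id():
    """Return a short random identifier for one test run."""
    return uuid.uuid4().hex[:8]


def build_users(run_id, spec):
    """Build user records for one run.

    `spec` is a list of (role, has_phone) tuples, for example
    [("customer", True), ("vendor", False)].
    """
    users = []
    for index, (role, has_phone) in enumerate(spec):
        users.append({
            "email": f"test-{run_id}-{index}@{EMAIL_DOMAIN}",
            "role": role,
            # Placeholder number; None models a user without phoneNumber.
            "phoneNumber": f"+1555{index:07d}" if has_phone else None,
        })
    return users


def belongs_to_run(email, run_id):
    """True if `email` was generated for `run_id` (used to scope cleanup)."""
    return email.startswith(f"test-{run_id}-")
```

Scoping both assertions and cleanup with `belongs_to_run` keeps a concurrent run's users and orders out of the current run's results, and lets the cleanup script delete only what this run created.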

Submission checklist

Service dependency map covering all HTTP and Kafka integration points and external dependencies
At least five critical workflows, each cross-service, with trigger, service sequence, happy-path outcome, and integration point identified
Scope table with in-scope items and out-of-scope items, each with a one-sentence justification
Environment entry criteria checklist (minimum five items) covering all three services, Kafka, Twilio, and test data
Test data strategy addressing entity creation, per-run isolation, async cleanup timing, and cross-run collision prevention
Mock and stub strategy for each dependency with a decision (live/mock/stub), a justification, and a sync mechanism
At least five failure scenarios each with integration point, trigger, expected observable behaviour, and detection method
Observability requirements list covering trace ID propagation, Kafka consumer lag monitoring, and structured log fields

Extension ideas

+Implement the 'standard order placement with SMS' critical workflow as an end-to-end integration test using a real staging environment, with a polling assertion that waits up to 60 seconds for the Twilio test message log entry to appear
+Add a chaos engineering scenario to the test plan: simulate User Profile returning 503 for 30 seconds during a test run and verify that Orders correctly queues retries and Notifications eventually processes all pending ORDER_CONFIRMED events once User Profile recovers
+Define a contract test strategy for the Kafka event schema: specify how the ORDER_CONFIRMED event payload is versioned, who owns the schema, and how breaking changes to the event shape are communicated between the Orders producer and Notifications consumer
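
For the contract test idea, a consumer-side check on the event shape might look like the sketch below. It is Python and purely illustrative: the source does not define the ORDER_CONFIRMED payload, so every field name here (`eventType`, `schemaVersion`, `orderId`, `userId`) is a hypothetical assumption, as is the function name.

```python
# Hypothetical required fields for an ORDER_CONFIRMED event and their types.
REQUIRED_FIELDS = {
    "eventType": str,
    "schemaVersion": int,
    "orderId": str,
    "userId": str,
}


def contract_violations(event):
    """Return a list of human-readable problems; an empty list means the
    event satisfies the (assumed) consumer contract.

    Unknown extra fields are ignored, so the producer can add fields
    without breaking the consumer.
    """
    problems = []
    for name, expected_type in REQUIRED_FIELDS.items():
        if name not in event:
            problems.append(f"missing field: {name}")
        elif type(event[name]) is not expected_type:
            problems.append(
                f"wrong type for {name}: expected {expected_type.__name__}"
            )
    return problems
```

Running the same check in both the Orders and Notifications pipelines against shared sample events is one way to surface a breaking change before it reaches the shared staging topic.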