How do you isolate whether a slow response is the database, application, or network?

Performance · Senior · performance, diagnostics, tracing, apm, bottleneck

Short answer

Use distributed tracing to break the request into spans: DB query time, app compute time, and network legs. An APM tool (Datadog, New Relic) shows the per-span breakdown. Compare against baselines, check error logs for timeouts, and test each layer in isolation.

Detail

Three-layer isolation is one of the most common senior-level diagnostic skills.

Step 1 — Trace the request. A modern APM (Datadog, New Relic, Honeycomb, or open-source Jaeger/Tempo with OpenTelemetry) breaks one request into spans. A typical trace might show: HTTP IN 1100ms → app.compute 50ms → db.query 950ms → http_response 100ms. Now you know it's the DB.
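As a rough illustration of reading that breakdown, the sketch below takes the child-span durations from the example trace and reports which one dominates. The dictionary layout and the dominant_span helper are assumptions made for this example, not the export format of any particular APM.

```python
# Span durations (ms) copied from the example trace above. The root
# "HTTP IN" span (1100 ms) is the total, so it is not listed as a child.
child_spans = {
    "app.compute": 50,
    "db.query": 950,
    "http_response": 100,
}


def dominant_span(spans):
    """Return (name, share_of_total) for the longest child span."""
    total = sum(spans.values())
    name = max(spans, key=spans.get)
    return name, spans[name] / total


name, share = dominant_span(child_spans)
print(f"{name} accounts for {share:.0%} of the request")
# prints "db.query accounts for 86% of the request"
```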

Step 2 — If no tracing, deduce by elimination. Take the network out of the picture: curl the slow endpoint from the same host, then from a remote host. If local curl is fast and remote curl is slow, it's the network. If both are slow, it's server-side. If both are fast, suspect the test client or the ingress.
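The elimination rule can be written down as a small decision function. This is a minimal sketch: the classify name and the 500 ms "slow" threshold are hypothetical, and the threshold should really come from your own baseline. The case where the local call is slow but the remote call is fast is not covered by the rule above, so the sketch simply flags it for re-measurement.

```python
# Timings would come from something like:
#   curl -o /dev/null -s -w '%{time_total}\n' <endpoint URL>
# run once on the server itself and once from a remote machine
# (time_total is reported in seconds; convert to ms before calling).


def classify(local_ms, remote_ms, slow_ms=500):
    """Apply the local-vs-remote elimination rule to two latency readings."""
    local_slow = local_ms >= slow_ms    # slow_ms is an assumed threshold
    remote_slow = remote_ms >= slow_ms
    if local_slow and remote_slow:
        return "server-side"
    if remote_slow:
        return "network"
    if local_slow:
        # Not covered by the rule in the text: the readings contradict each other.
        return "inconclusive, re-measure"
    return "test client or ingress"


print(classify(local_ms=40, remote_ms=1100))  # prints "network"
```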

Step 3 — Layer-isolation tests:

  • Database: run the slow query directly via psql/mysql client. If the raw query is fast but the app's call is slow, it's the ORM/connection pool/N+1, not the DB itself.
  • Application: profile the code path with a sampling profiler (py-spy, pprof, async-profiler). Look for hot functions or unexpected lock contention.
  • Network: mtr for per-hop path latency, iperf3 for throughput, ss -i for socket-level retransmit counters.
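The database check above (raw query time versus the time the app spends on the same call) might look like the following sketch. It uses the standard-library sqlite3 module as a stand-in for a psql/mysql session so that it runs anywhere; the orders table, the 950 ms app-side figure, and the 5x ratio are all hypothetical values chosen for illustration.

```python
import sqlite3
import statistics
import time


def median_query_ms(conn, sql, params=(), runs=5):
    """Run the query several times and return the median wall time in ms."""
    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        conn.execute(sql, params).fetchall()
        samples.append((time.perf_counter() - start) * 1000)
    return statistics.median(samples)


def db_verdict(raw_ms, app_ms, ratio=5.0):
    """Blame the access layer when the app is far slower than the raw query."""
    if app_ms > raw_ms * ratio:  # ratio is an assumed cutoff, tune to taste
        return "access layer (ORM, connection pool, N+1)"
    return "query itself"


# Stand-in database with a hypothetical table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER)")
conn.executemany(
    "INSERT INTO orders (customer_id) VALUES (?)",
    [(i % 100,) for i in range(10_000)],
)

raw_ms = median_query_ms(conn, "SELECT * FROM orders WHERE customer_id = ?", (42,))
app_ms = 950.0  # hypothetical: the db.query span duration seen by the app
print(f"raw={raw_ms:.2f} ms, app={app_ms:.0f} ms -> {db_verdict(raw_ms, app_ms)}")
```

Taking the median of several runs keeps a single cold-cache execution from skewing the comparison.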

Step 4 — Confirm with a controlled change. A bottleneck hypothesis is just a hypothesis until you change it and the system improves. Add an index, expand the pool, or move the service closer to the DB — measure latency before/after.
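A before/after comparison is more convincing with percentiles than with a single reading. The sketch below compares p50 and p95 for two sets of latency samples; the sample values are invented for illustration, and in practice they would come from your load test or APM export.

```python
import statistics


def p95(samples_ms):
    """95th percentile: the last of the 19 cut points from quantiles(n=20)."""
    return statistics.quantiles(samples_ms, n=20)[-1]


def compare(before_ms, after_ms):
    """Summarise p50 and p95 latency before and after a change."""
    return {
        "p50_before": statistics.median(before_ms),
        "p50_after": statistics.median(after_ms),
        "p95_before": p95(before_ms),
        "p95_after": p95(after_ms),
    }


# Hypothetical samples (ms): before and after adding an index.
before = [1050, 1100, 1120, 1090, 1300, 1080, 1110, 1095, 1105, 1400]
after = [160, 150, 170, 155, 400, 158, 162, 149, 165, 152]

result = compare(before, after)
for key, value in result.items():
    print(f"{key}: {value:.1f} ms")
```

Checking p95 as well as p50 matters because a fix can improve the typical request while leaving the slow tail untouched.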

The senior insight: bottlenecks shift. You fix the database, the app becomes the bottleneck. You fix the app, the network becomes the bottleneck. Always re-measure after a fix; the next limit may surprise you.

// WHAT INTERVIEWERS LOOK FOR

Methodology — distributed tracing first, fallback to elimination, isolation tests per layer, controlled change to confirm. Bonus for the 'bottlenecks shift' insight.

// COMMON PITFALL

Pattern-matching to a previous bug ('it must be the database again') without measuring. Bottlenecks move; the previous one isn't the current one.