p95 latency explained for QA engineers
What p95 actually means, why averages hide the bugs, and how to read a latency distribution as a tester.
The average response time is the metric most likely to make a slow system look fine. Here is what to watch instead.
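To make the gap between the average and the tail concrete, here is a minimal sketch. The latency values are invented for illustration, and `percentile` is a hypothetical helper using the nearest-rank definition; load-testing tools may interpolate differently, so their numbers can differ slightly on real data.

```python
import math
import statistics


def percentile(samples, pct):
    """Nearest-rank percentile: the smallest sample with at least pct% of samples at or below it."""
    if not samples:
        raise ValueError("need at least one sample")
    ordered = sorted(samples)
    rank = math.ceil(pct * len(ordered) / 100)  # 1-based rank
    return ordered[max(rank, 1) - 1]


# Hypothetical data: 90 requests answered in 100 ms, 10 requests that took 3000 ms.
latencies_ms = [100] * 90 + [3000] * 10

mean_ms = statistics.mean(latencies_ms)
p50_ms = percentile(latencies_ms, 50)
p95_ms = percentile(latencies_ms, 95)

print(f"mean={mean_ms} ms  p50={p50_ms} ms  p95={p95_ms} ms")
```

On this made-up sample the mean is 390 ms, which looks acceptable, while the p95 is 3000 ms: one request in ten is waiting three seconds, and the average gives no hint of it.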
Load testing is one type of performance test, not the whole thing. A single user can have a performance bug. Match the test (load/stress/spike/soak) to the risk.
Not a full load test — a fast, fixed, repeatable check on a few critical endpoints, compared to baseline, that catches gross regressions before sign-off.
Which k6 metrics matter and which mislead: check the error rate first, read p95/p99 not the average, confirm the load profile, and compare to a baseline.
Dead buttons, random logouts, missing data: often timing problems in disguise. The tell is a failure that is intermittent and gets worse under load; check latency before debugging logic.
Derive thresholds from user expectation, today's baseline, and business impact — set on p95/p99 with an error-rate gate, tiered by criticality — not a made-up 'under 2s'.
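One way such thresholds could be expressed, as a sketch only: the tier names, headroom factors, and error-rate limits below are assumptions chosen for illustration, not recommended values, and `derive_threshold` and `check` are hypothetical helpers rather than part of any tool.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Threshold:
    p95_ms: float
    p99_ms: float
    max_error_rate: float  # fraction of failed requests, 0.0 to 1.0


# Hypothetical tiers: more critical endpoints get less headroom and a tighter error gate.
TIERS = {
    "critical": {"headroom": 1.25, "max_error_rate": 0.001},
    "standard": {"headroom": 1.5, "max_error_rate": 0.01},
}


def derive_threshold(baseline_p95_ms, baseline_p99_ms, headroom, max_error_rate):
    """Budget = today's measured baseline times a headroom factor (assumed policy)."""
    return Threshold(baseline_p95_ms * headroom, baseline_p99_ms * headroom, max_error_rate)


def check(measured_p95_ms, measured_p99_ms, error_rate, threshold):
    """Return the names of the gates that failed; an empty list means pass."""
    failures = []
    # Error rate goes first: latency figures from a run full of errors are not trustworthy.
    if error_rate > threshold.max_error_rate:
        failures.append("error_rate")
    if measured_p95_ms > threshold.p95_ms:
        failures.append("p95")
    if measured_p99_ms > threshold.p99_ms:
        failures.append("p99")
    return failures


# Hypothetical baseline for a checkout endpoint: p95 = 400 ms, p99 = 900 ms.
checkout = derive_threshold(400, 900, **TIERS["critical"])
print(checkout)
print(check(450, 1000, 0.0005, checkout))  # within budget
print(check(520, 1000, 0.02, checkout))    # error gate and p95 both fail
```

The point of the structure is that every number traces back to something measured or agreed on (a baseline, a tier), so a failing gate can be argued about in those terms instead of against an arbitrary round figure.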
The pitch: 'run load tests on every PR.' The reality: you'll have flaky thresholds in three days and disabled tests in two weeks. Here's the four-tier strategy that actually survives.
Three load-testing tools with three radically different ergonomics. JMeter has the 2004 XML/GUI legacy. Gatling stakes everything on Scala. k6 is the JavaScript-first newcomer. Here's the pick.