Performance Testing

What a QA engineer needs to test how a system behaves under load: the test types and what each answers, the metrics that actually matter (hint: not the average), how to design a load test that produces a real signal, and the line between back-end load testing and front-end web performance. Pair this with the API Testing Concepts sheet for the request side.

The test types

"Performance testing" is an umbrella. Each type answers a different question — pick the one that matches your risk.

Type | Question it answers | Shape of the load
Load | Does it meet its targets at expected traffic? | Ramp to expected peak, hold
Stress | Where does it break, and how? | Push past capacity until it fails
Soak (endurance) | Does it degrade over time? | Sustained moderate load for hours
Spike | Does it survive a sudden surge? | Sharp jump up, then down
Volume | Does it cope with large data? | Big datasets/payloads, not just traffic

Most teams start with a load test against a target, then add soak (catches memory leaks) and spike (catches autoscaling gaps) as the system becomes more critical.
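
The load shapes in the table can be written down as stage lists, which is roughly how most code-scripted tools describe a ramp. The sketch below is tool-agnostic Python; the durations, multipliers, and function names are illustrative assumptions, not recommendations. Volume testing is left out because it varies the data rather than the traffic shape.

```python
from typing import List, Tuple

# A stage is (duration_seconds, target_vus): VUs move linearly from the
# previous level to the target over the stage. All numbers are assumptions.
Stage = Tuple[int, int]


def load_profile(peak: int) -> List[Stage]:
    """Load test: ramp to expected peak, hold, ramp down."""
    return [(300, peak), (900, peak), (120, 0)]


def stress_profile(peak: int, steps: int = 4) -> List[Stage]:
    """Stress test: step past expected peak (1x, 2x, 3x, ...) until it fails."""
    stages: List[Stage] = []
    for i in range(1, steps + 1):
        target = peak * i
        stages += [(120, target), (300, target)]  # ramp, then hold
    return stages


def soak_profile(peak: int, hours: int = 4) -> List[Stage]:
    """Soak test: sustained moderate load (here 60% of peak) for hours."""
    moderate = peak * 6 // 10
    return [(300, moderate), (hours * 3600, moderate), (300, 0)]


def spike_profile(baseline: int, surge: int) -> List[Stage]:
    """Spike test: sharp jump up, short hold, sharp drop back."""
    return [(120, baseline), (10, surge), (60, surge), (10, baseline), (120, baseline)]


def vus_at(stages: List[Stage], t: float, start_vus: int = 0) -> float:
    """Linearly interpolated VU count t seconds into the profile."""
    current = float(start_vus)
    for duration, target in stages:
        if t <= duration:
            return current + (target - current) * (t / duration)
        t -= duration
        current = float(target)
    return current  # past the end: stay at the last target


if __name__ == "__main__":
    for second in (0, 150, 300, 1200, 1320):
        print(second, vus_at(load_profile(100), second))
```

A helper like vus_at is handy for plotting a profile before running it, so the shape can be reviewed against the question the test is meant to answer.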

The metrics that matter

The single most important habit: report percentiles, not averages. An average hides the slow tail that users actually feel.

Metric | What it tells you
Throughput (req/s, TPS) | How much work the system handles
Latency / response time | How long each request takes
p95 / p99 (percentiles) | The experience of the slowest 5% / 1% — the real signal
Error rate | Share of requests failing under load
Concurrency / active VUs | How many users are hitting it at once
Why averages lie:
  100 requests: 98 at 100ms, 2 at 5000ms
  average  = 198ms        -> looks fine
  p99      = 5000ms       -> two in a hundred users wait 5s
Report p95/p99 and error rate. The average is the least useful number.
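
To see the same effect in code, here is a minimal sketch that computes the average and nearest-rank percentiles from raw latency samples. The sample data is made up, and the nearest-rank method is one common definition; load tools may interpolate between samples and report slightly different values.

```python
import math
from statistics import mean
from typing import Sequence


def percentile(samples: Sequence[float], p: int) -> float:
    """Nearest-rank percentile: the smallest sample with at least p% of
    all samples at or below it."""
    if not samples:
        raise ValueError("no samples")
    ordered = sorted(samples)
    rank = max(1, math.ceil(p * len(ordered) / 100))
    return ordered[rank - 1]


# Hypothetical run: 1,000 requests, 980 fast and 20 slow.
latencies_ms = [100] * 980 + [5000] * 20

summary = {
    "avg": mean(latencies_ms),
    "p50": percentile(latencies_ms, 50),
    "p95": percentile(latencies_ms, 95),
    "p99": percentile(latencies_ms, 99),
}

if __name__ == "__main__":
    print(summary)
```

In this sample the slow tail is 2% of requests, so it shows up at p99 but not at p95, which is one reason to report both.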

Virtual users, concurrency, and ramp-up

A load tool simulates virtual users (VUs) — concurrent simulated clients. Two knobs shape the test:

  • Concurrency — how many VUs run at once (models real simultaneous users). See Concurrent Users.
  • Ramp-up — how fast you add VUs. Ramping gradually finds the point where latency degrades; slamming all VUs at once only tells you pass/fail. See Ramp-up Period.

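Where you can, think in terms of arrival rate (requests per second) rather than a fixed VU count. With a fixed VU pool, each VU waits for its response before sending the next request, so the load actually delivered drops as response times rise mid-test; a target arrival rate keeps the load steady.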
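
The arithmetic behind this can be sketched with two small formulas. The first assumes a simple closed model in which each VU sends a request, waits for the response, pauses for think-time, and repeats; the second is Little's Law (average requests in flight = arrival rate × response time). The function names and numbers are illustrative assumptions.

```python
def closed_model_throughput(vus: int, response_s: float, think_s: float) -> float:
    """Requests per second from a fixed VU pool: each VU completes one
    request per (response time + think-time) cycle."""
    return vus / (response_s + think_s)


def in_flight_requests(arrival_rate: float, response_s: float) -> float:
    """Little's Law: average requests in flight at a given arrival rate."""
    return arrival_rate * response_s


if __name__ == "__main__":
    # Same test, before and after the system slows from 0.2s to 1.2s.
    for response_s in (0.2, 1.2):
        rps = closed_model_throughput(vus=100, response_s=response_s, think_s=0.8)
        in_flight = in_flight_requests(arrival_rate=100, response_s=response_s)
        print(
            f"response={response_s}s  "
            f"fixed 100 VUs deliver {rps:.0f} req/s  |  "
            f"fixed 100 req/s needs ~{in_flight:.0f} requests in flight"
        )
```

With a fixed pool of 100 VUs the delivered load halves when the system slows down, which hides the problem; with a fixed arrival rate the load stays at 100 req/s and the in-flight count grows instead.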

Designing a load test

A repeatable shape that produces a real signal:

  1. Set targets — define the SLOs first (e.g. p95 < 500ms, error rate < 0.1% at 1,000 req/s). A test with no target can't pass or fail.
  2. Establish a baseline — measure the system at low, known load so you have something to compare against. See Baseline Testing.
  3. Model realistic traffic — mix the endpoints/journeys real users hit, with realistic think-time and data. One-endpoint hammering misrepresents the system.
  4. Ramp gradually — increase load in steps and watch where metrics turn.
  5. Hold and observe — sustain peak long enough to see steady-state behaviour, not just the spike of warm-up.
  6. Analyse against targets — compare p95/p99 and error rate to the SLOs, not to a vibe.
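
Steps 1 and 6 can be made mechanical: write the targets down as data before the run, then compare the measured numbers against them. The sketch below uses the example targets from step 1; the class and field names are hypothetical, and the measured values would come from whatever summary your load tool exports.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Slo:
    """Targets from step 1: p95 < 500ms, error rate < 0.1% at 1,000 req/s."""
    p95_ms: float = 500.0
    max_error_rate: float = 0.001
    min_throughput_rps: float = 1000.0


@dataclass(frozen=True)
class RunResult:
    """Measured values at sustained peak (hypothetical field names)."""
    p95_ms: float
    error_rate: float
    throughput_rps: float


def evaluate(result: RunResult, slo: Slo) -> List[str]:
    """Return the violated targets; an empty list means the run passed."""
    failures: List[str] = []
    if result.throughput_rps < slo.min_throughput_rps:
        failures.append(
            f"throughput {result.throughput_rps:.0f} req/s is below "
            f"{slo.min_throughput_rps:.0f} req/s"
        )
    if result.p95_ms >= slo.p95_ms:
        failures.append(f"p95 {result.p95_ms:.0f}ms is not under {slo.p95_ms:.0f}ms")
    if result.error_rate >= slo.max_error_rate:
        failures.append(
            f"error rate {result.error_rate:.2%} is not under {slo.max_error_rate:.2%}"
        )
    return failures


if __name__ == "__main__":
    run = RunResult(p95_ms=640.0, error_rate=0.0004, throughput_rps=1010.0)
    problems = evaluate(run, Slo())
    print("PASS" if not problems else "FAIL: " + "; ".join(problems))
```

Checking throughput first matters: a low p95 measured at half the target load says nothing about the target.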

Use realistic, varied test data — reusing one record hits caches and flatters the result.
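
One way to get a realistic mix with varied data is to generate the request plan up front: pick journeys by weight, draw a different record for each request, and add randomised think-time. The journey names, weights, pool size, and think-time range below are assumptions for illustration; real values should come from production traffic analysis.

```python
import random
from typing import Dict, Iterator, Tuple

# Hypothetical share of traffic per user journey.
TRAFFIC_MIX: Dict[str, float] = {"browse": 0.6, "search": 0.3, "checkout": 0.1}


def request_plan(
    count: int, user_pool_size: int, seed: int = 42
) -> Iterator[Tuple[str, int, float]]:
    """Yield (journey, user_id, think_time_s) tuples.

    Journeys follow TRAFFIC_MIX, user IDs are spread across the whole pool
    so requests do not keep hitting one cached record, and think-time is
    drawn uniformly between 1 and 5 seconds.
    """
    rng = random.Random(seed)  # seeded so a run can be repeated exactly
    journeys = list(TRAFFIC_MIX)
    weights = list(TRAFFIC_MIX.values())
    for _ in range(count):
        journey = rng.choices(journeys, weights=weights, k=1)[0]
        user_id = rng.randrange(user_pool_size)
        think_time_s = rng.uniform(1.0, 5.0)
        yield journey, user_id, think_time_s


if __name__ == "__main__":
    for journey, user_id, think_time_s in request_plan(5, user_pool_size=10_000):
        print(f"{journey:<9} user={user_id:<6} think={think_time_s:.1f}s")
```

Seeding the generator keeps the varied data reproducible, so two runs can be compared against the same request plan.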

Reading results and finding bottlenecks

A failed target is the start, not the answer. Bottlenecks usually sit in one of a few places:

  • Application — slow code paths, lock contention, thread/connection-pool exhaustion.
  • Database — unindexed queries, the N+1 pattern, connection limits (often the first wall).
  • Infrastructure — CPU/memory saturation, network, undersized instances.
  • External dependencies — a downstream API or queue that caps your throughput.

Correlate the load tool's metrics with server-side observability (APM, Grafana dashboards) — the load tool tells you that it's slow; the server-side data tells you where. Watch for the knee in the curve: the concurrency level where latency rises sharply while throughput plateaus.
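
The knee can be located from per-step results of a gradual ramp. The sketch below flags the first step where throughput barely grows while p95 jumps; the thresholds (under 5% throughput gain, over 50% latency rise) and the sample numbers are assumptions to tune for your system, not standard values.

```python
from typing import List, Optional, Tuple

# One entry per ramp step: (concurrency, throughput_rps, p95_ms).
Step = Tuple[int, float, float]


def find_knee(
    steps: List[Step],
    min_throughput_gain: float = 0.05,
    min_latency_rise: float = 0.5,
) -> Optional[int]:
    """Return the concurrency of the first step where, relative to the
    previous step, throughput grew by less than min_throughput_gain while
    p95 latency rose by more than min_latency_rise. None if no such step."""
    for previous, current in zip(steps, steps[1:]):
        _, prev_rps, prev_p95 = previous
        concurrency, rps, p95 = current
        throughput_gain = (rps - prev_rps) / prev_rps
        latency_rise = (p95 - prev_p95) / prev_p95
        if throughput_gain < min_throughput_gain and latency_rise > min_latency_rise:
            return concurrency
    return None


# Hypothetical results from a stepped ramp.
ramp_results: List[Step] = [
    (50, 480.0, 120.0),
    (100, 950.0, 130.0),
    (200, 1800.0, 160.0),
    (400, 1850.0, 420.0),   # throughput flat, latency more than doubles
    (800, 1840.0, 1900.0),
]

if __name__ == "__main__":
    print("knee at concurrency:", find_knee(ramp_results))
```

The step just before the flagged one is a reasonable first estimate of usable capacity; the server-side data for the flagged step is where to look for the saturated resource.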

Back-end load vs front-end performance

Two different disciplines, often confused:

Aspect | Back-end load testing | Front-end web performance
Question | Does the server hold up under load? | Is the page fast for one user?
Tools | k6, JMeter, Gatling, Locust | Lighthouse, WebPageTest
Metrics | Throughput, p95 latency, error rate | Core Web Vitals (LCP, INP, CLS)
Scope | Many simulated users, no real browser | One real browser, render/paint timings

Both matter: a server that scales perfectly still feels slow if the page renders poorly, and a fast page still fails if the API behind it falls over at 500 users. Test the layer that carries your risk — usually both.

The tool landscape

Need | Tools
Code-scripted load (OSS) | k6 (JS), Gatling (Scala/Java), Locust (Python), Artillery (JS/YAML)
GUI / record-based load | JMeter
Quick HTTP benchmarking | Vegeta (constant-rate), wrk
Enterprise platforms | LoadRunner, NeoLoad, LoadView
Front-end / Core Web Vitals | Lighthouse, WebPageTest

Quick performance testing checklist

  • Test type matches the question (load / stress / soak / spike / volume)
  • Targets/SLOs defined before the test (p95, error rate, throughput)
  • A baseline measured at known low load
  • Realistic traffic model — real journeys, think-time, varied data
  • Load ramped gradually, then held at peak
  • Results judged on percentiles (p95/p99) and error rate, not averages
  • Bottleneck located by correlating with server-side observability
  • Front-end performance (Core Web Vitals) covered where users feel it
  • Soak run for leaks / spike run for surges where the system warrants it
  • Results compared against the SLOs, with a clear pass/fail