Percentile (p95, p99)
// Definition
A statistic that reports the value at or below which a given percentage of measurements fall. p95 means 95% of requests completed at least as fast as this number, and the remaining 5% were slower. Performance teams report tail percentiles because averages hide the long tail of slow requests.
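To make the definition concrete, here is a minimal sketch of the nearest-rank method, one of several common ways to compute a percentile. The helper name `percentile` and the sample latencies are made up for illustration; monitoring tools may interpolate between samples and report slightly different values.

```typescript
// Nearest-rank percentile: the smallest sample with at least p% of samples at or below it.
// `percentile` is a hypothetical helper name, not part of any library.
function percentile(samples: number[], p: number): number {
  if (samples.length === 0) throw new Error('percentile of an empty sample set is undefined')
  if (p <= 0 || p > 100) throw new Error('p must be in the range (0, 100]')
  // Copy before sorting so the caller's array is left untouched; sort numerically
  const sorted = [...samples].sort((a, b) => a - b)
  // 1-based rank; multiplying before dividing avoids rounding surprises for integer p
  const rank = Math.ceil((p * sorted.length) / 100)
  return sorted[rank - 1]
}

// Made-up latencies in milliseconds, for illustration only
const latenciesMs = [120, 95, 110, 3000, 105, 98, 130, 102, 115, 99]
const p50 = percentile(latenciesMs, 50) // 105
const p95 = percentile(latenciesMs, 95) // 3000
```

With only ten samples, a single slow request decides the p95, which is why the sample count matters as much as the formula.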
// Why it matters
Averages mislead under load: a 200ms mean can hide a 4s p99 that drives your worst-affected users to churn. Percentiles describe the experience at the tail, where a p99 of 4s means roughly 1 request in 100 takes 4s or longer. QA reports percentiles, not means, because SLAs and real pain live in the tail.
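A small sketch of how a healthy-looking mean can sit next to a painful p99. The data is synthetic and chosen only to mirror the numbers above: 98 requests at 120ms and 2 requests at 4000ms.

```typescript
// Synthetic data (an assumption for illustration): 98 fast requests and 2 very slow ones
const samplesMs: number[] = [...Array(98).fill(120), ...Array(2).fill(4000)]

// The mean blends the two slow requests into an unremarkable number
const mean = samplesMs.reduce((sum, t) => sum + t, 0) / samplesMs.length

// Nearest-rank p99: the smallest value with at least 99% of samples at or below it
const sortedMs = [...samplesMs].sort((a, b) => a - b)
const p99 = sortedMs[Math.ceil((sortedMs.length * 99) / 100) - 1]

console.log({ mean, p99 }) // mean is about 197.6, p99 is 4000
```

An alert on the mean would stay quiet here, while an assertion on p99 would fail.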
// How to test
// Capture response times and assert the tail, not the average
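const times: number[] = []
Cypress._.times(100, () => {
  // Cypress commands are queued, so a timestamp taken here would mark enqueue time,
  // not send time. Use the duration that cy.request yields for each request instead.
  cy.request('/api/dashboard').then((res) => times.push(res.duration))
})
cy.then(() => {
  const sorted = [...times].sort((a, b) => a - b)
  // Nearest-rank p95: smallest value with at least 95% of samples at or below it
  const p95 = sorted[Math.ceil((sorted.length * 95) / 100) - 1]
  expect(p95, 'p95 latency').to.be.lessThan(800) // SLA threshold
})
// Common mistakes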
- Reporting and alerting on the mean, hiding tail pain
- Too few samples to compute a meaningful p99 (need hundreds+)
- Measuring server time only, ignoring the network and render time the user actually feels
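On the sample-size point, a rough sketch of the arithmetic: with n samples, only about n × (100 − p) / 100 of them land above the p-th percentile. The helper `minSamplesForTail` and the target of 5 tail observations are assumptions for illustration, a rule of thumb rather than a statistical guarantee.

```typescript
// Hypothetical helper: how many samples are needed so that, on average,
// `tailCount` of them land above the p-th percentile.
function minSamplesForTail(p: number, tailCount: number): number {
  if (p <= 0 || p >= 100) throw new Error('p must be strictly between 0 and 100')
  if (tailCount < 1) throw new Error('tailCount must be at least 1')
  // Expected samples above the percentile = n * (100 - p) / 100, solved for n
  return Math.ceil((tailCount * 100) / (100 - p))
}

const neededForP95 = minSamplesForTail(95, 5) // 100
const neededForP99 = minSamplesForTail(99, 5) // 500
```

With 100 samples, a p99 rests on a single observation, so one unlucky request can swing the result from run to run.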
// Related terms
Latency
The time between sending a request and receiving the first byte of the response. Lower is better. Often reported at percentiles (p50, p95, p99) since averages hide tail behaviour.
Response Time
Total time from request initiation to response completion. Includes latency plus transfer and processing time. The end-user-perceived performance metric.
Throughput
The amount of work completed per unit of time — requests per second, transactions per second. Higher is better.
Learn more · Non-Functional Testing Overview
Chapter 2 · Lesson 2: Key Performance Metrics — Response Time, Throughput, Latency