How do you build a sustainable performance test suite that runs in CI without becoming a bottleneck?

Question

Accepted Answer

Tier the suite: fast smoke perf on every PR (under 5 min, narrow scope), full load nightly, soak weekly. Use thresholds that fail loud on real regression but tolerate noise. Run on dedicated infra so dev pipelines aren't fighting load gen for resources.

Three tiers, each with different goals and budgets:

Tier 1: PR smoke perf (every PR, < 5 min)
Hits 5-10 critical endpoints with a small VU count (10-20).
Asserts thresholds on absolute latency (e.g. p95 < 500ms). If you can't pass these on a tiny load, something's badly wrong.
Goal: catch egregious regressions before merge, such as a query that went from 50ms to 5s.
Budget: tight time, high signal. Anything noisy gets pulled out of this tier.

Tier 2: Nightly load tests (every night, 30-60 min)
Full target load against staging.
Real SLOs as thresholds.
Trend-tracked: today's p95 vs. last week's. Page on regression.
Goal: catch slow-creep regressions and dependency drift.

Tier 3: Weekly soak / monthly capacity (scheduled)
8-24h soak at mode
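
The two kinds of threshold described for Tier 1 and Tier 2 can be sketched in a few lines. This is a minimal illustration, not tied to any particular load tool: the function names (`percentile`, `smoke_gate`, `trend_gate`) are hypothetical, the 500ms limit comes from the example above, and the 15% noise tolerance for the trend check is an assumed value that would need tuning against your own run-to-run variance.

```python
import math
from dataclasses import dataclass


def percentile(samples: list[float], pct: float) -> float:
    """Nearest-rank percentile: the smallest sample with at least pct% of samples at or below it."""
    if not samples:
        raise ValueError("no samples to summarize")
    ordered = sorted(samples)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


@dataclass
class GateResult:
    passed: bool
    reason: str


def smoke_gate(latencies_ms: list[float], p95_limit_ms: float = 500.0) -> GateResult:
    """Tier 1 style check: an absolute ceiling on p95 latency."""
    p95 = percentile(latencies_ms, 95)
    if p95 >= p95_limit_ms:
        return GateResult(False, f"p95 {p95:.0f}ms is not under the {p95_limit_ms:.0f}ms limit")
    return GateResult(True, f"p95 {p95:.0f}ms is under the {p95_limit_ms:.0f}ms limit")


def trend_gate(current_p95_ms: float, baseline_p95_ms: float, tolerance: float = 0.15) -> GateResult:
    """Tier 2 style check: fail only if p95 exceeds the baseline by more than the tolerance.

    The 0.15 default is an assumption for illustration, not a recommended value.
    """
    allowed = baseline_p95_ms * (1 + tolerance)
    if current_p95_ms > allowed:
        return GateResult(False, f"p95 {current_p95_ms:.0f}ms exceeds allowed {allowed:.0f}ms")
    return GateResult(True, f"p95 {current_p95_ms:.0f}ms is within allowed {allowed:.0f}ms")


if __name__ == "__main__":
    # Hypothetical samples: one slow request out of twenty does not move p95.
    pr_run = [50.0] * 19 + [5000.0]
    print(smoke_gate(pr_run))
    # Hypothetical nightly comparison against last week's p95.
    print(trend_gate(current_p95_ms=240.0, baseline_p95_ms=200.0))
```

The absolute gate suits the PR tier because it needs no history and only trips on large regressions. The relative gate suits the nightly tier, where a stored baseline exists and slow creep matters more than any single run.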
