Q23 of 38 · Performance

How would you justify investment in a dedicated performance testing function?

Performance · Lead · performance · leadership · roi · investment · lead

Short answer

Tie the investment to incidents avoided (the cost of one Black Friday outage), customer impact (latency complaints, churn correlated with p95), and engineering time saved (centralised expertise versus every team relearning the same lessons). A single P1 outage typically dwarfs the annual investment; the value is in *not* having that outage.

Detail

The hidden problem with performance investment: it pays off by absence, in outages that didn't happen. Justifying the spend requires reframing that absence as measurable value.

Frame 1 — Cost of incidents avoided. Compute the cost of one prevented P1: lost revenue per hour during the outage, post-incident eng time, customer credits, brand damage (harder to quantify but real). For a marketplace at peak, a 2-hour outage on Black Friday can cost £1-10M+. A perf team that prevents one such incident per year pays for itself many times over.
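The Frame 1 arithmetic can be sketched in a few lines. This is a minimal illustration, not a costing model: the `IncidentCost` class, the team cost, and every figure below are hypothetical placeholders to be replaced with your own finance and incident data.

```python
from dataclasses import dataclass


@dataclass
class IncidentCost:
    """Cost components of one P1 outage. All figures are illustrative."""
    revenue_per_hour: float   # lost revenue per outage hour at peak
    outage_hours: float
    eng_hours: float          # post-incident engineering time
    eng_hourly_rate: float
    customer_credits: float

    def total(self) -> float:
        lost_revenue = self.revenue_per_hour * self.outage_hours
        eng_cost = self.eng_hours * self.eng_hourly_rate
        # Brand damage is deliberately left out: real, but hard to quantify.
        return lost_revenue + eng_cost + self.customer_credits


# Hypothetical marketplace numbers; substitute your own.
black_friday = IncidentCost(
    revenue_per_hour=500_000,
    outage_hours=2,
    eng_hours=400,
    eng_hourly_rate=75,
    customer_credits=50_000,
)
annual_team_cost = 450_000  # assumed fully loaded cost of the perf function

incident_total = black_friday.total()
payback_ratio = incident_total / annual_team_cost

print(f"One prevented P1: GBP {incident_total:,.0f}")
print(f"Payback on one save: {payback_ratio:.1f}x")
```

Leaving brand damage out keeps the estimate conservative, which makes the number easier to defend in front of finance.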

Frame 2 — Customer impact. Pull RUM data: what's the conversion drop between p50-fast and p95-slow users? Figures in the region of a 1-3% conversion delta per 100ms are commonly cited, but the number leadership will trust is the one from your own data. Multiply by revenue: a 200ms p95 improvement on the checkout page is often worth £1M+ annually. Performance is revenue, not infrastructure.
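The "multiply by revenue" step in Frame 2 looks roughly like this. It is a sketch that assumes conversion scales linearly with latency over the range in question, which is a simplification; the `annual_uplift` helper and the revenue figure are hypothetical.

```python
def annual_uplift(annual_revenue: float,
                  delta_per_100ms: float,
                  improvement_ms: float) -> float:
    """Revenue gained per year, assuming a linear conversion/latency relation."""
    return annual_revenue * delta_per_100ms * (improvement_ms / 100)


checkout_revenue = 50_000_000  # hypothetical annual revenue through checkout
improvement_ms = 200           # the p95 improvement used in the example above

# Bracket the estimate with the low and high ends of the 1-3% range.
low = annual_uplift(checkout_revenue, 0.01, improvement_ms)
high = annual_uplift(checkout_revenue, 0.03, improvement_ms)

print(f"Estimated uplift: GBP {low:,.0f} to GBP {high:,.0f} per year")
```

Presenting a range rather than a single figure signals that the delta comes from measurement with uncertainty, not from a borrowed industry statistic.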

Frame 3 — Engineering time saved. Without a centralised function, every team rediscovers JMeter, builds half-baked tooling, and learns the same lessons. Centralised expertise: shared frameworks, a single CI integration, common dashboards. Time-to-first-load-test for a new service goes from weeks to days.

Frame 4 — Risk reduction. Pre-prod sign-off for major launches. Capacity planning before peak events. Vendor evaluation (could you switch from Postgres to Aurora? Perf team answers in a week). The team is an insurance policy against a class of risks the org otherwise can't quantify.

What the investment looks like (typical mid-size):

  • 2-3 senior engineers focused on perf.
  • Tooling: load infra (k6 Cloud, dedicated runners), APM, RUM.
  • Process: pre-prod perf sign-off for major changes, quarterly capacity review, peak-event game days.

Common counter-arguments and responses:

  • "Devs can write their own tests" — they can, and often badly. Centralised expertise raises the floor.
  • "We don't have perf problems" usually means "we don't measure them". RUM data usually surfaces problems on first look.
  • "It's expensive" — quantify the prevented incident at the rates above; the function pays for itself with one save.

Senior leadership signal: the framing is in revenue and risk, not in test runs. "We're investing £X to reduce £Y in expected outage cost and unlock £Z in conversion uplift." That's the language that gets funding.
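One way to arrive at the X, Y and Z in that sentence is an expected-value calculation. The sketch below assumes Y is the reduction in annual expected outage cost (change in outage probability times cost per outage); the `funding_pitch` function, the probabilities, and all amounts are hypothetical.

```python
def funding_pitch(investment: float,
                  p_outage_before: float,
                  p_outage_after: float,
                  outage_cost: float,
                  conversion_uplift: float) -> dict:
    """Return the X / Y / Z figures and the net annual value of the function."""
    # Y: expected outage cost removed by lowering the yearly outage probability.
    expected_reduction = (p_outage_before - p_outage_after) * outage_cost
    net = expected_reduction + conversion_uplift - investment
    return {
        "X_investment": investment,
        "Y_expected_outage_reduction": expected_reduction,
        "Z_conversion_uplift": conversion_uplift,
        "net_annual_value": net,
    }


# Illustrative inputs only: a 50% yearly chance of a P1 cut to 25%.
pitch = funding_pitch(
    investment=450_000,
    p_outage_before=0.5,
    p_outage_after=0.25,
    outage_cost=2_000_000,
    conversion_uplift=1_000_000,
)

for name, value in pitch.items():
    print(f"{name}: GBP {value:,.0f}")
```

The probabilities are the weakest input, so it helps to anchor them in incident history (for example, P1s per peak season over the last few years) rather than guesswork.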

// WHAT INTERVIEWERS LOOK FOR

Business framing — incidents avoided, customer impact, engineering efficiency, risk reduction. Numbers tied to org reality, not generic 'performance is good'. Awareness of common counter-arguments.

// COMMON PITFALL

Pitching in technical terms ('we'll have better load tests') — leadership funds outcomes, not activities. Frame it as revenue protection and conversion lift.