Stages — Ramping Up and Down

Constant load — a fixed number of VUs running for a fixed duration — is the simplest K6 configuration. But real-world traffic is never constant. It builds, peaks, and fades. Stages let you model that shape in a single test run, and they are how most of the standard testing patterns (load, stress, spike, soak) are implemented in K6.

What stages are

Stages define a sequence of target VU counts over time. K6 linearly interpolates between them:

export const options = {
  stages: [
    { duration: '30s', target: 20 },   // ramp from 0 to 20 VUs over 30s
    { duration: '1m',  target: 20 },   // hold at 20 VUs for 1 minute
    { duration: '30s', target: 0 },    // ramp from 20 to 0 VUs over 30s
  ],
};

At the start of the first stage, K6 begins adding VUs gradually from 0 to 20 over 30 seconds. During the second stage it holds 20 VUs steady. During the third stage it removes VUs gradually back to 0. The test ends when all stages complete.
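To make the interpolation concrete, here is a small plain-JavaScript model of the planned VU count at a given moment. It is a sketch for building intuition, not K6's internal scheduler: the helper names toSeconds and plannedVUs are made up for this example, the starting count of 0 follows the description above, and rounding to the nearest whole VU is an assumption.

```javascript
// Convert a duration string such as '30s', '1m', or '8h' to seconds.
// Only single-unit strings are handled; that is a simplification.
function toSeconds(duration) {
  const match = /^(\d+)(s|m|h)$/.exec(duration);
  if (!match) throw new Error(`unsupported duration: ${duration}`);
  const unit = { s: 1, m: 60, h: 3600 }[match[2]];
  return Number(match[1]) * unit;
}

// Planned VU count at `t` seconds into the test, found by linear
// interpolation between stage targets. `startVUs` defaults to 0 to
// match the description above; rounding is an assumption of this sketch.
function plannedVUs(stages, t, startVUs = 0) {
  let from = startVUs;
  let stageStart = 0;
  for (const stage of stages) {
    const length = toSeconds(stage.duration);
    if (t < stageStart + length) {
      const fraction = (t - stageStart) / length;
      return Math.round(from + (stage.target - from) * fraction);
    }
    from = stage.target;
    stageStart += length;
  }
  return from; // past the last stage: the final target
}

const stages = [
  { duration: '30s', target: 20 },
  { duration: '1m', target: 20 },
  { duration: '30s', target: 0 },
];

console.log(plannedVUs(stages, 15));  // halfway up the first ramp
console.log(plannedVUs(stages, 60));  // inside the hold
console.log(plannedVUs(stages, 105)); // halfway down the last ramp
```

Halfway through either ramp the model gives 10 VUs, and anywhere inside the hold it gives 20, which is the shape you should expect to see in the vus metric.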

The final target: 0 stage matters: it gives in-flight requests time to complete and produces a clean ramp-down curve in your Grafana dashboard rather than an abrupt cliff.
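The top-level stages option is documented as a shortcut for a single scenario that uses the ramping-vus executor. Writing that scenario out by hand is rarely necessary, but it exposes the two settings that control how running iterations are treated when VUs are retired. The sketch below assumes the same three stages as above; the scenario name ramp_profile is an arbitrary label, startVUs is set explicitly so the first ramp begins from zero, and the two graceful values shown are the documented defaults.

```javascript
export const options = {
  scenarios: {
    // "ramp_profile" is an arbitrary scenario name chosen for this sketch.
    ramp_profile: {
      executor: 'ramping-vus',
      startVUs: 0, // set explicitly so the first ramp begins from zero
      stages: [
        { duration: '30s', target: 20 },
        { duration: '1m', target: 20 },
        { duration: '30s', target: 0 },
      ],
      gracefulRampDown: '30s', // time a retiring VU gets to finish its iteration
      gracefulStop: '30s',     // same idea, applied when the scenario ends
    },
  },
};
```

If your iterations can run longer than 30 seconds, raising these values is one way to keep slow requests from being interrupted during the ramp-down.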

The four standard patterns

Load test: the baseline pattern

A load test verifies your system handles expected traffic with acceptable response times and error rates. Ramp up gradually, hold at target, release:

export const options = {
  stages: [
    { duration: '2m', target: 100 },   // warm up to expected peak
    { duration: '5m', target: 100 },   // hold at peak
    { duration: '2m', target: 0 },     // ramp down
  ],
  thresholds: {
    http_req_duration: ['p(95)<500'],   // 95th percentile under 500ms
    http_req_failed:   ['rate<0.01'],   // less than 1% errors
  },
};

The gradual ramp-up matters for cold caches. If you start with 100 VUs simultaneously, every request hits a cold cache at once — inflating your p95 in a way that does not reflect how production traffic actually builds. A 2-minute ramp-up lets the system warm naturally.

Stress test: finding the limit

A stress test ramps past expected capacity to find the breaking point and observe how the system fails:

export const options = {
  stages: [
    { duration: '2m', target: 100 },   // normal operating load
    { duration: '5m', target: 200 },   // push past normal
    { duration: '2m', target: 300 },   // keep pushing
    { duration: '2m', target: 400 },   // find degradation
    { duration: '2m', target: 0 },     // ramp down
  ],
};

Watch http_req_duration and http_req_failed as VU count climbs. The inflection point — where response times start climbing sharply or error rate crosses 1% — is the system's practical capacity limit. This number is more valuable than just knowing "it passed at 100 VUs" because it tells you your safety margin.
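Because the profile above is always ramping, it can be hard to pin the inflection point to a specific VU count. One possible variation, shown here as a sketch rather than a prescribed pattern, is a step-and-hold profile that plateaus at each level long enough for the metrics to settle. The stepStages helper and its parameter names are hypothetical and not part of the K6 API; it only builds a plain array you can pass to stages.

```javascript
// Build a step-and-hold profile: ramp to each level, then hold there.
// The function and parameter names are illustrative, not part of k6.
function stepStages({ levels, ramp, hold, rampDown }) {
  const stages = [];
  for (const target of levels) {
    stages.push({ duration: ramp, target }); // climb to the next level
    stages.push({ duration: hold, target }); // plateau so metrics can settle
  }
  stages.push({ duration: rampDown, target: 0 }); // final ramp-down
  return stages;
}

const stressStages = stepStages({
  levels: [100, 200, 300, 400],
  ramp: '1m',
  hold: '3m',
  rampDown: '2m',
});

console.log(stressStages.length); // 4 levels x 2 stages + 1 ramp-down = 9

// In a k6 script you would then write:
//   export const options = { stages: stressStages };
```

With plateaus in place, you can compare http_req_duration at 200 VUs against 300 VUs directly instead of reading values off a slope.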

Spike test: simulating sudden surges

A spike test applies an abrupt jump from low to high load in seconds — the kind of traffic pattern caused by a marketing email hitting 200,000 inboxes at the same time, or a product going viral on social media:

export const options = {
  stages: [
    { duration: '1m',  target: 10 },   // normal baseline
    { duration: '10s', target: 500 },  // sudden spike
    { duration: '3m',  target: 500 },  // hold spike — watch system response
    { duration: '10s', target: 10 },   // drop back to normal
    { duration: '3m',  target: 10 },   // verify recovery
    { duration: '30s', target: 0 },    // ramp down
  ],
};

The 10-second ramp from 10 to 500 VUs is the spike. Watch whether the system absorbs the load, autoscales quickly enough, or collapses into cascading errors. The recovery phase after the drop back to 10 VUs is equally important: a system that collapses under a spike and never recovers is worse than one that degrades predictably and returns to baseline once the load drops.
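What separates a spike from an ordinary ramp is the rate of change, and that is easy to quantify. The sketch below computes VUs added or removed per second for each stage, assuming linear ramps and a start from 0 VUs as described earlier. The helpers toSeconds and rampRates are hypothetical names for this example only.

```javascript
// Convert a single-unit duration string ('10s', '2m', '8h') to seconds.
function toSeconds(duration) {
  const match = /^(\d+)(s|m|h)$/.exec(duration);
  if (!match) throw new Error(`unsupported duration: ${duration}`);
  return Number(match[1]) * { s: 1, m: 60, h: 3600 }[match[2]];
}

// VUs added (positive) or removed (negative) per second in each stage.
function rampRates(stages, startVUs = 0) {
  let from = startVUs;
  return stages.map((stage) => {
    const rate = (stage.target - from) / toSeconds(stage.duration);
    from = stage.target;
    return rate;
  });
}

const spike = [
  { duration: '1m', target: 10 },
  { duration: '10s', target: 500 },
  { duration: '3m', target: 500 },
  { duration: '10s', target: 10 },
  { duration: '3m', target: 10 },
  { duration: '30s', target: 0 },
];

const loadTest = [
  { duration: '2m', target: 100 },
  { duration: '5m', target: 100 },
  { duration: '2m', target: 0 },
];

console.log(rampRates(spike)[1]);    // (500 - 10) / 10 = 49 VUs per second
console.log(rampRates(loadTest)[0]); // (100 - 0) / 120, under 1 VU per second
```

The spike adds 49 VUs every second, while the load test's warm-up adds fewer than one. Checking this number before a run is a quick way to confirm that the profile matches the kind of test you intend.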

Soak test: long-duration stability

A soak test runs normal load for an extended period to surface problems that only emerge over time — memory leaks that fill the heap after six hours, database connection pools that exhaust when connections are not released correctly:

export const options = {
  stages: [
    { duration: '5m',  target: 50 },   // warm up to normal load
    { duration: '8h',  target: 50 },   // hold steady for 8 hours
    { duration: '5m',  target: 0 },    // ramp down
  ],
};

Run this overnight. Watch for http_req_duration starting at 150ms and climbing to 400ms over six hours — a classic symptom of a growing in-memory cache with no eviction policy, or unclosed database connections accumulating in a pool.
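A slow climb like that is easier to judge as a number than by eye. As a minimal sketch, assuming you have exported periodic latency readings from your dashboard, a least-squares slope gives the drift in milliseconds per hour. The function name, the field names, and the three sample readings are all hypothetical; the readings are made up to mirror the 150ms to 400ms example above, not measured from a real system.

```javascript
// Least-squares slope of latency (ms) against elapsed time (hours).
// Needs at least two samples taken at different times.
function driftPerHour(samples) {
  const n = samples.length;
  const meanX = samples.reduce((sum, s) => sum + s.hour, 0) / n;
  const meanY = samples.reduce((sum, s) => sum + s.latencyMs, 0) / n;
  let num = 0;
  let den = 0;
  for (const s of samples) {
    num += (s.hour - meanX) * (s.latencyMs - meanY);
    den += (s.hour - meanX) ** 2;
  }
  return num / den;
}

// Made-up readings shaped like the climb described above.
const samples = [
  { hour: 0, latencyMs: 150 },
  { hour: 3, latencyMs: 275 },
  { hour: 6, latencyMs: 400 },
];

console.log(driftPerHour(samples).toFixed(1)); // 250 ms over 6 h, about 41.7 ms per hour
```

A slope near zero across the hold stage suggests the system is stable; a persistently positive slope is the signal to start looking for leaks.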

Why the ramp-up is not optional

Two practical reasons the warm-up stage matters, beyond producing a nicer graph:

Cold caches. Applications cache frequently requested data in memory. The first request for each cache key is expensive; subsequent requests are fast. Starting at full VU count means every VU hits a cold cache simultaneously, inflating your p95 beyond anything real production traffic produces. A 2-minute ramp-up lets caches warm the same way they do when production traffic builds in the morning.

Connection pool priming. Your application's database connection pool starts empty or partially filled. A sudden flood of 500 VUs triggers simultaneous connection creation, which can look like a latency spike that has nothing to do with your application's steady-state performance.

⚠️ Common mistakes

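  • Skipping the final target: 0 stage. Without a ramp-down, K6 stops starting new iterations on every VU the moment the last stage ends. Iterations already in flight get a grace period to finish (gracefulStop, 30 seconds by default) and are interrupted only if they overrun it, but the load itself still falls off a cliff, so your metrics chart shows a vertical drop and you never see how the system behaves as traffic tapers. End with a ramp-down stage unless an abrupt stop is what you are testing.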
  • Making the ramp-up too short for what you are testing. A 10-second ramp to 500 VUs is a spike test. If your goal is to measure steady-state performance at 500 VUs, ramp slowly enough that caches and connection pools have time to stabilise — typically 2–5 minutes.
  • Running soak tests against staging databases with small data volumes. A memory leak triggered by scanning a table with 50 million rows will not appear if your staging database has 10,000 rows. Soak tests need data that resembles production scale to reveal the defects they are designed to find.

🎯 Practice task

Implement a load test pattern with stages and observe how VU count affects metrics. Estimated time: 30 minutes.

  1. Write a script targeting https://test.k6.io with this stage pattern: ramp to 10 VUs over 30s → hold 10 VUs for 2m → ramp to 0 VUs over 30s.
  2. Add thresholds: http_req_duration: ['p(95)<500'] and http_req_failed: ['rate<0.01'].
  3. Run the test. Watch the vus value in the output climb, hold, and fall. Note the p(95) during the hold phase.
  4. Modify the script to add a spike: after the first hold phase, jump to 50 VUs in 5 seconds, hold for 30 seconds, then ramp back to 10 VUs and hold for 1 minute before ramping down.
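  5. Run again. Compare http_req_duration p(95) during the normal hold phase versus the spike hold phase. The end-of-test summary reports a single p(95) for the whole run, so read the per-phase values from a time-series dashboard or tag requests by phase. By how much does the 95th percentile change?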
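For step 5, one way to get per-phase numbers without a dashboard is to tag each request with the phase it was sent in and define thresholds on the tagged sub-metrics, which makes K6 list them separately in the summary. The sketch below is one possible approach, not the only one. It assumes a 5-second ramp back to 10 VUs (the task leaves that duration open), a 1-second pause per iteration, and the phase names normal_hold and spike_hold, all of which are choices made for this example.

```javascript
import http from 'k6/http';
import exec from 'k6/execution';
import { sleep } from 'k6';

export const options = {
  stages: [
    { duration: '30s', target: 10 }, // ramp up
    { duration: '2m', target: 10 },  // normal hold: 30s to 150s
    { duration: '5s', target: 50 },  // spike
    { duration: '30s', target: 50 }, // spike hold: 155s to 185s
    { duration: '5s', target: 10 },  // back to normal (duration is an assumption)
    { duration: '1m', target: 10 },  // recovery hold
    { duration: '30s', target: 0 },  // ramp down
  ],
  thresholds: {
    http_req_duration: ['p(95)<500'],
    http_req_failed: ['rate<0.01'],
    // Thresholds on tagged sub-metrics make k6 report each phase separately.
    'http_req_duration{phase:normal_hold}': ['p(95)<500'],
    'http_req_duration{phase:spike_hold}': ['p(95)<500'],
  },
};

// Map elapsed seconds to a phase label using the stage boundaries above.
function phaseAt(seconds) {
  if (seconds >= 30 && seconds < 150) return 'normal_hold';
  if (seconds >= 155 && seconds < 185) return 'spike_hold';
  return 'other';
}

export default function () {
  // currentTestRunDuration is reported in milliseconds.
  const elapsed = exec.instance.currentTestRunDuration / 1000;
  http.get('https://test.k6.io', { tags: { phase: phaseAt(elapsed) } });
  sleep(1); // pacing between iterations; an assumption for this sketch
}
```

If the spike_hold threshold fails while normal_hold passes, that difference is the finding the exercise is asking you to measure.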

This practice builds the intuition for how load shape affects latency and error metrics — the core skill for interpreting load test results in production triage.
