Rate Limiting and Retry Strategies

Real APIs cannot accept unlimited traffic. To guard against abuse and runaway clients, and to contain the infrastructure cost of serving every caller without limit, virtually every production API enforces some form of rate limiting: a cap on how many requests a single caller can make in a given time window. As a tester, you need to understand the limits well enough to test them, work around them in your suite, and verify that your application code retries sensibly when it hits them. This lesson covers all three.

How rate limits work

A rate limit is a budget: "X requests per Y time window per caller." Common shapes:

  • 60 requests per minute per API key.
  • 1,000 requests per hour per IP.
  • 10,000 requests per day per user.
  • Tiered: free plan = 100/hour, paid = 10,000/hour.

The server tracks how many requests each caller has made and rejects further ones once the budget is exhausted.
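
To make the bookkeeping concrete, here is a minimal sketch of one possible server-side scheme, a fixed-window counter. The class name, the in-memory dictionary, and the injectable clock are illustrative assumptions; real APIs may use sliding windows, token buckets, or a shared store instead.

```python
import time


class FixedWindowLimiter:
    """Allow at most `limit` requests per `window_seconds` for each caller."""

    def __init__(self, limit, window_seconds, clock=time.time):
        self.limit = limit
        self.window_seconds = window_seconds
        self.clock = clock   # injectable so tests can control time
        self._counts = {}    # caller -> (window_index, requests_seen)

    def allow(self, caller):
        window = int(self.clock() // self.window_seconds)
        seen_window, count = self._counts.get(caller, (window, 0))
        if seen_window != window:
            count = 0  # a new window started, so the budget is fresh
        if count >= self.limit:
            return False  # budget exhausted: the API would answer 429
        self._counts[caller] = (window, count + 1)
        return True
```

Each caller gets an independent budget, which is why one noisy client does not block the others.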

How rate limits are communicated

Well-designed APIs tell you about the limits in two ways:

Response headers (on every request)

X-RateLimit-Limit: 60
X-RateLimit-Remaining: 45
X-RateLimit-Reset: 1715000000
  • Limit — the cap for this window.
  • Remaining — how many requests you have left.
  • Reset — Unix timestamp when the window resets.

These let you self-throttle without ever hitting an error.
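
As a rough sketch of self-throttling, the helper below reads the headers shown above and says how long to pause. The function name and the threshold of 5 are assumptions, and it expects a plain dict with exactly these header names; not every API uses them.

```python
import time


def seconds_to_pause(headers, threshold=5, now=None):
    """Return how long to wait before the next request, or 0 to carry on."""
    now = time.time() if now is None else now
    # If the header is missing, assume there is budget left.
    remaining = int(headers.get("X-RateLimit-Remaining", threshold + 1))
    reset_at = int(headers.get("X-RateLimit-Reset", 0))
    if remaining > threshold:
        return 0.0
    # Budget is nearly gone: wait until the window resets.
    return max(0.0, reset_at - now)
```

A test framework could call this after every response and sleep for the returned number of seconds before sending the next request.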

When the limit is exceeded

HTTP/1.1 429 Too Many Requests
Retry-After: 30
Content-Type: application/json
 
{"error": "rate_limit_exceeded", "message": "Try again in 30 seconds"}

429 is the standard status code. Retry-After tells the client how many seconds to wait (or sometimes a date). Clients that ignore Retry-After and retry immediately just collect more 429s: they add load and finish no sooner.

Some APIs use Retry-After with a date format:

Retry-After: Mon, 04 May 2026 12:00:00 GMT

Either form is valid — clients should handle both.
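
One way a client might handle both forms is sketched below using only the Python standard library. The function name is hypothetical, and the sketch assumes the header value is well formed.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime


def retry_after_seconds(value, now=None):
    """Convert a Retry-After value (seconds or HTTP date) to seconds to wait."""
    value = value.strip()
    if value.isdigit():
        return float(value)  # delta-seconds form, e.g. "30"
    now = now or datetime.now(timezone.utc)
    target = parsedate_to_datetime(value)  # HTTP-date form
    if target.tzinfo is None:
        target = target.replace(tzinfo=timezone.utc)
    # A date in the past means "retry now", never a negative wait.
    return max(0.0, (target - now).total_seconds())
```

Passing `now` explicitly keeps the function deterministic in tests.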

Testing rate limits

Five focused tests cover most of what matters.

Scenario                                                 | Expected
Below the limit (e.g. 50 requests when limit is 60/min)  | All succeed; Remaining decreases each call
Exactly at the limit (60th request)                      | Succeeds with Remaining: 0
Just over the limit (61st request)                       | 429 with Retry-After header
Far over the limit (200 rapid requests)                  | First N succeed, rest 429
After the reset window                                   | Requests succeed again, counter reset

The "after reset" test is fiddly to automate (it requires waiting). For a 60-second window, it's usually fine. For an hour-long window, you'd typically test against a separate environment with shorter limits, or rely on the team's existing unit tests for the reset logic.

A subtle but valuable test: check that the headers are accurate. After 10 requests, does X-RateLimit-Remaining show Limit - 10? Some APIs report stale values, which is misleading at best.
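
A header-accuracy check could look roughly like this. It assumes you have collected the response headers from consecutive calls made with one key, inside one window, with no other traffic on that key; the function name is made up for illustration.

```python
def remaining_mismatches(header_snapshots):
    """Return (call_number, expected, actual) for each inaccurate Remaining value.

    `header_snapshots` is a list of header dicts from consecutive calls.
    The first call is the baseline; each later call should be one lower.
    """
    baseline = int(header_snapshots[0]["X-RateLimit-Remaining"])
    mismatches = []
    for offset, headers in enumerate(header_snapshots):
        expected = baseline - offset
        actual = int(headers["X-RateLimit-Remaining"])
        if actual != expected:
            mismatches.append((offset + 1, expected, actual))
    return mismatches
```

An empty list means every response reported the value you would expect; anything else points at the exact call where the counter went stale.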

Hitting rate limits unintentionally in your suite

A test suite that fires 500 requests in quick succession will exhaust a 100/min limit after the first 100 and spend the rest of the run collecting 429s. Common mitigations:

  • Run against a non-production environment with relaxed or disabled rate limits.
  • Use multiple test API keys and rotate requests across them round-robin.
  • Respect X-RateLimit-Remaining in your test framework — pause when it drops below a threshold.
  • Mark perf/load tests separately from the regular suite so they run on dedicated infrastructure.
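
The key-rotation idea from the list above can be sketched in a few lines. The Bearer scheme and the function name are assumptions; use whatever authentication your API expects, and only rotate keys where the API owner allows it.

```python
from itertools import cycle


def make_key_rotator(api_keys):
    """Return a function that yields auth headers, cycling through the keys."""
    keys = cycle(api_keys)

    def next_headers():
        # Each call hands out the next key, wrapping around at the end.
        return {"Authorization": f"Bearer {next(keys)}"}

    return next_headers
```

Each test request would then call `next_headers()` to pick up the next key in turn.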

When in doubt, ask the team that owns the API how they recommend you load-test against it. Surprising them with 1,000 requests per minute on shared infrastructure is a fast way to make enemies.

Retry strategies

Whether your suite is hitting an API or your application is, both should retry intelligently when the server says "too busy" or "I broke." The right retry strategy depends on the error:

  • 5xx and 429 — usually retryable. The server says "try again later."
  • 4xx (except 429) — usually not retryable. The request itself is wrong; retrying won't fix it.
  • Network errors / timeouts — usually retryable.
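
Those rules can be expressed as a small decision function. This is a sketch: the function name is hypothetical, and the specific set of 5xx codes is a common choice rather than a standard.

```python
RETRYABLE_STATUSES = {429, 500, 502, 503, 504}


def should_retry(status_code=None, network_error=False):
    """Decide whether a failed attempt is worth repeating."""
    if network_error:
        return True  # timeouts and dropped connections are usually transient
    # Other 4xx codes mean the request itself is wrong, so repeating it won't help.
    return status_code in RETRYABLE_STATUSES
```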

Exponential backoff with jitter

Naively retrying immediately makes things worse — your client adds load to an already-struggling server. The standard fix is exponential backoff with jitter:

The sequence starts when a request fails with 429 or 5xx: the server is overloaded or has rate-limited the call, so the client waits before retrying.

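A Python sketch (call_api is a placeholder for whatever function sends your request and returns a response with .ok and .status_code):

import random
import time

def call_with_retry(call_api, max_attempts=5):
    delay = 1.0
    for attempt in range(max_attempts):
        response = call_api()
        if response.ok:
            return response
        if response.status_code not in (429, 500, 502, 503, 504):
            return response  # not retryable
        if attempt < max_attempts - 1:  # don't sleep after the final attempt
            time.sleep(delay + random.uniform(0, 0.5))  # backoff plus jitter
            delay = min(delay * 2, 30)  # double the wait, cap at 30s
    raise RuntimeError("max retries exceeded")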

That small block packs in a remarkable amount of value: it doesn't retry on 4xx other than 429 (which would never recover), it backs off exponentially (gentle on the server), it jitters (avoids stampedes), and it caps the wait (prevents 10-minute hangs).
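
A fixed jitter of up to half a second becomes small next to a 16- or 30-second wait. One common variant, often called "full jitter", draws each wait at random between zero and the exponential value. The sketch below shows that variant, not the approach used above; the function name and defaults are assumptions.

```python
import random


def backoff_delays(attempts=5, base=1.0, cap=30.0, rng=random):
    """Return one wait per retry using capped exponential backoff with full jitter."""
    delays = []
    for attempt in range(attempts):
        ceiling = min(cap, base * 2 ** attempt)  # 1, 2, 4, 8, ... up to the cap
        delays.append(rng.uniform(0, ceiling))   # spread clients across the range
    return delays
```

Passing a seeded `random.Random` instance as `rng` makes the schedule reproducible in tests.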

Honour Retry-After

If the server tells you exactly when to retry, listen:

if response.status_code == 429:
    # Assumes the seconds form; a date-form value would make int() raise.
    wait = int(response.headers.get("Retry-After", "1"))
    sleep(wait)

This beats blind exponential backoff because the server is telling you something specific.

Idempotency and retries

Retrying a GET is safe as long as the API follows HTTP semantics: GET should only read, so repeating it has the same effect as sending it once. Retrying a POST is risky: you might create the same order twice.

Two solutions:

  • Use the right verb. PUT and DELETE are usually idempotent. POST is not.
  • Idempotency keys. Send a unique key with each POST. The server stores the first result against the key; repeats with the same key return the original result without re-processing. Stripe, AWS, and many modern APIs support this:
POST /api/orders
Idempotency-Key: order-attempt-9001-uuid
Content-Type: application/json
 
{"productId": 42, "quantity": 1}

If your retries duplicate the order, fix the test by sending an idempotency key. If the API doesn't support keys, raise that with the API team.
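
To show what the server does with the key, here is a toy in-memory sketch. The class and method names are invented for illustration; real services persist keys, expire them, and usually check that the repeated payload matches the original.

```python
import uuid


class OrderService:
    """Toy in-memory order endpoint that honours idempotency keys."""

    def __init__(self):
        self.orders = []
        self._results_by_key = {}

    def create_order(self, payload, idempotency_key=None):
        if idempotency_key is not None and idempotency_key in self._results_by_key:
            return self._results_by_key[idempotency_key]  # replay, no new order
        order = {"id": len(self.orders) + 1, **payload}
        self.orders.append(order)
        if idempotency_key is not None:
            self._results_by_key[idempotency_key] = order
        return order


service = OrderService()
key = str(uuid.uuid4())  # the client generates one key per logical operation
first = service.create_order({"productId": 42, "quantity": 1}, idempotency_key=key)
retry = service.create_order({"productId": 42, "quantity": 1}, idempotency_key=key)
```

Here `retry` is the same order as `first` and only one order is stored, whereas two calls without a key would store two.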

What to test in your application's retry logic

If your team builds retry logic into their client code, test it:

  • A 5xx triggers a retry.
  • A 4xx (except 429) does not retry.
  • The retry succeeds when the server recovers — final status is 200.
  • After max retries, the error propagates correctly.
  • Retries respect Retry-After.

A useful pattern for testing this: wire the test against a stub server you control, programmed to return 503 the first N times and 200 thereafter. We cover stub patterns in Chapter 8.
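
Before you have a real stub server, the same idea can be tried in-process. In this sketch a callable stands in for the server, and a deliberately minimal retry loop stands in for your client code; both names are hypothetical, and the loop works on bare status codes to keep the example short.

```python
class FlakyStub:
    """Callable that fails with 503 a set number of times, then returns 200."""

    def __init__(self, failures):
        self.failures = failures
        self.calls = 0

    def __call__(self):
        self.calls += 1
        return 503 if self.calls <= self.failures else 200


def call_with_retries(send, max_attempts=5, sleep=lambda seconds: None):
    """Minimal retry loop under test; `sleep` is injectable so tests run fast."""
    delay = 1.0
    status = None
    for attempt in range(max_attempts):
        status = send()
        if status not in (429, 500, 502, 503, 504):
            return status  # success or a non-retryable error
        if attempt < max_attempts - 1:
            sleep(delay)
            delay = min(delay * 2, 30)
    raise RuntimeError(f"gave up after {max_attempts} attempts (last status {status})")
```

With `FlakyStub(failures=2)` the loop should return 200 after exactly three calls. With more failures than attempts it should raise, which covers the "error propagates" case from the list above.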

⚠️ Common mistakes

  • Retrying without backoff. Hammering a struggling server is the fastest way to keep it struggling. Always exponential, always jittered.
  • Retrying non-idempotent requests without idempotency keys. Duplicate orders, double charges, repeated emails. Pause and use the right pattern.
  • Treating 429 as a server error to alarm on. It's expected backpressure. Log it, back off, retry. Alert only when retries are exhausted.

🎯 Practice task

Test rate limits and retries. 30 minutes.

  1. Pick an API with documented rate limits. GitHub's REST API, for example, allows 60 requests/hour unauthenticated and 5,000/hour with a token.
  2. Make a single call and inspect curl -i output. Find the rate-limit headers. What's the limit? How many do you have left?
  3. Make 10 calls in quick succession. Observe X-RateLimit-Remaining decrement.
  4. Without exhausting your main budget, deliberately trigger a 429 by hitting an aggressively limited endpoint (some APIs apply much lower limits to specific endpoints, e.g. GitHub's search). Note the Retry-After header.
  5. Sketch retry logic in pseudo-code or your favourite language. Cover: backoff, jitter, max retries, retry only on 5xx and 429.
  6. Stretch: simulate idempotent retries — make the same POST request twice without an idempotency key, then with one. The first scenario will likely create two records; the second should create one. Confirm.

That wraps up Chapter 4. Chapter 5 turns from what you send to what you get back — the response validation patterns that separate strong tests from superficial ones.
