Designing a Rate Limiter That Survives Traffic Spikes
Rate limiting is one of those features you only notice when it breaks. The minute traffic spikes, your API either melts down or becomes unpredictable. A good rate limiter is not just about blocking requests. It is about protecting core systems while staying fair to real users.
This article is a practical guide for building a rate limiter that survives production traffic spikes.
What a Rate Limiter Should Guarantee
A reliable limiter should do three things:
- Protect downstream dependencies from overload.
- Enforce fairness across users, IPs, or API keys.
- Remain correct even when traffic bursts are huge.
The hard part is handling bursty traffic and distributed instances while keeping the limiter fast and simple.
Core Strategies (And What They Break)
1. Fixed Window
You count requests in a 1-minute bucket and reset every minute.
Pros: Simple. Cons: Burst at the window edge can double the allowed traffic.
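A minimal sketch of a fixed-window counter makes the edge-burst problem concrete (the class and parameter names here are illustrative, not from any library):

```python
import time

class FixedWindowLimiter:
    """Counts requests per fixed window; the counter hard-resets at each boundary."""
    def __init__(self, limit, window_secs=60):
        self.limit = limit
        self.window_secs = window_secs
        self.window_id = None
        self.count = 0

    def allow(self, now=None):
        now = time.time() if now is None else now
        window = int(now // self.window_secs)  # which window this request falls in
        if window != self.window_id:
            self.window_id = window
            self.count = 0  # hard reset at the boundary
        if self.count < self.limit:
            self.count += 1
            return True
        return False

# Limit of 10 per 60s window, but a burst straddling the boundary
# (t=59.9 then t=60.1) admits 20 requests in ~0.2 seconds.
limiter = FixedWindowLimiter(limit=10)
late = sum(limiter.allow(now=59.9) for _ in range(10))   # fills window 0
early = sum(limiter.allow(now=60.1) for _ in range(10))  # new window, counter resets
```

Both batches pass in full, which is exactly the 2x edge burst described above.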
2. Sliding Window
You count requests over the trailing 60 seconds, with the window sliding forward on every request.
Pros: Accurate. Cons: More expensive to compute at scale.
3. Token Bucket
Tokens refill at a steady rate. Each request consumes one token.
Pros: Allows small bursts but respects long-term rate. Cons: Requires precise refill logic.
Production winner: token bucket, usually backed by Redis or in-memory LRU with periodic refill.
A Production-Friendly Token Bucket
Use Redis for a distributed limiter. Store:
- `tokens` (float)
- `last_refill_ts` (ms)
Pseudo flow:
- Read current tokens and last refill time.
- Refill tokens based on elapsed time.
- If tokens >= 1, allow and decrement.
- Else, reject with 429.
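The steps above map directly onto a small in-memory token bucket. This is a sketch under the flow described, with names of my own choosing, not a standard API:

```python
import time

class TokenBucket:
    """Refill-on-read token bucket: `rate` tokens/sec, holding up to `capacity`."""
    def __init__(self, rate, capacity):
        self.rate = rate
        self.capacity = capacity
        self.tokens = float(capacity)  # start full so cold starts aren't throttled
        self.last_refill = time.monotonic()

    def allow(self, now=None):
        now = time.monotonic() if now is None else now
        # Refill based on elapsed time since the last check, capped at capacity.
        elapsed = max(0.0, now - self.last_refill)
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        self.last_refill = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False  # caller responds with HTTP 429

# Rate 10/sec, capacity 30: a 3x burst passes, then refill governs the pace.
bucket = TokenBucket(rate=10, capacity=30)
burst = sum(bucket.allow(now=0.0) for _ in range(40))  # only 30 succeed
```

Refilling lazily on each read, rather than on a timer, keeps the hot path to a single timestamp subtraction.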
Lua Script (Atomicity)
Redis Lua scripts let you do the whole flow atomically. This avoids race conditions under high QPS.
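A sketch of such a script, embedded via redis-py. The key layout (a hash with `tokens` and `last_refill_ts` fields), argument order, and TTL here are assumptions for illustration, not a standard:

```python
# Token bucket as a single atomic Redis operation.
# KEYS[1] = bucket key; ARGV = rate (tokens/sec), capacity, now_ms.
LUA_TOKEN_BUCKET = """
local bucket = redis.call('HMGET', KEYS[1], 'tokens', 'last_refill_ts')
local rate = tonumber(ARGV[1])
local capacity = tonumber(ARGV[2])
local now_ms = tonumber(ARGV[3])

local tokens = tonumber(bucket[1]) or capacity
local last_ms = tonumber(bucket[2]) or now_ms

-- Refill based on elapsed milliseconds, capped at capacity.
tokens = math.min(capacity, tokens + (now_ms - last_ms) / 1000 * rate)

local allowed = 0
if tokens >= 1 then
  tokens = tokens - 1
  allowed = 1
end

redis.call('HSET', KEYS[1], 'tokens', tokens, 'last_refill_ts', now_ms)
redis.call('PEXPIRE', KEYS[1], 60000)  -- TTL so idle keys don't linger forever
return allowed
"""

def allow(redis_client, key, rate, capacity, now_ms):
    """Run the bucket check atomically; returns True if the request passes."""
    return redis_client.eval(LUA_TOKEN_BUCKET, 1, key, rate, capacity, now_ms) == 1
```

Because the read, refill, and decrement all happen inside one `EVAL`, two app servers racing on the same key can never both spend the last token.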
Dealing With Spikes
Burst Capacity
Set the bucket capacity higher than the per-second refill rate. This lets short spikes pass without breaking long-term fairness.
Example:
- Rate: 10 req/sec
- Capacity: 30 tokens
This allows a 3x burst for short periods.
Fallback Behavior
If Redis is down:
- Default to allow? (risk overload)
- Default to deny? (risk downtime)
Most systems fail open (allow by default), paired with a circuit breaker that stops calling Redis after repeated errors so requests don't queue behind a dead dependency.
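One way to sketch that pattern in Python (the threshold and cooldown values are illustrative): allow on backend errors, but trip a breaker after repeated failures so you stop hammering an unreachable Redis.

```python
import time

class FailOpenLimiter:
    """Wraps a limiter backend; allows on errors, with a simple circuit breaker."""
    def __init__(self, backend_allow, failure_threshold=5, cooldown_secs=10.0):
        self.backend_allow = backend_allow  # callable that may raise (e.g. a Redis call)
        self.failure_threshold = failure_threshold
        self.cooldown_secs = cooldown_secs
        self.failures = 0
        self.tripped_until = 0.0

    def allow(self, key):
        now = time.monotonic()
        if now < self.tripped_until:
            return True  # breaker open: skip the backend entirely, fail open
        try:
            decision = self.backend_allow(key)
            self.failures = 0  # success resets the failure streak
            return decision
        except Exception:
            self.failures += 1
            if self.failures >= self.failure_threshold:
                self.tripped_until = now + self.cooldown_secs
            return True  # fail open on backend errors

# A backend that always raises stands in for an unreachable Redis.
def dead_backend(key):
    raise ConnectionError("redis down")

limiter = FailOpenLimiter(dead_backend, failure_threshold=3)
decisions = [limiter.allow("user:1") for _ in range(5)]  # all allowed; breaker trips
```

While the breaker is open, no limiting happens at all, so alert on the breaker state: a limiter that silently fails open is an outage waiting for a spike.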
Recommended Headers
Return standard headers so clients behave well:
- `X-RateLimit-Limit`
- `X-RateLimit-Remaining`
- `Retry-After`
These make rate limiting feel predictable, not random.
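A sketch of building those headers from bucket state. The function name is mine, and the `Retry-After` math assumes a token-bucket refill rate in tokens per second:

```python
import math

def rate_limit_headers(limit, remaining, rate, allowed):
    """Build response headers; Retry-After only appears on a rejected request."""
    headers = {
        "X-RateLimit-Limit": str(limit),
        "X-RateLimit-Remaining": str(max(0, remaining)),
    }
    if not allowed:
        # Seconds until one token refills at `rate` tokens/sec, rounded up.
        headers["Retry-After"] = str(math.ceil(1.0 / rate))
    return headers

headers = rate_limit_headers(limit=600, remaining=0, rate=10, allowed=False)
```

Well-behaved clients back off by `Retry-After` instead of retry-storming, which is what turns a 429 from a punishment into a protocol.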
Common Failure Modes
- Clock drift between nodes.
- Hot keys when everyone hits the same key.
- Incorrect TTLs causing stuck limits.
Monitor limiter metrics just like any other service.
Final Checklist
- Choose token bucket for real traffic.
- Use Redis + Lua for atomicity.
- Allow small bursts with larger capacity.
- Emit headers and log limit hits.
A rate limiter should feel invisible when things are normal and calm when traffic is not. That is the difference between a hack and production-grade engineering.