Designing a Rate Limiter That Survives Traffic Spikes
Rate limiting is one of those features you only notice when it breaks. The minute traffic spikes, your API either melts down or becomes unpredictable. A good rate limiter is not just about blocking requests. It is about protecting core systems while staying fair to real users.
This article is a practical guide for building a rate limiter that survives production traffic spikes.
What a Rate Limiter Should Guarantee
A reliable limiter should do three things:
- Protect downstream dependencies from overload.
- Enforce fairness across users, IPs, or API keys.
- Remain correct even when traffic bursts are huge.
The hard part is handling bursty traffic and distributed instances while keeping the limiter fast and simple.
Core Strategies (And What They Break)
1. Fixed Window
You count requests in a 1-minute bucket and reset every minute.
Pros: Simple. Cons: Burst at the window edge can double the allowed traffic.
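A minimal sketch of a fixed-window counter makes the edge-burst problem concrete (the class and parameter names here are illustrative, not from any library):

```python
import time

class FixedWindowLimiter:
    """Counts requests per fixed window; the counter hard-resets at each boundary."""
    def __init__(self, limit, window_secs=60):
        self.limit = limit
        self.window_secs = window_secs
        self.window_id = None
        self.count = 0

    def allow(self, now=None):
        now = time.time() if now is None else now
        window = int(now // self.window_secs)  # which window this request falls in
        if window != self.window_id:
            self.window_id = window
            self.count = 0  # hard reset at the boundary
        if self.count < self.limit:
            self.count += 1
            return True
        return False

# Limit of 10 per 60s window, but a burst straddling the boundary
# (t=59.9 then t=60.1) admits 20 requests in ~0.2 seconds.
limiter = FixedWindowLimiter(limit=10)
late = sum(limiter.allow(now=59.9) for _ in range(10))   # fills window 0
early = sum(limiter.allow(now=60.1) for _ in range(10))  # new window, counter resets
```

Both batches pass in full, which is exactly the 2x edge burst described above.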
2. Sliding Window
You count requests over the trailing 60 seconds, with the window sliding forward on every request.
Pros: Accurate. Cons: More expensive to compute at scale.
3. Token Bucket
Tokens refill at a steady rate. Each request consumes one token.
Pros: Allows small bursts but respects long-term rate. Cons: Requires precise refill logic.
Production winner: token bucket, usually backed by Redis or in-memory LRU with periodic refill.
A Production-Friendly Token Bucket
Use Redis for a distributed limiter. Store:
- `tokens` (float)
- `last_refill_ts` (ms)
Pseudo flow:
- Read current tokens and last refill time.
- Refill tokens based on elapsed time.
- If tokens >= 1, allow and decrement.
- Else, reject with 429.
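The steps above map directly onto a small in-memory token bucket. This is a sketch under the flow described, with names of my own choosing, not a standard API:

```python
import time

class TokenBucket:
    """Refill-on-read token bucket: `rate` tokens/sec, holding up to `capacity`."""
    def __init__(self, rate, capacity):
        self.rate = rate
        self.capacity = capacity
        self.tokens = float(capacity)  # start full so cold starts aren't throttled
        self.last_refill = time.monotonic()

    def allow(self, now=None):
        now = time.monotonic() if now is None else now
        # Refill based on elapsed time since the last check, capped at capacity.
        elapsed = max(0.0, now - self.last_refill)
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        self.last_refill = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False  # caller responds with HTTP 429

# Rate 10/sec, capacity 30: a 3x burst passes, then refill governs the pace.
bucket = TokenBucket(rate=10, capacity=30)
burst = sum(bucket.allow(now=0.0) for _ in range(40))  # only 30 succeed
```

Refilling lazily on each read, rather than on a timer, keeps the hot path to a single timestamp subtraction.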
Lua Script (Atomicity)
Redis Lua scripts let you do the whole flow atomically. This avoids race conditions under high QPS.
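A sketch of such a script, embedded via redis-py. The key layout (a hash with `tokens` and `last_refill_ts` fields), argument order, and TTL here are assumptions for illustration, not a standard:

```python
# Token bucket as a single atomic Redis operation.
# KEYS[1] = bucket key; ARGV = rate (tokens/sec), capacity, now_ms.
LUA_TOKEN_BUCKET = """
local bucket = redis.call('HMGET', KEYS[1], 'tokens', 'last_refill_ts')
local rate = tonumber(ARGV[1])
local capacity = tonumber(ARGV[2])
local now_ms = tonumber(ARGV[3])

local tokens = tonumber(bucket[1]) or capacity
local last_ms = tonumber(bucket[2]) or now_ms

-- Refill based on elapsed milliseconds, capped at capacity.
tokens = math.min(capacity, tokens + (now_ms - last_ms) / 1000 * rate)

local allowed = 0
if tokens >= 1 then
  tokens = tokens - 1
  allowed = 1
end

redis.call('HSET', KEYS[1], 'tokens', tokens, 'last_refill_ts', now_ms)
redis.call('PEXPIRE', KEYS[1], 60000)  -- TTL so idle keys don't linger forever
return allowed
"""

def allow(redis_client, key, rate, capacity, now_ms):
    """Run the bucket check atomically; returns True if the request passes."""
    return redis_client.eval(LUA_TOKEN_BUCKET, 1, key, rate, capacity, now_ms) == 1
```

Because the read, refill, and decrement all happen inside one `EVAL`, two app servers racing on the same key can never both spend the last token.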
Dealing With Spikes
Burst Capacity
Set the bucket capacity higher than the per-second refill rate. This lets short spikes pass without breaking long-term fairness.
Example:
- Rate: 10 req/sec
- Capacity: 30 tokens
This allows a 3x burst for short periods.
Fallback Behavior
If Redis is down:
- Default to allow? (risk overload)
- Default to deny? (risk downtime)
Most systems fail open (allow by default), paired with a circuit breaker that stops calling Redis after repeated errors so requests don't queue behind a dead dependency.
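One way to sketch that pattern in Python (the threshold and cooldown values are illustrative): allow on backend errors, but trip a breaker after repeated failures so you stop hammering an unreachable Redis.

```python
import time

class FailOpenLimiter:
    """Wraps a limiter backend; allows on errors, with a simple circuit breaker."""
    def __init__(self, backend_allow, failure_threshold=5, cooldown_secs=10.0):
        self.backend_allow = backend_allow  # callable that may raise (e.g. a Redis call)
        self.failure_threshold = failure_threshold
        self.cooldown_secs = cooldown_secs
        self.failures = 0
        self.tripped_until = 0.0

    def allow(self, key):
        now = time.monotonic()
        if now < self.tripped_until:
            return True  # breaker open: skip the backend entirely, fail open
        try:
            decision = self.backend_allow(key)
            self.failures = 0  # success resets the failure streak
            return decision
        except Exception:
            self.failures += 1
            if self.failures >= self.failure_threshold:
                self.tripped_until = now + self.cooldown_secs
            return True  # fail open on backend errors

# A backend that always raises stands in for an unreachable Redis.
def dead_backend(key):
    raise ConnectionError("redis down")

limiter = FailOpenLimiter(dead_backend, failure_threshold=3)
decisions = [limiter.allow("user:1") for _ in range(5)]  # all allowed; breaker trips
```

While the breaker is open, no limiting happens at all, so alert on the breaker state: a limiter that silently fails open is an outage waiting for a spike.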
Recommended Headers
Return standard headers so clients behave well:
- `X-RateLimit-Limit`
- `X-RateLimit-Remaining`
- `Retry-After`
These make rate limiting feel predictable, not random.
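A sketch of building those headers from bucket state. The function name is mine, and the `Retry-After` math assumes a token-bucket refill rate in tokens per second:

```python
import math

def rate_limit_headers(limit, remaining, rate, allowed):
    """Build response headers; Retry-After only appears on a rejected request."""
    headers = {
        "X-RateLimit-Limit": str(limit),
        "X-RateLimit-Remaining": str(max(0, remaining)),
    }
    if not allowed:
        # Seconds until one token refills at `rate` tokens/sec, rounded up.
        headers["Retry-After"] = str(math.ceil(1.0 / rate))
    return headers

headers = rate_limit_headers(limit=600, remaining=0, rate=10, allowed=False)
```

Well-behaved clients back off by `Retry-After` instead of retry-storming, which is what turns a 429 from a punishment into a protocol.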
Common Failure Modes
- Clock drift between nodes.
- Hot keys when everyone hits the same key.
- Incorrect TTLs causing stuck limits.
Monitor limiter metrics just like any other service.
Final Checklist
- Choose token bucket for real traffic.
- Use Redis + Lua for atomicity.
- Allow small bursts with larger capacity.
- Emit headers and log limit hits.
A rate limiter should feel invisible when things are normal and calm when traffic is not. That is the difference between a hack and production-grade engineering.