Rate limiting is the immune system of your API. Without it, a single misbehaving client can bring down your entire service. Done well, it protects your infrastructure, ensures fair usage, and even creates upgrade incentives — customers who hit limits are natural candidates for higher-tier plans.
Why Rate Limit?
- Infrastructure protection — Prevent resource exhaustion from runaway clients
- Fair usage — Ensure one customer's load doesn't degrade service for others
- Cost control — Limit compute and database costs per tenant
- Abuse prevention — Stop credential stuffing, scraping, and DDoS
- Revenue signal — Customers hitting limits are ready for an upgrade conversation
Algorithm Comparison
1. Fixed Window
The simplest approach: count requests in fixed time windows (e.g., 100 per minute). Easy to implement but suffers from the "boundary problem" — a burst at the end of one window and the start of the next allows 2x the limit.
// Fixed window — simple but has boundary issues
const key = `rate:${apiKey}:${Math.floor(Date.now() / 60000)}`
const count = parseInt(await kv.get(key), 10) || 0
if (count >= 100) return new Response('Too Many Requests', { status: 429 })
// KV values are strings, so serialize the incremented count
await kv.put(key, String(count + 1), { expirationTtl: 120 })
2. Sliding Window
Combines the simplicity of fixed windows with smoother throttling: estimate the current rate as a weighted average of the current and previous window counts. This smooths out the boundary problem at the cost of a slight approximation.
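The weighted-average idea can be sketched in a few lines. This is a minimal illustration, not a full limiter: the window length, limit, and the `prevCount`/`currCount` parameters (counts you'd read from your counter store) are assumptions.

```javascript
const WINDOW_MS = 60000
const LIMIT = 100

// Estimate requests over the last full window: weight the previous
// window's count by how much of it still overlaps the sliding window.
function slidingWindowCount(prevCount, currCount, now) {
  const elapsed = now % WINDOW_MS // time into the current window
  const prevWeight = (WINDOW_MS - elapsed) / WINDOW_MS
  return prevCount * prevWeight + currCount
}

function isAllowed(prevCount, currCount, now) {
  return slidingWindowCount(prevCount, currCount, now) < LIMIT
}
```

Halfway through the current window, only half of the previous window's count still applies, so a burst that filled the previous window no longer blocks all traffic.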
3. Token Bucket
The gold standard for production APIs. A bucket fills with tokens at a steady rate (e.g., 10 per second). Each request removes a token. When the bucket is empty, requests are rejected. This naturally handles bursts while enforcing sustained rate limits.
// Token bucket implementation (getBucket/saveBucket are your storage layer)
const tokenBucket = (config) => {
  return {
    consume: async (key) => {
      const now = Date.now()
      // Fall back to a full bucket for first-time keys
      const bucket = (await getBucket(key)) ||
        { tokens: config.capacity, lastRefill: now }
      // Refill tokens based on time elapsed (config.rate is tokens/second)
      const elapsed = now - bucket.lastRefill
      const newTokens = elapsed * (config.rate / 1000)
      bucket.tokens = Math.min(config.capacity, bucket.tokens + newTokens)
      bucket.lastRefill = now
      if (bucket.tokens < 1) {
        // retryAfter is in seconds: time needed to accumulate one token
        return { allowed: false, retryAfter: (1 - bucket.tokens) / config.rate }
      }
      bucket.tokens -= 1
      await saveBucket(key, bucket)
      return { allowed: true, remaining: Math.floor(bucket.tokens) }
    }
  }
}
4. Leaky Bucket
Similar to token bucket but processes requests at a fixed rate, queuing excess requests. Best for smoothing traffic spikes in background job processors.
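A leaky bucket can be sketched as a bounded queue drained on a fixed timer. This is an in-memory illustration only; the class name, queue capacity, and drain rate are assumptions, and a production version would persist state and run `drain` on a scheduler.

```javascript
class LeakyBucket {
  constructor(capacity, drainPerTick) {
    this.capacity = capacity
    this.drainPerTick = drainPerTick
    this.queue = []
  }

  // Enqueue a job if the bucket has room; otherwise reject it.
  offer(job) {
    if (this.queue.length >= this.capacity) return false
    this.queue.push(job)
    return true
  }

  // Called on a fixed timer: process at most drainPerTick jobs per tick,
  // so output rate stays constant no matter how bursty the input is.
  drain() {
    const batch = this.queue.splice(0, this.drainPerTick)
    for (const job of batch) job()
  }
}
```

The constant drain rate is what makes this a good fit for background job processors: spikes fill the queue instead of hitting downstream systems.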
Rate Limit by Identity
Apply different limits based on what you're identifying:
| Identity | Use Case | Example Limit |
|---|---|---|
| API Key | Regular API access | 1,000 req/min |
| IP Address | Unauthenticated endpoints | 60 req/min |
| User ID | Per-user actions | 30 req/min |
| Endpoint | Expensive operations | 10 req/min |
| Plan Tier | Tiered access | 100-10,000 req/min |
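In practice, each row of the table becomes a differently scoped counter key. A minimal sketch of a key builder — the helper name and `rate:` prefix scheme are illustrative assumptions, not a prescribed convention:

```javascript
// Build a window-scoped counter key for a given identity dimension,
// e.g. identity = 'apikey' | 'ip' | 'user' | 'endpoint'.
function rateLimitKey(identity, value, windowMs, now) {
  const window = Math.floor(now / windowMs)
  return `rate:${identity}:${value}:${window}`
}
```

A single request can then be checked against several keys at once (its API key, its IP, and the endpoint it hits), each with its own limit.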
Response Headers: The Standard
Always include rate limit information in your response headers. The IETF draft standard (draft-ietf-httpapi-ratelimit-headers) defines these fields, with RateLimit-Reset expressed as seconds until the window resets:
HTTP/1.1 200 OK
RateLimit-Limit: 1000
RateLimit-Remaining: 742
RateLimit-Reset: 58
# When rate limited:
HTTP/1.1 429 Too Many Requests
Retry-After: 30
RateLimit-Limit: 1000
RateLimit-Remaining: 0
RateLimit-Reset: 30
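Attaching these headers can be centralized in one small helper. A sketch using the Fetch API `Response` (as in the earlier KV example); the function name and the `resetSeconds` field are assumptions about how you track the window:

```javascript
// Wrap a response with standard rate-limit headers.
function withRateLimitHeaders(body, status, { limit, remaining, resetSeconds }) {
  const headers = new Headers({
    'RateLimit-Limit': String(limit),
    'RateLimit-Remaining': String(remaining),
    'RateLimit-Reset': String(resetSeconds),
  })
  // On 429, also tell the client exactly when to retry
  if (status === 429) headers.set('Retry-After', String(resetSeconds))
  return new Response(body, { status, headers })
}
```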
Graceful Degradation
Don't just return 429 and walk away. Help the client recover:
- Retry-After header — Tell the client exactly when to retry
- Error body — Include the limit, remaining, and reset time in the response body
- Upgrade hint — If the customer is on a lower tier, include an upgrade URL
- Queue option — For non-urgent requests, offer a queued mode that processes when capacity is available
// Helpful 429 response
{
"error": "rate_limit_exceeded",
"message": "You've exceeded your plan's rate limit of 100 requests/minute",
"limit": 100,
"remaining": 0,
"resetAt": "2026-03-18T12:35:00Z",
"retryAfter": 30,
"upgrade": "https://trafficorchestrator.com/pricing"
}
Per-Plan Rate Limiting
Differentiate your pricing tiers with rate limits. This is a natural upgrade lever — customers who need higher throughput pay for higher-tier plans.
| Plan | Requests/min | Burst | Daily Cap |
|---|---|---|---|
| Builder (Free) | 60 | 10 | 1,000 |
| Starter ($29) | 300 | 50 | 10,000 |
| Professional ($99) | 1,000 | 200 | 100,000 |
| Business ($299) | 5,000 | 1,000 | 1,000,000 |
| Enterprise | Custom | Custom | Unlimited |
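The tier table above translates directly into a config map the limiter can read. The object shape and the free-tier fallback for unknown plans are assumptions:

```javascript
// Plan tiers from the pricing table as limiter config.
const PLAN_LIMITS = {
  builder:      { perMin: 60,   burst: 10,   dailyCap: 1000 },
  starter:      { perMin: 300,  burst: 50,   dailyCap: 10000 },
  professional: { perMin: 1000, burst: 200,  dailyCap: 100000 },
  business:     { perMin: 5000, burst: 1000, dailyCap: 1000000 },
  enterprise:   { perMin: Infinity, burst: Infinity, dailyCap: Infinity },
}

function limitsFor(plan) {
  // Unknown or missing plans fall back to the free tier
  return PLAN_LIMITS[plan] ?? PLAN_LIMITS.builder
}
```

Keeping limits in config rather than code means a plan upgrade takes effect on the customer's next request, with no deploy.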
Implementation Tips
- Use edge storage — Store counters in edge KV or in-memory caches, not your primary database
- Separate read/write limits — Read operations can tolerate higher limits than writes
- Exempt health checks — Don't rate-limit GET /health or monitoring endpoints
- Log rate limit events — Track which customers hit limits most (sales opportunity)
- Test with load tools — Use k6 or Artillery to verify limits work under pressure
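Two of the tips above — exempting health checks and separating read from write limits — fit in a tiny routing sketch. The paths and numbers are illustrative assumptions:

```javascript
// Health and monitoring endpoints bypass rate limiting entirely.
const EXEMPT_PATHS = new Set(['/health', '/metrics'])
const READ_LIMIT = 1000 // reads tolerate a higher ceiling
const WRITE_LIMIT = 100 // mutations get a tighter one

function limitFor(method, path) {
  if (EXEMPT_PATHS.has(path)) return Infinity
  return method === 'GET' ? READ_LIMIT : WRITE_LIMIT
}
```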