Engineering

REST API Rate Limiting: Patterns and Best Practices for 2026

Traffic Orchestrator Team
Engineering
March 16, 2026

Rate limiting is the immune system of your API. Without it, a single misbehaving client can bring down your entire service. Done well, it protects your infrastructure, ensures fair usage, and even creates upgrade incentives — customers who hit limits are natural candidates for higher-tier plans.

Why Rate Limit?

  • Infrastructure protection — Prevent resource exhaustion from runaway clients
  • Fair usage — Ensure one customer's load doesn't degrade service for others
  • Cost control — Limit compute and database costs per tenant
  • Abuse prevention — Stop credential stuffing, scraping, and DDoS
  • Revenue signal — Customers hitting limits are ready for an upgrade conversation

Algorithm Comparison

1. Fixed Window

The simplest approach: count requests in fixed time windows (e.g., 100 per minute). Easy to implement but suffers from the "boundary problem" — a burst at the end of one window and the start of the next allows 2x the limit.

// Fixed window — simple but has boundary issues
const window = Math.floor(Date.now() / 60000)
const key = `rate:${apiKey}:${window}`
const count = parseInt(await kv.get(key), 10) || 0  // KV values are strings
if (count >= 100) return new Response('Too Many Requests', { status: 429 })
// Note: get-then-put is not atomic; fine for a sketch, racy under heavy concurrency
await kv.put(key, String(count + 1), { expirationTtl: 120 })

2. Sliding Window

Combines the simplicity of fixed windows with smoother throttling. Uses a weighted average between the current and previous window counts. This eliminates the boundary problem.
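The weighted-average approach can be sketched with an in-memory counter store (illustrative only — in production the counts would live in Redis or edge KV, keyed per window):

```javascript
// Sliding window counter — weight the previous fixed window by how much
// of it still overlaps the sliding window ending "now"
const WINDOW_MS = 60_000
const LIMIT = 100
const counts = new Map() // `${apiKey}:${windowIndex}` -> request count

function slidingWindowAllow(apiKey, now = Date.now()) {
  const currWindow = Math.floor(now / WINDOW_MS)
  const currCount = counts.get(`${apiKey}:${currWindow}`) || 0
  const prevCount = counts.get(`${apiKey}:${currWindow - 1}`) || 0

  // Fraction of the current window already elapsed
  const elapsed = (now % WINDOW_MS) / WINDOW_MS
  // Weighted estimate of requests in the last 60 seconds
  const estimated = prevCount * (1 - elapsed) + currCount

  if (estimated >= LIMIT) return false
  counts.set(`${apiKey}:${currWindow}`, currCount + 1)
  return true
}
```

Because the previous window's count decays smoothly as time advances, a burst straddling the window boundary can never exceed the limit.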

3. Token Bucket

The gold standard for production APIs. A bucket fills with tokens at a steady rate (e.g., 10 per second). Each request removes a token. When the bucket is empty, requests are rejected. This naturally handles bursts while enforcing sustained rate limits.

// Token bucket implementation
// getBucket/saveBucket are assumed storage helpers (e.g. Redis or edge KV)
const tokenBucket = (config) => {
  return {
    consume: async (key) => {
      const now = Date.now()
      const bucket = await getBucket(key)

      // Refill tokens based on time elapsed (config.rate is tokens/second)
      const elapsed = now - bucket.lastRefill
      const newTokens = elapsed * (config.rate / 1000)
      bucket.tokens = Math.min(config.capacity, bucket.tokens + newTokens)
      bucket.lastRefill = now

      if (bucket.tokens < 1) {
        // Seconds until the next token becomes available
        return { allowed: false, retryAfter: (1 - bucket.tokens) / config.rate }
      }

      bucket.tokens -= 1
      await saveBucket(key, bucket)
      return { allowed: true, remaining: Math.floor(bucket.tokens) }
    }
  }
}

4. Leaky Bucket

Similar to token bucket but processes requests at a fixed rate, queuing excess requests. Best for smoothing traffic spikes in background job processors.
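A minimal in-memory sketch of the leaky bucket (names are illustrative; in production the bucket level would live in shared storage):

```javascript
// Leaky bucket — the bucket drains at a fixed rate; its capacity caps the burst
function leakyBucket({ capacity, leakRatePerSec }) {
  let level = 0              // current "water" in the bucket
  let lastLeak = Date.now()
  return {
    offer(now = Date.now()) {
      // Drain at the fixed rate since the last check
      const leaked = ((now - lastLeak) / 1000) * leakRatePerSec
      level = Math.max(0, level - leaked)
      lastLeak = now
      if (level + 1 > capacity) return false // bucket full — reject
      level += 1
      return true
    }
  }
}
```

The difference from a token bucket is the output shape: accepted requests are processed at the steady leak rate, which is why this variant suits job queues better than interactive APIs.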

Rate Limit by Identity

Apply different limits based on what you're identifying:

Identity     Use Case                   Example Limit
API Key      Regular API access         1,000 req/min
IP Address   Unauthenticated endpoints  60 req/min
User ID      Per-user actions           30 req/min
Endpoint     Expensive operations       10 req/min
Plan Tier    Tiered access              100-10,000 req/min
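In practice the identity maps directly to the counter key. A small sketch (the request shape and header name here are illustrative, not a specific framework's API):

```javascript
// Compose a rate-limit key from the strongest available identity
function rateLimitKey(req) {
  // Prefer the API key; fall back to IP for unauthenticated traffic
  const apiKey = req.headers['x-api-key']
  const id = apiKey ? `key:${apiKey}` : `ip:${req.ip}`
  // Scope by method and path so expensive endpoints get their own budget
  return `${id}:${req.method}:${req.path}`
}
```

Separate keys per endpoint let you enforce the per-endpoint limits in the table without one counter starving another.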

Response Headers: The Standard

Always include rate limit information in your response headers. The IETF draft RateLimit header fields are the emerging standard; note that RateLimit-Reset is the number of seconds until the window resets, not a Unix timestamp:

HTTP/1.1 200 OK
RateLimit-Limit: 1000
RateLimit-Remaining: 742
RateLimit-Reset: 26

# When rate limited:
HTTP/1.1 429 Too Many Requests
Retry-After: 30
RateLimit-Limit: 1000
RateLimit-Remaining: 0
RateLimit-Reset: 30

Graceful Degradation

Don't just return 429 and walk away. Help the client recover:

  1. Retry-After header — Tell the client exactly when to retry
  2. Error body — Include the limit, remaining, and reset time in the response body
  3. Upgrade hint — If the customer is on a lower tier, include an upgrade URL
  4. Queue option — For non-urgent requests, offer a queued mode that processes when capacity is available

// Helpful 429 response
{
  "error": "rate_limit_exceeded",
  "message": "You've exceeded your plan's rate limit of 100 requests/minute",
  "limit": 100,
  "remaining": 0,
  "resetAt": "2026-03-18T12:35:00Z",
  "retryAfter": 30,
  "upgrade": "https://trafficorchestrator.com/pricing"
}
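On the client side, honoring Retry-After is the matching half of graceful degradation. A sketch with an injected fetch function so it can be exercised without a network (the function names are illustrative):

```javascript
// Retry once after honoring the server's Retry-After header
async function fetchWithRetry(url, fetchFn, sleep = ms => new Promise(r => setTimeout(r, ms))) {
  const res = await fetchFn(url)
  if (res.status !== 429) return res
  // Fall back to 1 second if the header is missing
  const waitSec = parseInt(res.headers.get('retry-after') || '1', 10)
  await sleep(waitSec * 1000)
  return fetchFn(url)
}
```

A production client would cap the number of retries and add jitter, but the core contract — wait exactly as long as the server asks — is this simple.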

Per-Plan Rate Limiting

Differentiate your pricing tiers with rate limits. This is a natural upgrade lever — customers who need higher throughput pay for higher-tier plans.

Plan                 Requests/min   Burst    Daily Cap
Builder (Free)       60             10       1,000
Starter ($29)        300            50       10,000
Professional ($99)   1,000          200      100,000
Business ($299)      5,000          1,000    1,000,000
Enterprise           Custom         Custom   Unlimited
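These tiers map naturally onto a config object the limiter consults at request time; a sketch mirroring the table above (field names are illustrative):

```javascript
// Per-plan limits as data, so changing a tier is a config edit, not a code change
const PLAN_LIMITS = {
  builder:      { perMin: 60,    burst: 10,    dailyCap: 1_000 },
  starter:      { perMin: 300,   burst: 50,    dailyCap: 10_000 },
  professional: { perMin: 1_000, burst: 200,   dailyCap: 100_000 },
  business:     { perMin: 5_000, burst: 1_000, dailyCap: 1_000_000 },
}

function limitsFor(plan) {
  // Enterprise limits are negotiated per contract; Infinity stands in here
  return PLAN_LIMITS[plan] || { perMin: Infinity, burst: Infinity, dailyCap: Infinity }
}
```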

Implementation Tips

  • Use edge storage — Store counters in edge KV or in-memory caches, not your primary database
  • Separate read/write limits — Read operations can tolerate higher limits than writes
  • Exempt health checks — Don't rate-limit GET /health or monitoring endpoints
  • Log rate limit events — Track which customers hit limits most (sales opportunity)
  • Test with load tools — Use k6 or Artillery to verify limits work under pressure
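Several of these tips compose into a small request classifier that runs before the limiter itself; a sketch with illustrative paths and limits:

```javascript
// Decide, per request, whether to rate-limit at all and at what budget
const EXEMPT_PATHS = new Set(['/health', '/metrics']) // monitoring endpoints
const READ_LIMIT = 1000  // req/min for GET/HEAD
const WRITE_LIMIT = 100  // req/min for mutations

function classify(req) {
  if (EXEMPT_PATHS.has(req.path)) return { exempt: true }
  const isRead = req.method === 'GET' || req.method === 'HEAD'
  return { exempt: false, limit: isRead ? READ_LIMIT : WRITE_LIMIT }
}
```

The classifier's output would feed the key and limit into whichever algorithm you chose above, keeping the policy (who gets what) separate from the mechanism (how counting works).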