Engineering

REST API Rate Limiting: Patterns and Best Practices for 2026

Traffic Orchestrator Team
Engineering
March 16, 2026

Rate limiting is the immune system of your API. Without it, a single misbehaving client can bring down your entire service. Done well, it protects your infrastructure, ensures fair usage, and even creates upgrade incentives — customers who hit limits are natural candidates for higher-tier plans.

Why Rate Limit?

  • Infrastructure protection — Prevent resource exhaustion from runaway clients
  • Fair usage — Ensure one customer's load doesn't degrade service for others
  • Cost control — Limit compute and database costs per tenant
  • Abuse prevention — Stop credential stuffing, scraping, and DDoS
  • Revenue signal — Customers hitting limits are ready for an upgrade conversation

Algorithm Comparison

1. Fixed Window

The simplest approach: count requests in fixed time windows (e.g., 100 per minute). Easy to implement but suffers from the "boundary problem" — a burst at the end of one window and the start of the next allows 2x the limit.

// Fixed window — simple but has boundary issues
const window = Math.floor(Date.now() / 60000)
const key = `rate:${apiKey}:${window}`
const count = parseInt(await kv.get(key), 10) || 0  // KV values are strings
if (count >= 100) return new Response('Too Many Requests', { status: 429 })
// Note: get-then-put is not atomic; fine for a sketch, racy under heavy concurrency
await kv.put(key, String(count + 1), { expirationTtl: 120 })

2. Sliding Window

Combines the simplicity of fixed windows with smoother throttling. Uses a weighted average between the current and previous window counts. This eliminates the boundary problem.
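The weighted-average approach can be sketched with an in-memory counter store (illustrative only — in production the counts would live in Redis or edge KV, keyed per window):

```javascript
// Sliding window counter — weight the previous fixed window by how much
// of it still overlaps the sliding window ending "now"
const WINDOW_MS = 60_000
const LIMIT = 100
const counts = new Map() // `${apiKey}:${windowIndex}` -> request count

function slidingWindowAllow(apiKey, now = Date.now()) {
  const currWindow = Math.floor(now / WINDOW_MS)
  const currCount = counts.get(`${apiKey}:${currWindow}`) || 0
  const prevCount = counts.get(`${apiKey}:${currWindow - 1}`) || 0

  // Fraction of the current window already elapsed
  const elapsed = (now % WINDOW_MS) / WINDOW_MS
  // Weighted estimate of requests in the last 60 seconds
  const estimated = prevCount * (1 - elapsed) + currCount

  if (estimated >= LIMIT) return false
  counts.set(`${apiKey}:${currWindow}`, currCount + 1)
  return true
}
```

Because the previous window's count decays smoothly as time advances, a burst straddling the window boundary can never exceed the limit.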

3. Token Bucket

The gold standard for production APIs. A bucket fills with tokens at a steady rate (e.g., 10 per second). Each request removes a token. When the bucket is empty, requests are rejected. This naturally handles bursts while enforcing sustained rate limits.

// Token bucket implementation
// getBucket/saveBucket are assumed storage helpers (e.g. Redis or edge KV)
const tokenBucket = (config) => {
  return {
    consume: async (key) => {
      const now = Date.now()
      const bucket = await getBucket(key)

      // Refill tokens based on time elapsed (config.rate is tokens/second)
      const elapsed = now - bucket.lastRefill
      const newTokens = elapsed * (config.rate / 1000)
      bucket.tokens = Math.min(config.capacity, bucket.tokens + newTokens)
      bucket.lastRefill = now

      if (bucket.tokens < 1) {
        // Seconds until the next token becomes available
        return { allowed: false, retryAfter: (1 - bucket.tokens) / config.rate }
      }

      bucket.tokens -= 1
      await saveBucket(key, bucket)
      return { allowed: true, remaining: Math.floor(bucket.tokens) }
    }
  }
}

4. Leaky Bucket

Similar to token bucket but processes requests at a fixed rate, queuing excess requests. Best for smoothing traffic spikes in background job processors.
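A minimal in-memory sketch of the leaky bucket (names are illustrative; in production the bucket level would live in shared storage):

```javascript
// Leaky bucket — the bucket drains at a fixed rate; its capacity caps the burst
function leakyBucket({ capacity, leakRatePerSec }) {
  let level = 0              // current "water" in the bucket
  let lastLeak = Date.now()
  return {
    offer(now = Date.now()) {
      // Drain at the fixed rate since the last check
      const leaked = ((now - lastLeak) / 1000) * leakRatePerSec
      level = Math.max(0, level - leaked)
      lastLeak = now
      if (level + 1 > capacity) return false // bucket full — reject
      level += 1
      return true
    }
  }
}
```

The difference from a token bucket is the output shape: accepted requests are processed at the steady leak rate, which is why this variant suits job queues better than interactive APIs.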

Rate Limit by Identity

Apply different limits based on what you're identifying:

Identity     Use Case                   Example Limit
API Key      Regular API access         1,000 req/min
IP Address   Unauthenticated endpoints  60 req/min
User ID      Per-user actions           30 req/min
Endpoint     Expensive operations       10 req/min
Plan Tier    Tiered access              100-10,000 req/min
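In practice the identity maps directly to the counter key. A small sketch (the request shape and header name here are illustrative, not a specific framework's API):

```javascript
// Compose a rate-limit key from the strongest available identity
function rateLimitKey(req) {
  // Prefer the API key; fall back to IP for unauthenticated traffic
  const apiKey = req.headers['x-api-key']
  const id = apiKey ? `key:${apiKey}` : `ip:${req.ip}`
  // Scope by method and path so expensive endpoints get their own budget
  return `${id}:${req.method}:${req.path}`
}
```

Separate keys per endpoint let you enforce the per-endpoint limits in the table without one counter starving another.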

Response Headers: The Standard

Always include rate limit information in your response headers. The IETF draft RateLimit header fields are the emerging standard; note that RateLimit-Reset is the number of seconds until the window resets, not a Unix timestamp:

HTTP/1.1 200 OK
RateLimit-Limit: 1000
RateLimit-Remaining: 742
RateLimit-Reset: 26

# When rate limited:
HTTP/1.1 429 Too Many Requests
Retry-After: 30
RateLimit-Limit: 1000
RateLimit-Remaining: 0
RateLimit-Reset: 30

Graceful Degradation

Don't just return 429 and walk away. Help the client recover:

  1. Retry-After header — Tell the client exactly when to retry
  2. Error body — Include the limit, remaining, and reset time in the response body
  3. Upgrade hint — If the customer is on a lower tier, include an upgrade URL
  4. Queue option — For non-urgent requests, offer a queued mode that processes when capacity is available

// Helpful 429 response
{
  "error": "rate_limit_exceeded",
  "message": "You've exceeded your plan's rate limit of 100 requests/minute",
  "limit": 100,
  "remaining": 0,
  "resetAt": "2026-03-18T12:35:00Z",
  "retryAfter": 30,
  "upgrade": "https://trafficorchestrator.com/pricing"
}
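On the client side, honoring Retry-After is the matching half of graceful degradation. A sketch with an injected fetch function so it can be exercised without a network (the function names are illustrative):

```javascript
// Retry once after honoring the server's Retry-After header
async function fetchWithRetry(url, fetchFn, sleep = ms => new Promise(r => setTimeout(r, ms))) {
  const res = await fetchFn(url)
  if (res.status !== 429) return res
  // Fall back to 1 second if the header is missing
  const waitSec = parseInt(res.headers.get('retry-after') || '1', 10)
  await sleep(waitSec * 1000)
  return fetchFn(url)
}
```

A production client would cap the number of retries and add jitter, but the core contract — wait exactly as long as the server asks — is this simple.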

Per-Plan Rate Limiting

Differentiate your pricing tiers with rate limits. This is a natural upgrade lever — customers who need higher throughput pay for higher-tier plans.

Plan                 Requests/min   Burst    Daily Cap
Builder (Free)       60             10       1,000
Starter ($29)        300            50       10,000
Professional ($99)   1,000          200      100,000
Business ($299)      5,000          1,000    1,000,000
Enterprise           Custom         Custom   Unlimited
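These tiers map naturally onto a config object the limiter consults at request time; a sketch mirroring the table above (field names are illustrative):

```javascript
// Per-plan limits as data, so changing a tier is a config edit, not a code change
const PLAN_LIMITS = {
  builder:      { perMin: 60,    burst: 10,    dailyCap: 1_000 },
  starter:      { perMin: 300,   burst: 50,    dailyCap: 10_000 },
  professional: { perMin: 1_000, burst: 200,   dailyCap: 100_000 },
  business:     { perMin: 5_000, burst: 1_000, dailyCap: 1_000_000 },
}

function limitsFor(plan) {
  // Enterprise limits are negotiated per contract; Infinity stands in here
  return PLAN_LIMITS[plan] || { perMin: Infinity, burst: Infinity, dailyCap: Infinity }
}
```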

Implementation Tips

  • Use edge storage — Store counters in edge KV or in-memory caches, not your primary database
  • Separate read/write limits — Read operations can tolerate higher limits than writes
  • Exempt health checks — Don't rate-limit GET /health or monitoring endpoints
  • Log rate limit events — Track which customers hit limits most (sales opportunity)
  • Test with load tools — Use k6 or Artillery to verify limits work under pressure
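Several of these tips compose into a small request classifier that runs before the limiter itself; a sketch with illustrative paths and limits:

```javascript
// Decide, per request, whether to rate-limit at all and at what budget
const EXEMPT_PATHS = new Set(['/health', '/metrics']) // monitoring endpoints
const READ_LIMIT = 1000  // req/min for GET/HEAD
const WRITE_LIMIT = 100  // req/min for mutations

function classify(req) {
  if (EXEMPT_PATHS.has(req.path)) return { exempt: true }
  const isRead = req.method === 'GET' || req.method === 'HEAD'
  return { exempt: false, limit: isRead ? READ_LIMIT : WRITE_LIMIT }
}
```

The classifier's output would feed the key and limit into whichever algorithm you chose above, keeping the policy (who gets what) separate from the mechanism (how counting works).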