Rate Limiter

Problem Statement

Design a distributed rate limiter that can protect APIs from abuse, ensure fair usage across customers, and handle high traffic volumes. The system should support multiple rate limiting strategies and be configurable per customer/endpoint.


Requirements

Functional Requirements

  • Limit requests per customer/API key
  • Support multiple rate limiting algorithms
  • Configure limits per endpoint/tier
  • Provide rate limit headers in responses
  • Support burst allowances
  • Allow real-time limit adjustments
  • Provide usage analytics

Non-Functional Requirements

  • Latency: < 5 ms of overhead per request
  • Availability: 99.99% (the limiter sits in the request critical path)
  • Accuracy: near-exact; a small overshoot past the limit is acceptable
  • Scalability: handle 100K+ requests/second
  • Consistency: eventual consistency across nodes is acceptable

High-Level Architecture



Rate Limiting Algorithms

1. Token Bucket

Token Bucket Algorithm

public class TokenBucket {
    private final long capacity;        // Max tokens the bucket can hold
    private final double refillRate;    // Tokens added per second
    private double tokens;
    private long lastRefillTimestamp;

    public TokenBucket(long capacity, double refillRate) {
        this.capacity = capacity;
        this.refillRate = refillRate;
        this.tokens = capacity;         // Start full: permits an initial burst
        this.lastRefillTimestamp = System.currentTimeMillis();
    }

    public synchronized boolean tryConsume(int numTokens) {
        refill();

        if (tokens >= numTokens) {
            tokens -= numTokens;
            return true;
        }
        return false;
    }

    private void refill() {
        long now = System.currentTimeMillis();
        double tokensToAdd = (now - lastRefillTimestamp) * refillRate / 1000.0;
        tokens = Math.min(capacity, tokens + tokensToAdd);
        lastRefillTimestamp = now;
    }
}

2. Leaky Bucket

Leaky Bucket Algorithm
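
The source gives no code for this one; a minimal sketch of the counter ("water level") variant, where each request adds one unit and the bucket drains at a fixed rate:

```java
// Leaky bucket (counter variant): incoming requests add one unit of
// "water" each; the bucket drains at a fixed leakRate per second.
// A full bucket rejects (or, in the queue variant, delays) requests.
public class LeakyBucket {
    private final long capacity;     // max queued "water"
    private final double leakRate;   // units drained per second
    private double water;
    private long lastLeakTimestamp;

    public LeakyBucket(long capacity, double leakRate) {
        this.capacity = capacity;
        this.leakRate = leakRate;
        this.water = 0;
        this.lastLeakTimestamp = System.currentTimeMillis();
    }

    public synchronized boolean tryAdd() {
        leak();
        if (water < capacity) {
            water += 1;
            return true;
        }
        return false;  // bucket full: request rejected
    }

    private void leak() {
        long now = System.currentTimeMillis();
        double leaked = (now - lastLeakTimestamp) * leakRate / 1000.0;
        water = Math.max(0, water - leaked);
        lastLeakTimestamp = now;
    }
}
```

Unlike the token bucket, which lets a full bucket's worth of requests through at once, the leaky bucket smooths output to the leak rate.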

3. Fixed Window Counter

Fixed Window Counter

// Redis implementation: one counter per (key, window) pair
public boolean isAllowed(String key, int limit, int windowSeconds) {
    String windowKey = key + ":" + (System.currentTimeMillis() / 1000 / windowSeconds);

    long count = redis.incr(windowKey);
    if (count == 1) {
        // First request in this window: set a TTL so stale counters expire.
        // Note: INCR + EXPIRE is not atomic; a Lua script avoids leaking a
        // counter if the process dies between the two calls.
        redis.expire(windowKey, windowSeconds);
    }

    return count <= limit;
}

4. Sliding Window Log

Sliding Window Log

// Redis sorted-set implementation: one entry per request, scored by timestamp.
// Accurate, but memory grows with the request rate.
public boolean isAllowed(String key, int limit, int windowMs) {
    long now = System.currentTimeMillis();
    long windowStart = now - windowMs;

    // Remove entries that have fallen out of the window
    redis.zremrangeByScore(key, 0, windowStart);

    // Count requests still inside the window
    long count = redis.zcard(key);

    if (count < limit) {
        // Record this request (random member guarantees uniqueness)
        redis.zadd(key, now, UUID.randomUUID().toString());
        redis.expire(key, Math.max(1, windowMs / 1000));
        return true;
    }

    return false;
}

5. Sliding Window Counter

public class SlidingWindowCounter {

    public boolean isAllowed(String key, int limit, int windowSeconds) {
        long now = System.currentTimeMillis();
        long currentWindow = now / 1000 / windowSeconds;
        long previousWindow = currentWindow - 1;

        String currentKey = key + ":" + currentWindow;
        String previousKey = key + ":" + previousWindow;

        // Get counts
        long currentCount = Optional.ofNullable(redis.get(currentKey))
            .map(Long::parseLong).orElse(0L);
        long previousCount = Optional.ofNullable(redis.get(previousKey))
            .map(Long::parseLong).orElse(0L);

        // Calculate weight for previous window
        long windowStart = currentWindow * windowSeconds * 1000;
        double elapsedRatio = (now - windowStart) / (windowSeconds * 1000.0);
        double previousWeight = 1.0 - elapsedRatio;

        // Weighted count
        double effectiveCount = previousCount * previousWeight + currentCount;

        if (effectiveCount < limit) {
            redis.incr(currentKey);
            redis.expire(currentKey, windowSeconds * 2);
            return true;
        }

        return false;
    }
}

Algorithm Comparison

Algorithm        Accuracy   Memory     Burst Handling   Complexity
Token Bucket     High       Low        Allows bursts    Medium
Leaky Bucket     High       Low        Smooths bursts   Medium
Fixed Window     Low        Very Low   Boundary issue   Low
Sliding Log      Very High  High       Accurate         Medium
Sliding Counter  High       Low        Good             Medium

Recommendation: Use Sliding Window Counter for most cases, or Token Bucket when burst tolerance is needed.


Distributed Rate Limiting

Challenge

Distributed Rate Limiting Problem

With multiple gateway nodes behind a load balancer, each node sees only a fraction of a customer's traffic. Independent per-node counters therefore under-count the global rate, letting a customer exceed the limit by up to a factor of the node count.

Solution 1: Centralized Counter (Redis)

Redis-Based Solution

Redis Lua Script (Atomic Operations)

-- Sliding window counter in single atomic operation
local key = KEYS[1]
local limit = tonumber(ARGV[1])
local window = tonumber(ARGV[2])
local now = tonumber(ARGV[3])

local current_window = math.floor(now / window)
local previous_window = current_window - 1

local current_key = key .. ":" .. current_window
local previous_key = key .. ":" .. previous_window

local current_count = tonumber(redis.call("GET", current_key) or "0")
local previous_count = tonumber(redis.call("GET", previous_key) or "0")

local window_start = current_window * window
local elapsed_ratio = (now - window_start) / window
local previous_weight = 1 - elapsed_ratio

local effective_count = previous_count * previous_weight + current_count

if effective_count < limit then
    redis.call("INCR", current_key)
    redis.call("EXPIRE", current_key, window * 2)
    return {1, math.floor(limit - effective_count - 1)}  -- allowed, remaining
else
    return {0, 0}  -- denied, none remaining
end

Solution 2: Local + Sync Hybrid

Hybrid Approach
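
The hybrid approach is only named above; a minimal sketch of the idea, assuming each of N identical nodes enforces a local slice of the global limit and periodically reconciles with the central store (the sync input is stubbed here as a parameter):

```java
// Hybrid rate limiter: each node serves requests against a local budget
// (globalLimit / nodeCount), avoiding a Redis round trip per request.
// A background job calls syncWithCentralStore with the global usage to
// rebalance quota between nodes with uneven traffic.
public class HybridRateLimiter {
    private final long globalLimit;
    private final int nodeCount;
    private long localBudget;   // requests this node may serve before the next sync
    private long localUsed;

    public HybridRateLimiter(long globalLimit, int nodeCount) {
        this.globalLimit = globalLimit;
        this.nodeCount = nodeCount;
        this.localBudget = globalLimit / nodeCount;
        this.localUsed = 0;
    }

    public synchronized boolean tryAcquire() {
        if (localUsed < localBudget) {
            localUsed++;
            return true;   // served without touching the central store
        }
        return false;      // local slice exhausted until the next sync
    }

    // Called periodically; globalUsed would come from the central store.
    public synchronized void syncWithCentralStore(long globalUsed) {
        long globalRemaining = Math.max(0, globalLimit - globalUsed);
        // Grant this node an equal share of whatever is left globally.
        localBudget = localUsed + globalRemaining / nodeCount;
    }
}
```

The trade-off matches the non-functional requirements: sub-millisecond checks on the hot path, at the cost of eventual (not strict) consistency between sync intervals.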


Configuration Model

Rate Limit Rules

rate_limits:
  # Global defaults
  default:
    requests_per_second: 100
    burst: 200

  # Per tier
  tiers:
    free:
      requests_per_minute: 100
      requests_per_day: 10000
    pro:
      requests_per_minute: 1000
      requests_per_day: 100000
    enterprise:
      requests_per_minute: 10000
      requests_per_day: unlimited

  # Per endpoint overrides
  endpoints:
    "/api/v1/charges":
      requests_per_second: 25
      burst: 50
    "/api/v1/customers":
      requests_per_second: 100
      burst: 200
    "/api/v1/reports":
      requests_per_minute: 10
      comment: "Heavy endpoint, strict limit"

  # Per customer overrides
  customers:
    "cust_enterprise_123":
      requests_per_second: 500
      burst: 1000

Configuration Schema

CREATE TABLE rate_limit_configs (
    id                  UUID PRIMARY KEY,

    -- Scope
    scope_type          VARCHAR(20) NOT NULL,   -- global, tier, customer, endpoint
    scope_value         VARCHAR(255),           -- tier name, customer_id, endpoint path

    -- Limits
    requests_per_second BIGINT,
    requests_per_minute BIGINT,
    requests_per_hour   BIGINT,
    requests_per_day    BIGINT,
    burst_size          INT,

    -- Metadata
    priority            INT DEFAULT 0,          -- Higher = more specific
    enabled             BOOLEAN DEFAULT true,
    created_at          TIMESTAMP NOT NULL,
    updated_at          TIMESTAMP NOT NULL
);

CREATE INDEX idx_rate_limit_scope ON rate_limit_configs(scope_type, scope_value);
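
The priority column implies a resolution order that isn't spelled out above; a minimal sketch of how a lookup might pick the most specific enabled config (the RateLimitConfig record and the matching rules are assumptions, not the source's types):

```java
import java.util.Comparator;
import java.util.List;
import java.util.Optional;

// Resolves the effective config: customer > endpoint > tier > global,
// encoded via the schema's priority column (higher = more specific).
public class ConfigResolver {

    public record RateLimitConfig(String scopeType, String scopeValue,
                                  int priority, boolean enabled) {}

    public Optional<RateLimitConfig> resolve(List<RateLimitConfig> candidates,
                                             String customerId, String tier,
                                             String endpoint) {
        return candidates.stream()
            .filter(RateLimitConfig::enabled)
            .filter(c -> matches(c, customerId, tier, endpoint))
            // Highest priority wins: the most specific matching scope
            .max(Comparator.comparingInt(RateLimitConfig::priority));
    }

    private boolean matches(RateLimitConfig c, String customerId,
                            String tier, String endpoint) {
        return switch (c.scopeType()) {
            case "global"   -> true;
            case "tier"     -> c.scopeValue().equals(tier);
            case "customer" -> c.scopeValue().equals(customerId);
            case "endpoint" -> c.scopeValue().equals(endpoint);
            default         -> false;
        };
    }
}
```

In practice the candidate list would be cached locally (see Local Cache under Technology Choices) so config resolution never adds a database round trip to the hot path.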

Response Headers

Standard Rate Limit Headers

HTTP/1.1 200 OK
X-RateLimit-Limit: 100
X-RateLimit-Remaining: 45
X-RateLimit-Reset: 1640000000
X-RateLimit-Policy: 100;w=60

Or, using the standardized header names from the IETF RateLimit draft (note that RateLimit-Reset is seconds until the window resets, not a Unix timestamp):
RateLimit-Limit: 100
RateLimit-Remaining: 45
RateLimit-Reset: 55

Rate Limited Response

HTTP/1.1 429 Too Many Requests
Content-Type: application/json
Retry-After: 30
X-RateLimit-Limit: 100
X-RateLimit-Remaining: 0
X-RateLimit-Reset: 1640000000

{
    "error": {
        "type": "rate_limit_error",
        "message": "Rate limit exceeded. Please retry after 30 seconds.",
        "retry_after": 30,
        "limit": 100,
        "window": "60s"
    }
}
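
A client can turn these headers directly into a wait time; a minimal sketch (the exponential-backoff fallback and 60 s cap are assumptions, not part of the response contract above):

```java
// Computes how long a client should wait before retrying a 429:
// prefer the server's Retry-After (in seconds), else exponential backoff.
public class RetryPolicy {
    private static final long MAX_BACKOFF_MS = 60_000;

    public static long retryDelayMillis(String retryAfterHeader, int attempt) {
        if (retryAfterHeader != null) {
            try {
                return Long.parseLong(retryAfterHeader.trim()) * 1000;
            } catch (NumberFormatException ignored) {
                // Retry-After may also be an HTTP date; fall through to backoff
            }
        }
        // Exponential backoff: 1s, 2s, 4s, ... capped at MAX_BACKOFF_MS
        return Math.min(MAX_BACKOFF_MS, 1000L << Math.min(attempt, 20));
    }
}
```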

Implementation

Rate Limiter Service

@Service
public class RateLimiterService {

    private final RedisTemplate<String, String> redis;
    private final RateLimitConfigService configService;

    public RateLimitResult checkRateLimit(RateLimitRequest request) {
        // 1. Get applicable config
        RateLimitConfig config = configService.getConfig(
            request.getCustomerId(),
            request.getTier(),
            request.getEndpoint()
        );

        // 2. Build rate limit key
        String key = buildKey(request.getCustomerId(), request.getEndpoint());

        // 3. Check each limit window
        for (RateLimitWindow window : config.getWindows()) {
            RateLimitResult result = checkWindow(key, window);
            if (!result.isAllowed()) {
                return result;
            }
        }

        return RateLimitResult.allowed(config);
    }

    private RateLimitResult checkWindow(String baseKey, RateLimitWindow window) {
        String key = baseKey + ":" + window.getName();

        // Execute Lua script atomically
        List<Long> result = redis.execute(
            SLIDING_WINDOW_SCRIPT,
            List.of(key),
            window.getLimit(),
            window.getWindowSeconds(),
            System.currentTimeMillis() / 1000
        );

        boolean allowed = result.get(0) == 1;
        long remaining = result.get(1);

        return new RateLimitResult(allowed, window.getLimit(), remaining, window.getResetTime());
    }
}

Gateway Integration

@Component
public class RateLimitFilter implements GatewayFilter {

    private final RateLimiterService rateLimiter;

    @Override
    public Mono<Void> filter(ServerWebExchange exchange, GatewayFilterChain chain) {
        String customerId = extractCustomerId(exchange);
        String endpoint = exchange.getRequest().getPath().value();

        RateLimitRequest request = RateLimitRequest.builder()
            .customerId(customerId)
            .endpoint(endpoint)
            .build();

        RateLimitResult result = rateLimiter.checkRateLimit(request);

        // Add headers
        ServerHttpResponse response = exchange.getResponse();
        response.getHeaders().add("X-RateLimit-Limit", String.valueOf(result.getLimit()));
        response.getHeaders().add("X-RateLimit-Remaining", String.valueOf(result.getRemaining()));
        response.getHeaders().add("X-RateLimit-Reset", String.valueOf(result.getResetTimestamp()));

        if (!result.isAllowed()) {
            response.setStatusCode(HttpStatus.TOO_MANY_REQUESTS);
            response.getHeaders().add("Retry-After", String.valueOf(result.getRetryAfter()));
            return response.writeWith(Mono.just(createErrorBody(result)));
        }

        return chain.filter(exchange);
    }
}

Handling Edge Cases

Graceful Degradation

public RateLimitResult checkRateLimitWithFallback(RateLimitRequest request) {
    try {
        return rateLimiter.checkRateLimit(request);
    } catch (RedisConnectionException e) {
        // Redis is down - pick ONE fallback strategy:
        switch (fallbackMode) {
            case FAIL_OPEN:
                // Option 1: Allow all (availability over protection)
                return RateLimitResult.allowed();
            case LOCAL:
                // Option 2: In-process limiter with a conservative limit
                return localRateLimiter.checkRateLimit(request);
            case FAIL_CLOSED:
            default:
                // Option 3: Block all - only for critical protection
                return RateLimitResult.denied("Service temporarily unavailable");
        }
    }
}

Multi-Dimensional Rate Limiting

// Rate limit by multiple dimensions
public RateLimitResult checkMultiDimensional(String customerId, String ip, String endpoint) {
    // Check customer rate limit
    RateLimitResult customerResult = checkLimit("customer:" + customerId, customerLimit);
    if (!customerResult.isAllowed()) {
        return customerResult;
    }

    // Check IP rate limit (for abuse prevention)
    RateLimitResult ipResult = checkLimit("ip:" + ip, ipLimit);
    if (!ipResult.isAllowed()) {
        return ipResult;
    }

    // Check endpoint rate limit (for expensive endpoints)
    RateLimitResult endpointResult = checkLimit("endpoint:" + endpoint, endpointLimit);
    if (!endpointResult.isAllowed()) {
        return endpointResult;
    }

    // All dimensions passed; return the result closest to exhaustion
    // so response headers reflect the tightest limit
    return mostRestrictive(customerResult, ipResult, endpointResult);
}

Monitoring & Alerting

Key Metrics

Metric               Description                    Alert Threshold
Rate limit hit rate  % of requests hitting limit    > 5%
P99 latency          Rate limit check latency       > 5 ms
Redis availability   Redis cluster health           < 99.9%
False positives      Legitimate requests blocked    > 0.1%

Dashboard

Rate Limiter Dashboard


Technology Choices

Component      Technology Options
Counter Store  Redis Cluster, Memcached
Config Store   PostgreSQL, etcd
Local Cache    Caffeine, Guava
Gateway        Kong, Envoy, custom filter

Interview Discussion Points

  1. How do you handle Redis failures?
     Circuit breaker, local fallback, and a deliberate fail-open vs. fail-closed choice.

  2. How accurate is the rate limiting?
     Eventually consistent; small bursts may slip through across nodes.

  3. How do you handle clock skew across servers?
     Use Redis server time, NTP sync, and window-based algorithms.

  4. How do you prevent gaming (e.g., rotating API keys)?
     Also rate limit by IP and fingerprint, and enforce account-level limits.

  5. How do you handle legitimate bursts?
     Token bucket with a burst allowance, or a separate burst limit.

  6. How do you scale the rate limiter?
     Redis Cluster, consistent hashing, and local caching.