Rate Limiter

Problem Statement

Design a distributed rate limiter that can protect APIs from abuse, ensure fair usage across customers, and handle high traffic volumes. The system should support multiple rate limiting strategies and be configurable per customer/endpoint.


Requirements

Functional Requirements

  • Limit requests per customer/API key
  • Support multiple rate limiting algorithms
  • Configure limits per endpoint/tier
  • Provide rate limit headers in responses
  • Support burst allowances
  • Allow real-time limit adjustments
  • Provide usage analytics

Non-Functional Requirements

  • Latency: < 5 ms of overhead per request
  • Availability: 99.99% (the limiter sits in the request critical path)
  • Accuracy: near-exact; a small overshoot past the limit is acceptable
  • Scalability: handle 100K+ requests/second
  • Consistency: eventual consistency across nodes is acceptable

High-Level Architecture



Rate Limiting Algorithms

1. Token Bucket

Token Bucket Algorithm

public class TokenBucket {
    private final long capacity;        // Max tokens the bucket can hold
    private final double refillRate;    // Tokens added per second
    private double tokens;
    private long lastRefillTimestamp;

    public TokenBucket(long capacity, double refillRate) {
        this.capacity = capacity;
        this.refillRate = refillRate;
        this.tokens = capacity;         // Start full: permits an initial burst
        this.lastRefillTimestamp = System.currentTimeMillis();
    }

    public synchronized boolean tryConsume(int numTokens) {
        refill();

        if (tokens >= numTokens) {
            tokens -= numTokens;
            return true;
        }
        return false;
    }

    private void refill() {
        long now = System.currentTimeMillis();
        double tokensToAdd = (now - lastRefillTimestamp) * refillRate / 1000.0;
        tokens = Math.min(capacity, tokens + tokensToAdd);
        lastRefillTimestamp = now;
    }
}

2. Leaky Bucket

Leaky Bucket Algorithm
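
The source gives no code for this one; a minimal sketch of the counter ("water level") variant, where each request adds one unit and the bucket drains at a fixed rate:

```java
// Leaky bucket (counter variant): incoming requests add one unit of
// "water" each; the bucket drains at a fixed leakRate per second.
// A full bucket rejects (or, in the queue variant, delays) requests.
public class LeakyBucket {
    private final long capacity;     // max queued "water"
    private final double leakRate;   // units drained per second
    private double water;
    private long lastLeakTimestamp;

    public LeakyBucket(long capacity, double leakRate) {
        this.capacity = capacity;
        this.leakRate = leakRate;
        this.water = 0;
        this.lastLeakTimestamp = System.currentTimeMillis();
    }

    public synchronized boolean tryAdd() {
        leak();
        if (water < capacity) {
            water += 1;
            return true;
        }
        return false;  // bucket full: request rejected
    }

    private void leak() {
        long now = System.currentTimeMillis();
        double leaked = (now - lastLeakTimestamp) * leakRate / 1000.0;
        water = Math.max(0, water - leaked);
        lastLeakTimestamp = now;
    }
}
```

Unlike the token bucket, which lets a full bucket's worth of requests through at once, the leaky bucket smooths output to the leak rate.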

3. Fixed Window Counter

Fixed Window Counter

// Redis implementation: one counter per (key, window) pair
public boolean isAllowed(String key, int limit, int windowSeconds) {
    String windowKey = key + ":" + (System.currentTimeMillis() / 1000 / windowSeconds);

    long count = redis.incr(windowKey);
    if (count == 1) {
        // First request in this window: set a TTL so stale counters expire.
        // Note: INCR + EXPIRE is not atomic; a Lua script avoids leaking a
        // counter if the process dies between the two calls.
        redis.expire(windowKey, windowSeconds);
    }

    return count <= limit;
}

4. Sliding Window Log

Sliding Window Log

// Redis sorted-set implementation: one entry per request, scored by timestamp.
// Accurate, but memory grows with the request rate.
public boolean isAllowed(String key, int limit, int windowMs) {
    long now = System.currentTimeMillis();
    long windowStart = now - windowMs;

    // Remove entries that have fallen out of the window
    redis.zremrangeByScore(key, 0, windowStart);

    // Count requests still inside the window
    long count = redis.zcard(key);

    if (count < limit) {
        // Record this request (random member guarantees uniqueness)
        redis.zadd(key, now, UUID.randomUUID().toString());
        redis.expire(key, Math.max(1, windowMs / 1000));
        return true;
    }

    return false;
}

5. Sliding Window Counter

public class SlidingWindowCounter {

    public boolean isAllowed(String key, int limit, int windowSeconds) {
        long now = System.currentTimeMillis();
        long currentWindow = now / 1000 / windowSeconds;
        long previousWindow = currentWindow - 1;

        String currentKey = key + ":" + currentWindow;
        String previousKey = key + ":" + previousWindow;

        // Get counts
        long currentCount = Optional.ofNullable(redis.get(currentKey))
            .map(Long::parseLong).orElse(0L);
        long previousCount = Optional.ofNullable(redis.get(previousKey))
            .map(Long::parseLong).orElse(0L);

        // Calculate weight for previous window
        long windowStart = currentWindow * windowSeconds * 1000;
        double elapsedRatio = (now - windowStart) / (windowSeconds * 1000.0);
        double previousWeight = 1.0 - elapsedRatio;

        // Weighted count
        double effectiveCount = previousCount * previousWeight + currentCount;

        if (effectiveCount < limit) {
            redis.incr(currentKey);
            redis.expire(currentKey, windowSeconds * 2);
            return true;
        }

        return false;
    }
}

Algorithm Comparison

Algorithm        Accuracy   Memory     Burst Handling   Complexity
Token Bucket     High       Low        Allows bursts    Medium
Leaky Bucket     High       Low        Smooths bursts   Medium
Fixed Window     Low        Very Low   Boundary issue   Low
Sliding Log      Very High  High       Accurate         Medium
Sliding Counter  High       Low        Good             Medium

Recommendation: Use Sliding Window Counter for most cases, or Token Bucket when burst tolerance is needed.


Distributed Rate Limiting

Challenge

Distributed Rate Limiting Problem

With multiple gateway nodes behind a load balancer, each node sees only a fraction of a customer's traffic. Independent per-node counters therefore under-count the global rate, letting a customer exceed the limit by up to a factor of the node count.

Solution 1: Centralized Counter (Redis)

Redis-Based Solution

Redis Lua Script (Atomic Operations)

-- Sliding window counter in single atomic operation
local key = KEYS[1]
local limit = tonumber(ARGV[1])
local window = tonumber(ARGV[2])
local now = tonumber(ARGV[3])

local current_window = math.floor(now / window)
local previous_window = current_window - 1

local current_key = key .. ":" .. current_window
local previous_key = key .. ":" .. previous_window

local current_count = tonumber(redis.call("GET", current_key) or "0")
local previous_count = tonumber(redis.call("GET", previous_key) or "0")

local window_start = current_window * window
local elapsed_ratio = (now - window_start) / window
local previous_weight = 1 - elapsed_ratio

local effective_count = previous_count * previous_weight + current_count

if effective_count < limit then
    redis.call("INCR", current_key)
    redis.call("EXPIRE", current_key, window * 2)
    return {1, math.floor(limit - effective_count - 1)}  -- allowed, remaining
else
    return {0, 0}  -- denied, none remaining
end

Solution 2: Local + Sync Hybrid

Hybrid Approach
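
The hybrid approach is only named above; a minimal sketch of the idea, assuming each of N identical nodes enforces a local slice of the global limit and periodically reconciles with the central store (the sync input is stubbed here as a parameter):

```java
// Hybrid rate limiter: each node serves requests against a local budget
// (globalLimit / nodeCount), avoiding a Redis round trip per request.
// A background job calls syncWithCentralStore with the global usage to
// rebalance quota between nodes with uneven traffic.
public class HybridRateLimiter {
    private final long globalLimit;
    private final int nodeCount;
    private long localBudget;   // requests this node may serve before the next sync
    private long localUsed;

    public HybridRateLimiter(long globalLimit, int nodeCount) {
        this.globalLimit = globalLimit;
        this.nodeCount = nodeCount;
        this.localBudget = globalLimit / nodeCount;
        this.localUsed = 0;
    }

    public synchronized boolean tryAcquire() {
        if (localUsed < localBudget) {
            localUsed++;
            return true;   // served without touching the central store
        }
        return false;      // local slice exhausted until the next sync
    }

    // Called periodically; globalUsed would come from the central store.
    public synchronized void syncWithCentralStore(long globalUsed) {
        long globalRemaining = Math.max(0, globalLimit - globalUsed);
        // Grant this node an equal share of whatever is left globally.
        localBudget = localUsed + globalRemaining / nodeCount;
    }
}
```

The trade-off matches the non-functional requirements: sub-millisecond checks on the hot path, at the cost of eventual (not strict) consistency between sync intervals.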


Configuration Model

Rate Limit Rules

rate_limits:
  # Global defaults
  default:
    requests_per_second: 100
    burst: 200

  # Per tier
  tiers:
    free:
      requests_per_minute: 100
      requests_per_day: 10000
    pro:
      requests_per_minute: 1000
      requests_per_day: 100000
    enterprise:
      requests_per_minute: 10000
      requests_per_day: unlimited

  # Per endpoint overrides
  endpoints:
    "/api/v1/charges":
      requests_per_second: 25
      burst: 50
    "/api/v1/customers":
      requests_per_second: 100
      burst: 200
    "/api/v1/reports":
      requests_per_minute: 10
      comment: "Heavy endpoint, strict limit"

  # Per customer overrides
  customers:
    "cust_enterprise_123":
      requests_per_second: 500
      burst: 1000

Configuration Schema

CREATE TABLE rate_limit_configs (
    id                  UUID PRIMARY KEY,

    -- Scope
    scope_type          VARCHAR(20) NOT NULL,   -- global, tier, customer, endpoint
    scope_value         VARCHAR(255),           -- tier name, customer_id, endpoint path

    -- Limits
    requests_per_second BIGINT,
    requests_per_minute BIGINT,
    requests_per_hour   BIGINT,
    requests_per_day    BIGINT,
    burst_size          INT,

    -- Metadata
    priority            INT DEFAULT 0,          -- Higher = more specific
    enabled             BOOLEAN DEFAULT true,
    created_at          TIMESTAMP NOT NULL,
    updated_at          TIMESTAMP NOT NULL
);

CREATE INDEX idx_rate_limit_scope ON rate_limit_configs(scope_type, scope_value);
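
The priority column implies a resolution order that isn't spelled out above; a minimal sketch of how a lookup might pick the most specific enabled config (the RateLimitConfig record and the matching rules are assumptions, not the source's types):

```java
import java.util.Comparator;
import java.util.List;
import java.util.Optional;

// Resolves the effective config: customer > endpoint > tier > global,
// encoded via the schema's priority column (higher = more specific).
public class ConfigResolver {

    public record RateLimitConfig(String scopeType, String scopeValue,
                                  int priority, boolean enabled) {}

    public Optional<RateLimitConfig> resolve(List<RateLimitConfig> candidates,
                                             String customerId, String tier,
                                             String endpoint) {
        return candidates.stream()
            .filter(RateLimitConfig::enabled)
            .filter(c -> matches(c, customerId, tier, endpoint))
            // Highest priority wins: the most specific matching scope
            .max(Comparator.comparingInt(RateLimitConfig::priority));
    }

    private boolean matches(RateLimitConfig c, String customerId,
                            String tier, String endpoint) {
        return switch (c.scopeType()) {
            case "global"   -> true;
            case "tier"     -> c.scopeValue().equals(tier);
            case "customer" -> c.scopeValue().equals(customerId);
            case "endpoint" -> c.scopeValue().equals(endpoint);
            default         -> false;
        };
    }
}
```

In practice the candidate list would be cached locally (see Local Cache under Technology Choices) so config resolution never adds a database round trip to the hot path.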

Response Headers

Standard Rate Limit Headers

HTTP/1.1 200 OK
X-RateLimit-Limit: 100
X-RateLimit-Remaining: 45
X-RateLimit-Reset: 1640000000
X-RateLimit-Policy: 100;w=60

Or, using the standardized header names from the IETF RateLimit draft (note that RateLimit-Reset is seconds until the window resets, not a Unix timestamp):
RateLimit-Limit: 100
RateLimit-Remaining: 45
RateLimit-Reset: 55

Rate Limited Response

HTTP/1.1 429 Too Many Requests
Content-Type: application/json
Retry-After: 30
X-RateLimit-Limit: 100
X-RateLimit-Remaining: 0
X-RateLimit-Reset: 1640000000

{
    "error": {
        "type": "rate_limit_error",
        "message": "Rate limit exceeded. Please retry after 30 seconds.",
        "retry_after": 30,
        "limit": 100,
        "window": "60s"
    }
}
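
A client can turn these headers directly into a wait time; a minimal sketch (the exponential-backoff fallback and 60 s cap are assumptions, not part of the response contract above):

```java
// Computes how long a client should wait before retrying a 429:
// prefer the server's Retry-After (in seconds), else exponential backoff.
public class RetryPolicy {
    private static final long MAX_BACKOFF_MS = 60_000;

    public static long retryDelayMillis(String retryAfterHeader, int attempt) {
        if (retryAfterHeader != null) {
            try {
                return Long.parseLong(retryAfterHeader.trim()) * 1000;
            } catch (NumberFormatException ignored) {
                // Retry-After may also be an HTTP date; fall through to backoff
            }
        }
        // Exponential backoff: 1s, 2s, 4s, ... capped at MAX_BACKOFF_MS
        return Math.min(MAX_BACKOFF_MS, 1000L << Math.min(attempt, 20));
    }
}
```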

Implementation

Rate Limiter Service

@Service
public class RateLimiterService {

    private final RedisTemplate<String, String> redis;
    private final RateLimitConfigService configService;

    public RateLimitResult checkRateLimit(RateLimitRequest request) {
        // 1. Get applicable config
        RateLimitConfig config = configService.getConfig(
            request.getCustomerId(),
            request.getTier(),
            request.getEndpoint()
        );

        // 2. Build rate limit key
        String key = buildKey(request.getCustomerId(), request.getEndpoint());

        // 3. Check each limit window
        for (RateLimitWindow window : config.getWindows()) {
            RateLimitResult result = checkWindow(key, window);
            if (!result.isAllowed()) {
                return result;
            }
        }

        return RateLimitResult.allowed(config);
    }

    private RateLimitResult checkWindow(String baseKey, RateLimitWindow window) {
        String key = baseKey + ":" + window.getName();

        // Execute Lua script atomically
        List<Long> result = redis.execute(
            SLIDING_WINDOW_SCRIPT,
            List.of(key),
            window.getLimit(),
            window.getWindowSeconds(),
            System.currentTimeMillis() / 1000
        );

        boolean allowed = result.get(0) == 1;
        long remaining = result.get(1);

        return new RateLimitResult(allowed, window.getLimit(), remaining, window.getResetTime());
    }
}

Gateway Integration

@Component
public class RateLimitFilter implements GatewayFilter {

    private final RateLimiterService rateLimiter;

    @Override
    public Mono<Void> filter(ServerWebExchange exchange, GatewayFilterChain chain) {
        String customerId = extractCustomerId(exchange);
        String endpoint = exchange.getRequest().getPath().value();

        RateLimitRequest request = RateLimitRequest.builder()
            .customerId(customerId)
            .endpoint(endpoint)
            .build();

        RateLimitResult result = rateLimiter.checkRateLimit(request);

        // Add headers
        ServerHttpResponse response = exchange.getResponse();
        response.getHeaders().add("X-RateLimit-Limit", String.valueOf(result.getLimit()));
        response.getHeaders().add("X-RateLimit-Remaining", String.valueOf(result.getRemaining()));
        response.getHeaders().add("X-RateLimit-Reset", String.valueOf(result.getResetTimestamp()));

        if (!result.isAllowed()) {
            response.setStatusCode(HttpStatus.TOO_MANY_REQUESTS);
            response.getHeaders().add("Retry-After", String.valueOf(result.getRetryAfter()));
            return response.writeWith(Mono.just(createErrorBody(result)));
        }

        return chain.filter(exchange);
    }
}

Handling Edge Cases

Graceful Degradation

public RateLimitResult checkRateLimitWithFallback(RateLimitRequest request) {
    try {
        return rateLimiter.checkRateLimit(request);
    } catch (RedisConnectionException e) {
        // Redis is down - pick ONE fallback strategy:
        switch (fallbackMode) {
            case FAIL_OPEN:
                // Option 1: Allow all (availability over protection)
                return RateLimitResult.allowed();
            case LOCAL:
                // Option 2: In-process limiter with a conservative limit
                return localRateLimiter.checkRateLimit(request);
            case FAIL_CLOSED:
            default:
                // Option 3: Block all - only for critical protection
                return RateLimitResult.denied("Service temporarily unavailable");
        }
    }
}

Multi-Dimensional Rate Limiting

// Rate limit by multiple dimensions
public RateLimitResult checkMultiDimensional(String customerId, String ip, String endpoint) {
    // Check customer rate limit
    RateLimitResult customerResult = checkLimit("customer:" + customerId, customerLimit);
    if (!customerResult.isAllowed()) {
        return customerResult;
    }

    // Check IP rate limit (for abuse prevention)
    RateLimitResult ipResult = checkLimit("ip:" + ip, ipLimit);
    if (!ipResult.isAllowed()) {
        return ipResult;
    }

    // Check endpoint rate limit (for expensive endpoints)
    RateLimitResult endpointResult = checkLimit("endpoint:" + endpoint, endpointLimit);
    if (!endpointResult.isAllowed()) {
        return endpointResult;
    }

    // All dimensions passed; return the result closest to exhaustion
    // so response headers reflect the tightest limit
    return mostRestrictive(customerResult, ipResult, endpointResult);
}

Monitoring & Alerting

Key Metrics

Metric               Description                    Alert Threshold
Rate limit hit rate  % of requests hitting limit    > 5%
P99 latency          Rate limit check latency       > 5 ms
Redis availability   Redis cluster health           < 99.9%
False positives      Legitimate requests blocked    > 0.1%

Dashboard

Rate Limiter Dashboard


Technology Choices

Component      Technology Options
Counter Store  Redis Cluster, Memcached
Config Store   PostgreSQL, etcd
Local Cache    Caffeine, Guava
Gateway        Kong, Envoy, custom filter

Interview Discussion Points

  1. How do you handle Redis failures?
     Circuit breaker, local fallback, and a deliberate fail-open vs. fail-closed choice.

  2. How accurate is the rate limiting?
     Eventually consistent; small bursts may slip through across nodes.

  3. How do you handle clock skew across servers?
     Use Redis server time, NTP sync, and window-based algorithms.

  4. How do you prevent gaming (e.g., rotating API keys)?
     Also rate limit by IP and fingerprint, and enforce account-level limits.

  5. How do you handle legitimate bursts?
     Token bucket with a burst allowance, or a separate burst limit.

  6. How do you scale the rate limiter?
     Redis Cluster, consistent hashing, and local caching.