Rate Limiter¶
Problem Statement¶
Design a distributed rate limiter that can protect APIs from abuse, ensure fair usage across customers, and handle high traffic volumes. The system should support multiple rate limiting strategies and be configurable per customer/endpoint.
Requirements¶
Functional Requirements¶
- Limit requests per customer/API key
- Support multiple rate limiting algorithms
- Configure limits per endpoint/tier
- Provide rate limit headers in responses
- Support burst allowances
- Allow real-time limit adjustments
- Provide usage analytics
Non-Functional Requirements¶
- Latency: < 5ms overhead per request
- Availability: 99.99% (in critical path)
- Accuracy: Approximate; a small over-limit is acceptable
- Scalability: Handle 100K+ requests/second
- Consistency: Eventually consistent across nodes acceptable
High-Level Architecture¶
Rate Limiting Algorithms¶
1. Token Bucket¶
public class TokenBucket {
    private final long capacity;       // Max tokens the bucket can hold
    private final double refillRate;   // Tokens added per second
    private double tokens;
    private long lastRefillTimestamp;

    public TokenBucket(long capacity, double refillRate) {
        this.capacity = capacity;
        this.refillRate = refillRate;
        this.tokens = capacity;        // Start full so bursts are allowed immediately
        this.lastRefillTimestamp = System.currentTimeMillis();
    }

    public synchronized boolean tryConsume(int numTokens) {
        refill();
        if (tokens >= numTokens) {
            tokens -= numTokens;
            return true;
        }
        return false;
    }

    private void refill() {
        long now = System.currentTimeMillis();
        double tokensToAdd = (now - lastRefillTimestamp) * refillRate / 1000.0;
        tokens = Math.min(capacity, tokens + tokensToAdd);
        lastRefillTimestamp = now;
    }
}
2. Leaky Bucket¶
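The leaky bucket smooths bursts into a steady outflow: incoming requests fill the bucket and "leak" out at a constant rate; requests that would overflow are rejected. A minimal in-memory sketch, mirroring the token bucket above (class and method names are illustrative):

```java
public class LeakyBucket {
    private final long capacity;      // Max requests the bucket can queue
    private final double leakRate;    // Requests drained per second
    private double water;             // Current queue depth
    private long lastLeakTimestamp;

    public LeakyBucket(long capacity, double leakRate) {
        this.capacity = capacity;
        this.leakRate = leakRate;
        this.water = 0;
        this.lastLeakTimestamp = System.currentTimeMillis();
    }

    public synchronized boolean tryAdd() {
        leak();
        if (water + 1 <= capacity) {  // Does one more request fit?
            water += 1;
            return true;
        }
        return false;                 // Bucket full: reject the request
    }

    private void leak() {
        // Drain water proportionally to elapsed time since last check
        long now = System.currentTimeMillis();
        double leaked = (now - lastLeakTimestamp) * leakRate / 1000.0;
        water = Math.max(0, water - leaked);
        lastLeakTimestamp = now;
    }
}
```

Unlike the token bucket, this never passes a burst through; output is capped at `leakRate` regardless of input shape.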
3. Fixed Window Counter¶
// Redis implementation
public boolean isAllowed(String key, int limit, int windowSeconds) {
    String windowKey = key + ":" + (System.currentTimeMillis() / 1000 / windowSeconds);
    long count = redis.incr(windowKey);
    if (count == 1) {
        redis.expire(windowKey, windowSeconds);
    }
    return count <= limit;
}
4. Sliding Window Log¶
// Redis sorted set implementation
public boolean isAllowed(String key, int limit, int windowMs) {
    long now = System.currentTimeMillis();
    long windowStart = now - windowMs;
    // Remove entries that have fallen out of the window
    redis.zremrangeByScore(key, 0, windowStart);
    // Count requests still inside the window
    long count = redis.zcard(key);
    if (count < limit) {
        // Record the current request, timestamp as score
        redis.zadd(key, now, UUID.randomUUID().toString());
        // Expire in whole seconds; never pass 0 (which would delete the key)
        redis.expire(key, Math.max(1, windowMs / 1000));
        return true;
    }
    return false;
}
5. Sliding Window Counter (Recommended)¶
public class SlidingWindowCounter {
    public boolean isAllowed(String key, int limit, int windowSeconds) {
        long now = System.currentTimeMillis();
        long currentWindow = now / 1000 / windowSeconds;
        long previousWindow = currentWindow - 1;
        String currentKey = key + ":" + currentWindow;
        String previousKey = key + ":" + previousWindow;

        // Get counts for both windows
        long currentCount = Optional.ofNullable(redis.get(currentKey))
            .map(Long::parseLong).orElse(0L);
        long previousCount = Optional.ofNullable(redis.get(previousKey))
            .map(Long::parseLong).orElse(0L);

        // Weight the previous window by how much of it still overlaps
        long windowStart = currentWindow * windowSeconds * 1000;
        double elapsedRatio = (now - windowStart) / (windowSeconds * 1000.0);
        double previousWeight = 1.0 - elapsedRatio;

        // Weighted count approximates a true sliding window
        double effectiveCount = previousCount * previousWeight + currentCount;
        if (effectiveCount < limit) {
            redis.incr(currentKey);
            redis.expire(currentKey, windowSeconds * 2);
            return true;
        }
        return false;
    }
}
Algorithm Comparison¶
| Algorithm | Accuracy | Memory | Burst Handling | Complexity |
|---|---|---|---|---|
| Token Bucket | High | Low | Allows burst | Medium |
| Leaky Bucket | High | Low | Smooths burst | Medium |
| Fixed Window | Low | Very Low | Boundary issue | Low |
| Sliding Log | Very High | High | Accurate | Medium |
| Sliding Counter | High | Low | Good | Medium |
Recommendation: Use Sliding Window Counter for most cases, or Token Bucket when burst tolerance is needed.
Distributed Rate Limiting¶
Challenge¶
With multiple gateway nodes, each node observes only a fraction of a customer's traffic. Independent local counters therefore let a customer exceed the global limit by spreading requests across nodes, so counters must be shared or periodically synchronized.
Solution 1: Centralized Counter (Redis)¶
Every node checks and increments a shared counter in Redis, giving a consistent global view at the cost of one network round trip per request. The read-compute-increment sequence runs inside a Lua script so it stays atomic.
Redis Lua Script (Atomic Operations)¶
-- Sliding window counter in single atomic operation
local key = KEYS[1]
local limit = tonumber(ARGV[1])
local window = tonumber(ARGV[2])
local now = tonumber(ARGV[3])
local current_window = math.floor(now / window)
local previous_window = current_window - 1
local current_key = key .. ":" .. current_window
local previous_key = key .. ":" .. previous_window
local current_count = tonumber(redis.call("GET", current_key) or "0")
local previous_count = tonumber(redis.call("GET", previous_key) or "0")
local window_start = current_window * window
local elapsed_ratio = (now - window_start) / window
local previous_weight = 1 - elapsed_ratio
local effective_count = previous_count * previous_weight + current_count
if effective_count < limit then
    redis.call("INCR", current_key)
    redis.call("EXPIRE", current_key, window * 2)
    -- Lua numbers returned to Redis are truncated to integers; floor explicitly
    return {1, math.floor(limit - effective_count - 1)} -- allowed, remaining
else
    return {0, 0} -- denied, remaining
end
Solution 2: Local + Sync Hybrid¶
Each node enforces limits against a local in-memory counter and synchronizes with the central store asynchronously, for example by reserving permits in batches. This trades a small amount of accuracy for lower latency and resilience to store outages.
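A minimal sketch of the hybrid approach: each node consumes from a locally cached batch of permits and refetches a batch from the central store only when the batch runs out. `CentralStore` is a stand-in interface introduced here for illustration; in practice this would be an atomic Redis operation (e.g. a Lua script around `DECRBY`).

```java
import java.util.concurrent.atomic.AtomicLong;

public class HybridRateLimiter {
    public interface CentralStore {
        // Atomically reserve up to `want` permits for the current window;
        // returns how many were actually granted (0 when the limit is hit).
        long reserve(String key, long want);
    }

    private final CentralStore store;
    private final String key;
    private final long batchSize;
    private final AtomicLong localPermits = new AtomicLong(0);

    public HybridRateLimiter(CentralStore store, String key, long batchSize) {
        this.store = store;
        this.key = key;
        this.batchSize = batchSize;
    }

    public boolean tryAcquire() {
        // Fast path: consume a locally cached permit, no network hop
        if (localPermits.getAndUpdate(n -> n > 0 ? n - 1 : n) > 0) {
            return true;
        }
        // Slow path: reserve a fresh batch from the central counter
        long granted = store.reserve(key, batchSize);
        if (granted > 0) {
            localPermits.addAndGet(granted - 1); // one permit consumed now
            return true;
        }
        return false;
    }
}
```

The batch size is the accuracy/latency knob: larger batches mean fewer round trips but up to `batchSize - 1` permits can be stranded on a node when its traffic stops.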
Configuration Model¶
Rate Limit Rules¶
rate_limits:
  # Global defaults
  default:
    requests_per_second: 100
    burst: 200

  # Per tier
  tiers:
    free:
      requests_per_minute: 100
      requests_per_day: 10000
    pro:
      requests_per_minute: 1000
      requests_per_day: 100000
    enterprise:
      requests_per_minute: 10000
      requests_per_day: unlimited

  # Per endpoint overrides
  endpoints:
    "/api/v1/charges":
      requests_per_second: 25
      burst: 50
    "/api/v1/customers":
      requests_per_second: 100
      burst: 200
    "/api/v1/reports":
      requests_per_minute: 10
      comment: "Heavy endpoint, strict limit"

  # Per customer overrides
  customers:
    "cust_enterprise_123":
      requests_per_second: 500
      burst: 1000
Configuration Schema¶
CREATE TABLE rate_limit_configs (
    id UUID PRIMARY KEY,
    -- Scope
    scope_type VARCHAR(20) NOT NULL,  -- global, tier, customer, endpoint
    scope_value VARCHAR(255),         -- tier name, customer_id, endpoint path
    -- Limits
    requests_per_second BIGINT,
    requests_per_minute BIGINT,
    requests_per_hour BIGINT,
    requests_per_day BIGINT,
    burst_size INT,
    -- Metadata
    priority INT DEFAULT 0,           -- Higher = more specific
    enabled BOOLEAN DEFAULT true,
    created_at TIMESTAMP NOT NULL,
    updated_at TIMESTAMP NOT NULL
);
CREATE INDEX idx_rate_limit_scope ON rate_limit_configs(scope_type, scope_value);
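The `priority` column drives config resolution: among all enabled rows matching a request, the most specific one wins (customer > endpoint > tier > global, by convention). An illustrative resolver, with the config trimmed to the essential fields (names here are assumptions, not the service's actual classes):

```java
import java.util.Comparator;
import java.util.List;
import java.util.Optional;

public class ConfigResolver {
    // Trimmed-down config row; field names mirror the schema above
    public static class Config {
        public final String scopeType;
        public final String scopeValue;
        public final int priority;
        public final long requestsPerSecond;

        public Config(String scopeType, String scopeValue, int priority, long requestsPerSecond) {
            this.scopeType = scopeType;
            this.scopeValue = scopeValue;
            this.priority = priority;
            this.requestsPerSecond = requestsPerSecond;
        }
    }

    // Most specific enabled match wins, encoded via the priority column
    public static Optional<Config> resolve(List<Config> configs,
                                           String customerId, String endpoint, String tier) {
        return configs.stream()
                .filter(c -> matches(c, customerId, endpoint, tier))
                .max(Comparator.comparingInt(c -> c.priority));
    }

    private static boolean matches(Config c, String customerId, String endpoint, String tier) {
        switch (c.scopeType) {
            case "global":   return true;
            case "tier":     return c.scopeValue.equals(tier);
            case "endpoint": return c.scopeValue.equals(endpoint);
            case "customer": return c.scopeValue.equals(customerId);
            default:         return false;
        }
    }
}
```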
Response Headers¶
Standard Rate Limit Headers¶
HTTP/1.1 200 OK
X-RateLimit-Limit: 100
X-RateLimit-Remaining: 45
X-RateLimit-Reset: 1640000000
X-RateLimit-Policy: 100;w=60
Or with the draft IETF standard headers:
RateLimit-Limit: 100
RateLimit-Remaining: 45
RateLimit-Reset: 55
Rate Limited Response¶
HTTP/1.1 429 Too Many Requests
Content-Type: application/json
Retry-After: 30
X-RateLimit-Limit: 100
X-RateLimit-Remaining: 0
X-RateLimit-Reset: 1640000000
{
    "error": {
        "type": "rate_limit_error",
        "message": "Rate limit exceeded. Please retry after 30 seconds.",
        "retry_after": 30,
        "limit": 100,
        "window": "60s"
    }
}
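Well-behaved clients should honor `Retry-After` rather than retrying immediately. A minimal client-side sketch, assuming the delta-seconds header form shown above (the class name and 60-second cap are illustrative assumptions):

```java
public class RetryAfterParser {
    private static final long MAX_BACKOFF_MS = 60_000; // cap, an assumption

    // How long the client should wait before retrying, in milliseconds.
    // Handles only the delta-seconds form of Retry-After (not HTTP-date).
    public static long backoffMillis(int statusCode, String retryAfterHeader) {
        if (statusCode != 429) {
            return 0; // not rate limited, no wait needed
        }
        try {
            long seconds = Long.parseLong(retryAfterHeader.trim());
            return Math.min(seconds * 1000, MAX_BACKOFF_MS);
        } catch (NumberFormatException | NullPointerException e) {
            return 1_000; // header missing or unparseable: fall back to 1s
        }
    }
}
```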
Implementation¶
Rate Limiter Service¶
@Service
public class RateLimiterService {
    private final RedisTemplate<String, String> redis;
    private final RateLimitConfigService configService;

    public RateLimitResult checkRateLimit(RateLimitRequest request) {
        // 1. Get the most specific applicable config
        RateLimitConfig config = configService.getConfig(
                request.getCustomerId(),
                request.getTier(),
                request.getEndpoint()
        );

        // 2. Build the rate limit key
        String key = buildKey(request.getCustomerId(), request.getEndpoint());

        // 3. Check each limit window (per-second, per-minute, ...)
        for (RateLimitWindow window : config.getWindows()) {
            RateLimitResult result = checkWindow(key, window);
            if (!result.isAllowed()) {
                return result;
            }
        }
        return RateLimitResult.allowed(config);
    }

    private RateLimitResult checkWindow(String baseKey, RateLimitWindow window) {
        String key = baseKey + ":" + window.getName();

        // Execute the sliding-window Lua script atomically
        List<Long> result = redis.execute(
                SLIDING_WINDOW_SCRIPT,
                List.of(key),
                window.getLimit(),
                window.getWindowSeconds(),
                System.currentTimeMillis() / 1000
        );
        boolean allowed = result.get(0) == 1;
        long remaining = result.get(1);
        return new RateLimitResult(allowed, window.getLimit(), remaining, window.getResetTime());
    }
}
Gateway Integration¶
@Component
public class RateLimitFilter implements GatewayFilter {
    private final RateLimiterService rateLimiter;

    @Override
    public Mono<Void> filter(ServerWebExchange exchange, GatewayFilterChain chain) {
        String customerId = extractCustomerId(exchange);
        String endpoint = exchange.getRequest().getPath().value();

        RateLimitRequest request = RateLimitRequest.builder()
                .customerId(customerId)
                .endpoint(endpoint)
                .build();
        RateLimitResult result = rateLimiter.checkRateLimit(request);

        // Add rate limit headers to every response
        ServerHttpResponse response = exchange.getResponse();
        response.getHeaders().add("X-RateLimit-Limit", String.valueOf(result.getLimit()));
        response.getHeaders().add("X-RateLimit-Remaining", String.valueOf(result.getRemaining()));
        response.getHeaders().add("X-RateLimit-Reset", String.valueOf(result.getResetTimestamp()));

        if (!result.isAllowed()) {
            response.setStatusCode(HttpStatus.TOO_MANY_REQUESTS);
            response.getHeaders().add("Retry-After", String.valueOf(result.getRetryAfter()));
            return response.writeWith(Mono.just(createErrorBody(result)));
        }
        return chain.filter(exchange);
    }
}
Handling Edge Cases¶
Graceful Degradation¶
public RateLimitResult checkRateLimitWithFallback(RateLimitRequest request) {
    try {
        return rateLimiter.checkRateLimit(request);
    } catch (RedisConnectionException e) {
        // Redis is down. Pick ONE fallback strategy:
        //
        // Option 1: Fail open -- allow all traffic:
        //   return RateLimitResult.allowed();
        //
        // Option 2: Fail closed -- block all (only for critical protection):
        //   return RateLimitResult.denied("Service temporarily unavailable");
        //
        // Option 3 (used here): local in-memory limiter with conservative limits
        return localRateLimiter.checkRateLimit(request);
    }
}
Multi-Dimensional Rate Limiting¶
// Rate limit by multiple dimensions
public RateLimitResult checkMultiDimensional(String customerId, String ip, String endpoint) {
    // Check customer rate limit
    RateLimitResult customerResult = checkLimit("customer:" + customerId, customerLimit);
    if (!customerResult.isAllowed()) {
        return customerResult;
    }
    // Check IP rate limit (for abuse prevention)
    RateLimitResult ipResult = checkLimit("ip:" + ip, ipLimit);
    if (!ipResult.isAllowed()) {
        return ipResult;
    }
    // Check endpoint rate limit (for expensive endpoints)
    RateLimitResult endpointResult = checkLimit("endpoint:" + endpoint, endpointLimit);
    if (!endpointResult.isAllowed()) {
        return endpointResult;
    }
    // All dimensions passed; report the most restrictive result (fewest
    // remaining) so response headers reflect the tightest limit
    return mostRestrictive(customerResult, ipResult, endpointResult);
}
Monitoring & Alerting¶
Key Metrics¶
| Metric | Description | Alert Threshold |
|---|---|---|
| Rate limit hit rate | % of requests hitting limit | > 5% |
| P99 latency | Rate limit check latency | > 5ms |
| Redis availability | Redis cluster health | < 99.9% |
| False positives | Legit requests blocked | > 0.1% |
Dashboard¶
Technology Choices¶
| Component | Technology Options |
|---|---|
| Counter Store | Redis Cluster, Memcached |
| Config Store | PostgreSQL, etcd |
| Local Cache | Caffeine, Guava |
| Gateway | Kong, Envoy, Custom |
Interview Discussion Points¶
- How do you handle Redis failures? Circuit breaker, local fallback, fail-open vs fail-closed.
- How accurate is the rate limiting? Eventually consistent; small bursts may slip through across nodes.
- How do you handle clock skew across servers? Use Redis server time, NTP sync, and window-based algorithms.
- How do you prevent gaming (switching API keys)? Rate limit by IP and device fingerprint, plus account-level limits.
- How do you handle legitimate bursts? Token bucket with a burst allowance, or a separate burst limit.
- How do you scale the rate limiter? Redis Cluster, consistent hashing, local caching.