Idempotency Key System¶
Problem Statement¶
Design an idempotency key system that prevents duplicate operations in a distributed payment system. The system should ensure that retrying a request (due to network failures, timeouts, or client errors) produces the same result without creating duplicate charges or side effects.
Requirements¶
Functional Requirements¶
- Accept idempotency key with API requests
- Return cached response for duplicate requests
- Detect request hash mismatches (same key, different request)
- Support configurable TTL for keys
- Handle concurrent requests with same key
- Provide idempotency status in response
Non-Functional Requirements¶
- Latency: < 5ms overhead for idempotency check
- Consistency: Strong consistency for duplicate detection
- Availability: 99.99% (in critical payment path)
- Durability: Keys must persist for TTL duration
- Scalability: Handle millions of keys
High-Level Architecture¶
Core Concepts¶
What is Idempotency?¶
Data Model¶
Idempotency Record¶
CREATE TABLE idempotency_keys (
    -- Composite primary key columns
    api_key_id UUID NOT NULL,
    idempotency_key VARCHAR(255) NOT NULL,
    -- Request fingerprint
    request_path VARCHAR(255) NOT NULL,
    request_method VARCHAR(10) NOT NULL,
    request_hash VARCHAR(64) NOT NULL, -- SHA-256 hex digest of the request (path + body)
    -- Status
    status VARCHAR(20) NOT NULL, -- pending, completed, failed
    -- Response (stored after completion)
    response_code INT,
    response_body TEXT,
    response_headers JSONB,
    -- Locking
    locked_at TIMESTAMP,
    lock_expires_at TIMESTAMP,
    -- Timing
    created_at TIMESTAMP NOT NULL,
    updated_at TIMESTAMP NOT NULL,
    expires_at TIMESTAMP NOT NULL,
    PRIMARY KEY (api_key_id, idempotency_key)
);
CREATE INDEX idx_idempotency_expires ON idempotency_keys(expires_at);
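The composite primary key is what makes duplicate detection atomic: of several concurrent inserts for the same (api_key_id, idempotency_key) pair, only one can succeed. The following is a minimal in-memory sketch of that first-writer-wins behavior, with `ConcurrentHashMap.putIfAbsent` standing in for an insert guarded by the primary key (for example PostgreSQL's `INSERT ... ON CONFLICT DO NOTHING`). The class and method names are hypothetical.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

/** In-memory stand-in for the idempotency_keys table (illustrative only). */
final class KeyTableSketch {
    // Maps "api_key_id:idempotency_key" to the request hash that claimed it
    private final ConcurrentMap<String, String> rows = new ConcurrentHashMap<>();

    /** Returns true if this call inserted the row, false if the key already existed. */
    boolean tryInsert(String apiKeyId, String idempotencyKey, String requestHash) {
        String primaryKey = apiKeyId + ":" + idempotencyKey;
        // putIfAbsent is atomic, so only one concurrent caller sees null
        return rows.putIfAbsent(primaryKey, requestHash) == null;
    }

    /** Returns the stored request hash, or null if the key is unknown. */
    String storedHash(String apiKeyId, String idempotencyKey) {
        return rows.get(apiKeyId + ":" + idempotencyKey);
    }
}
```

Because the API key is part of the primary key, two different accounts can use the same idempotency key string without colliding.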
Redis Structure (Alternative)¶
Key: idempotency:{api_key_id}:{idempotency_key}
Value (JSON):
{
  "status": "pending|completed|failed",
  "request_hash": "sha256...",
  "request_path": "/v1/charges",
  "response": {
    "code": 200,
    "body": "{...}",
    "headers": {...}
  },
  "locked_until": 1640000000,
  "created_at": 1639999900,
  "expires_at": 1640086300
}
TTL: expires_at - now, in seconds (or an absolute expiry at expires_at via EXPIREAT)
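As a small illustration of the key layout and expiry above, the sketch below builds the Redis key and converts an absolute `expires_at` into a relative TTL in seconds. The helper names are hypothetical, and clamping to one second is an assumption made here so that a nearly expired record never yields a non-positive `EX` value, which Redis rejects.

```java
import java.time.Duration;
import java.time.Instant;

/** Helpers for the Redis key layout (illustrative only). */
final class RedisKeySketch {
    /** Builds "idempotency:{api_key_id}:{idempotency_key}". */
    static String redisKey(String apiKeyId, String idempotencyKey) {
        return "idempotency:" + apiKeyId + ":" + idempotencyKey;
    }

    /** Seconds from now until expiresAt, clamped to at least 1. */
    static long ttlSeconds(Instant now, Instant expiresAt) {
        long seconds = Duration.between(now, expiresAt).getSeconds();
        return Math.max(1, seconds);
    }
}
```

With the sample timestamps from the record above (`created_at` 1639999900, `expires_at` 1640086300), the TTL at creation time is 86400 seconds, which is 24 hours.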
Request Flow¶
State Machine¶
Request Processing Flow¶
Implementation¶
Middleware Implementation¶
import java.time.Duration;
import java.util.function.Supplier;
import org.apache.commons.codec.digest.DigestUtils;
import org.springframework.stereotype.Component;

@Component
public class IdempotencyMiddleware {
    private final IdempotencyStore store;
    private final Duration lockTimeout = Duration.ofSeconds(60);
    private final Duration keyTtl = Duration.ofHours(24);

    public IdempotencyMiddleware(IdempotencyStore store) {
        this.store = store;
    }

    public Object processRequest(
        String apiKeyId,
        String idempotencyKey,
        String requestPath,
        String requestBody,
        Supplier<Object> businessLogic
    ) {
        // 1. Generate request hash
        String requestHash = hashRequest(requestPath, requestBody);
        String compositeKey = apiKeyId + ":" + idempotencyKey;
        // 2. Try to acquire or find existing record
        IdempotencyRecord record = store.findOrCreate(
            compositeKey,
            requestHash,
            requestPath,
            lockTimeout,
            keyTtl
        );
        // 3. Handle based on record status
        switch (record.getStatus()) {
            case COMPLETED:
            case FAILED:
                // Validate request hash matches before replaying anything,
                // for cached errors as well as cached successes
                if (!record.getRequestHash().equals(requestHash)) {
                    throw new IdempotencyKeyConflictException(
                        "Idempotency key already used with different request"
                    );
                }
                // Return cached response (success or error)
                return record.getResponse();
            case PENDING:
                if (record.isLockedByOther()) {
                    // Another request is processing
                    throw new IdempotencyKeyInProgressException(
                        "Request with this idempotency key is already in progress"
                    );
                }
                // We acquired the lock, proceed with business logic
                break;
        }
        // 4. Execute business logic
        try {
            Object response = businessLogic.get();
            // 5. Store successful response
            store.complete(compositeKey, response);
            return response;
        } catch (Exception e) {
            // 6. Store failure response
            store.fail(compositeKey, e);
            throw e;
        }
    }

    private String hashRequest(String path, String body) {
        String content = path + "|" + body;
        return DigestUtils.sha256Hex(content);
    }
}
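The request hash is what lets the middleware tell a genuine retry from a reused key. Below is a minimal sketch of the same path-plus-body fingerprint using only the JDK's `MessageDigest` instead of Commons Codec. The class name is hypothetical, and hashing the raw body string is an assumption: two JSON bodies that differ only in whitespace or key order would produce different hashes unless the body is canonicalized first.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

/** JDK-only request fingerprint (illustrative only). */
final class RequestHashSketch {
    /** Returns the lowercase hex SHA-256 of "path|body". */
    static String hashRequest(String path, String body) {
        try {
            MessageDigest digest = MessageDigest.getInstance("SHA-256");
            byte[] hash = digest.digest((path + "|" + body).getBytes(StandardCharsets.UTF_8));
            StringBuilder hex = new StringBuilder(hash.length * 2);
            for (byte b : hash) {
                // %02x prints each byte as two hex digits
                hex.append(String.format("%02x", b));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            // Every Java platform is required to provide SHA-256
            throw new IllegalStateException("SHA-256 unavailable", e);
        }
    }
}
```

The result is always 64 hex characters, which matches the `VARCHAR(64)` width of `request_hash` in the data model.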
Redis Implementation with Lua Script¶
-- KEYS[1]: idempotency key
-- ARGV[1]: request hash
-- ARGV[2]: request path
-- ARGV[3]: lock timeout (seconds)
-- ARGV[4]: key TTL (seconds)
-- ARGV[5]: current timestamp
-- ARGV[6]: lock token (unique per request)
local key = KEYS[1]
local requestHash = ARGV[1]
local requestPath = ARGV[2]
local lockTimeout = tonumber(ARGV[3])
local keyTtl = tonumber(ARGV[4])
local now = tonumber(ARGV[5])
local lockToken = ARGV[6]

-- Try to get existing record
local existing = redis.call("GET", key)
if existing then
    local record = cjson.decode(existing)
    -- Check if completed or failed
    if record.status == "completed" or record.status == "failed" then
        -- Validate request hash
        if record.request_hash ~= requestHash then
            return cjson.encode({
                status = "hash_mismatch",
                error = "Idempotency key used with different request"
            })
        end
        return cjson.encode({
            status = record.status,
            response = record.response
        })
    end
    -- Check if lock is still valid
    if record.locked_until and record.locked_until > now then
        return cjson.encode({
            status = "locked",
            locked_until = record.locked_until
        })
    end
    -- Lock expired, try to acquire
end

-- Create or update with new lock
local record = {
    status = "pending",
    request_hash = requestHash,
    request_path = requestPath,
    lock_token = lockToken,
    locked_until = now + lockTimeout,
    created_at = now
}
redis.call("SET", key, cjson.encode(record), "EX", keyTtl)
return cjson.encode({
    status = "acquired",
    lock_token = lockToken
})
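The branching in the acquire script can be easier to follow, and to unit test, as a pure function. The sketch below mirrors its four outcomes in Java under the assumption that the stored record has already been read and decoded. All type and method names are hypothetical, and the real script additionally writes the new pending record when the outcome is an acquisition.

```java
/** Pure-Java mirror of the acquire script's branching (illustrative only). */
final class AcquireDecisionSketch {
    enum Outcome { ACQUIRED, LOCKED, HASH_MISMATCH, REPLAY }

    /** Minimal view of a stored record. */
    static final class StoredRecord {
        final String status;      // "pending", "completed" or "failed"
        final String requestHash;
        final long lockedUntil;   // epoch seconds; only read while pending

        StoredRecord(String status, String requestHash, long lockedUntil) {
            this.status = status;
            this.requestHash = requestHash;
            this.lockedUntil = lockedUntil;
        }
    }

    /** existing == null means nothing is stored under the key. */
    static Outcome decide(StoredRecord existing, String requestHash, long now) {
        if (existing == null) {
            return Outcome.ACQUIRED;
        }
        boolean finished = existing.status.equals("completed")
                || existing.status.equals("failed");
        if (finished) {
            // Finished records are replayed only for the same request
            return existing.requestHash.equals(requestHash)
                    ? Outcome.REPLAY
                    : Outcome.HASH_MISMATCH;
        }
        if (existing.lockedUntil > now) {
            return Outcome.LOCKED;
        }
        // Pending with an expired lock: the previous worker is presumed dead
        return Outcome.ACQUIRED;
    }
}
```

The last branch is how a crashed worker is recovered from: once `locked_until` has passed, the next retry takes over the key instead of waiting indefinitely.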
Complete Response Storage¶
public void complete(String key, Object response) {
    // redis, objectMapper, keyTtl and lockToken are assumed to be available
    // from the enclosing store; lockToken is the token passed when the lock
    // was acquired.
    String script = """
        local key = KEYS[1]
        local lockToken = ARGV[1]
        local responseJson = ARGV[2]
        local ttl = tonumber(ARGV[3])
        local existing = redis.call("GET", key)
        if not existing then
            return {err = "Key not found"}
        end
        local record = cjson.decode(existing)
        -- Verify we hold the lock
        if record.lock_token ~= lockToken then
            return {err = "Lock not held"}
        end
        -- Update to completed
        record.status = "completed"
        record.response = cjson.decode(responseJson)
        record.completed_at = tonumber(ARGV[4])
        record.lock_token = nil
        record.locked_until = nil
        redis.call("SET", key, cjson.encode(record), "EX", ttl)
        return {ok = "completed"}
        """;
    String responseJson;
    try {
        // Jackson 2.x declares a checked exception here
        responseJson = objectMapper.writeValueAsString(response);
    } catch (JsonProcessingException e) {
        throw new IllegalStateException("Could not serialize response", e);
    }
    redis.eval(script, List.of(key),
        lockToken,
        responseJson,
        keyTtl.getSeconds(),
        Instant.now().getEpochSecond()
    );
}
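The lock token check matters when a lock expires and a second worker takes over the key: the first worker must not overwrite the record if it finishes late. Below is a minimal in-memory sketch of that compare-then-complete step, using `ConcurrentHashMap.computeIfPresent` for atomicity in place of the Lua script. The class, its methods, and the string-typed response are hypothetical simplifications.

```java
import java.util.Objects;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

/** In-memory model of token-checked completion (illustrative only). */
final class TokenCheckedStoreSketch {
    static final class Entry {
        final String status;
        final String lockToken;
        final String response;

        Entry(String status, String lockToken, String response) {
            this.status = status;
            this.lockToken = lockToken;
            this.response = response;
        }
    }

    private final ConcurrentMap<String, Entry> entries = new ConcurrentHashMap<>();

    /** Records (or takes over) a pending entry owned by lockToken. */
    void acquire(String key, String lockToken) {
        entries.put(key, new Entry("pending", lockToken, null));
    }

    /** Completes the entry only if lockToken still owns it. */
    boolean complete(String key, String lockToken, String response) {
        boolean[] updated = {false};
        entries.computeIfPresent(key, (k, current) -> {
            if (!Objects.equals(current.lockToken, lockToken)) {
                return current; // someone else holds the lock: leave untouched
            }
            updated[0] = true;
            return new Entry("completed", null, response);
        });
        return updated[0];
    }

    /** Returns the status stored under key, or null if absent. */
    String status(String key) {
        Entry entry = entries.get(key);
        return entry == null ? null : entry.status;
    }
}
```

A worker whose `complete` call returns false has lost ownership and should not assume its response was stored.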
Handling Edge Cases¶
1. Request Hash Mismatch¶
// Same idempotency key, different request body
if (!record.getRequestHash().equals(requestHash)) {
    throw new IdempotencyKeyConflictException(
        "Idempotency key '" + idempotencyKey + "' has already been used " +
        "with a different request. Please use a new idempotency key."
    );
}
// HTTP Response:
// 422 Unprocessable Entity
{
  "error": {
    "type": "idempotency_error",
    "code": "key_in_use",
    "message": "This idempotency key has been used with different request parameters."
  }
}
2. Concurrent Requests¶
// Two requests arrive simultaneously with same key
// Request A acquires lock, Request B sees "PENDING"
// Response for Request B:
// 409 Conflict
{
  "error": {
    "type": "idempotency_error",
    "code": "in_progress",
    "message": "A request with this idempotency key is currently being processed.",
    "retry_after": 5
  }
}
// Client should retry after delay
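On the client side, the important detail is that every retry reuses the same idempotency key. The following is a minimal sketch of such a retry loop, assuming a hypothetical `send` function that performs the HTTP call with the given key and returns the status code. A real client would typically take the delay from the `retry_after` value rather than a fixed argument.

```java
import java.util.function.ToIntFunction;

/** Client-side retry on 409 with a fixed idempotency key (illustrative only). */
final class RetryClientSketch {
    /**
     * Calls send with the same key until it returns something other than 409
     * or maxAttempts is reached. Returns the last status code seen.
     */
    static int sendWithRetry(ToIntFunction<String> send, String idempotencyKey,
                             int maxAttempts, long retryAfterMillis) {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be at least 1");
        }
        int status = 0;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            status = send.applyAsInt(idempotencyKey);
            if (status != 409) {
                return status;
            }
            if (attempt < maxAttempts) {
                try {
                    Thread.sleep(retryAfterMillis);
                } catch (InterruptedException e) {
                    // Preserve the interrupt and stop retrying
                    Thread.currentThread().interrupt();
                    return status;
                }
            }
        }
        return status;
    }
}
```

Generating a fresh key per retry would defeat the mechanism, since the server would then treat each attempt as a new operation.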
3. Server Crash During Processing¶
4. Partial Failures¶
// Business logic has multiple side effects
@Transactional
public PaymentResult processPayment(PaymentRequest request) {
    // Step 1: Charge card
    ChargeResult charge = paymentProcessor.charge(request);
    // Step 2: Update order status
    orderService.markPaid(request.getOrderId());
    // Step 3: Send confirmation email
    emailService.sendReceipt(request.getCustomerId(), charge);
    return new PaymentResult(charge);
}
// Problem: What if Step 3 fails?
// @Transactional can roll back database work such as Step 2,
// but the card charge from Step 1 is an external side effect and stays in place.
// Options:
// 1. Store response after Step 1 (card charged)
// 2. Make Step 3 async (decouple from main flow)
// 3. Use saga pattern for rollback
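Option 2 can be sketched as follows: the payment path only records that a receipt is owed, and a separate delivery step sends it later, so an email failure can no longer fail the payment. This is a minimal in-memory sketch with hypothetical names. A production version would persist the queue (for example in the same database transaction as the order update) rather than hold it in memory.

```java
import java.util.ArrayDeque;
import java.util.Queue;
import java.util.function.Predicate;

/** Decouples receipt emails from the payment path (illustrative only). */
final class ReceiptOutboxSketch {
    private final Queue<String> pendingReceipts = new ArrayDeque<>();

    /** Steps 1 and 2 run inline; step 3 is only queued for later delivery. */
    String processPayment(String orderId, String customerId) {
        String chargeId = "ch_" + orderId; // stand-in for the card charge
        // A real implementation would mark the order as paid here
        pendingReceipts.add(customerId + ":" + chargeId);
        return chargeId;
    }

    /** Tries each queued receipt once; failed sends stay queued. Returns the number sent. */
    int deliverReceipts(Predicate<String> sendEmail) {
        int sent = 0;
        int toTry = pendingReceipts.size();
        for (int i = 0; i < toTry; i++) {
            String receipt = pendingReceipts.poll();
            if (sendEmail.test(receipt)) {
                sent++;
            } else {
                pendingReceipts.add(receipt); // keep for the next delivery run
            }
        }
        return sent;
    }

    int pendingCount() {
        return pendingReceipts.size();
    }
}
```

With this split, the response cached under the idempotency key reflects the charge and order update only, and receipt delivery is retried independently.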
API Design¶
Request Headers¶
POST /v1/charges HTTP/1.1
Host: api.stripe.com
Authorization: Bearer sk_test_xxx
Idempotency-Key: order_12345_charge_attempt_1
Content-Type: application/json

{
  "amount": 2000,
  "currency": "usd",
  "source": "tok_visa"
}
Response Headers¶
HTTP/1.1 200 OK
Idempotency-Key: order_12345_charge_attempt_1
Idempotent-Replayed: true
Original-Request-Id: req_abc123

{
  "id": "ch_xxx",
  "amount": 2000,
  "status": "succeeded"
}
Error Responses¶
# Key conflict (different request)
HTTP/1.1 422 Unprocessable Entity

{
  "error": {
    "type": "idempotency_error",
    "code": "key_in_use",
    "message": "This idempotency key was used with a different request."
  }
}

# Request in progress
HTTP/1.1 409 Conflict
Retry-After: 5

{
  "error": {
    "type": "idempotency_error",
    "code": "in_progress",
    "message": "Request is already being processed."
  }
}
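The mapping from idempotency outcomes to these responses can be kept in one place. The sketch below assumes a hypothetical `Outcome` enum. The status codes, error codes, and the 5-second retry hint are taken from the examples above, and a replay returns whatever status code was cached with the original response.

```java
/** Maps idempotency outcomes to HTTP response details (illustrative only). */
final class IdempotencyHttpMappingSketch {
    enum Outcome { REPLAYED, HASH_MISMATCH, IN_PROGRESS }

    /** HTTP status for the outcome; replays reuse the cached status code. */
    static int statusCode(Outcome outcome, int cachedStatusCode) {
        switch (outcome) {
            case HASH_MISMATCH:
                return 422;
            case IN_PROGRESS:
                return 409;
            default:
                return cachedStatusCode;
        }
    }

    /** Error code for the response body, or null when there is no error. */
    static String errorCode(Outcome outcome) {
        switch (outcome) {
            case HASH_MISMATCH:
                return "key_in_use";
            case IN_PROGRESS:
                return "in_progress";
            default:
                return null;
        }
    }

    /** Value for the Retry-After header, or 0 when the header is omitted. */
    static int retryAfterSeconds(Outcome outcome) {
        return outcome == Outcome.IN_PROGRESS ? 5 : 0;
    }
}
```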
Scalability Considerations¶
Redis Cluster¶
Cleanup Job¶
@Scheduled(cron = "0 0 * * * *") // Every hour
public void cleanupExpiredKeys() {
    // For database implementation
    int deleted = jdbcTemplate.update(
        "DELETE FROM idempotency_keys WHERE expires_at < ?",
        // java.sql.Timestamp, because not every JDBC driver binds Instant directly
        Timestamp.from(Instant.now())
    );
    log.info("Cleaned up {} expired idempotency keys", deleted);
}
// For Redis: TTL handles cleanup automatically
Monitoring¶
Key Metrics¶
| Metric | Description | Alert Threshold |
|---|---|---|
| Cache hit rate | % of requests with existing key | Informational |
| Conflict rate | % of hash mismatches | > 1% |
| Lock contention | % of "in progress" responses | > 5% |
| Redis latency | P99 for idempotency check | > 5ms |
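The two rate thresholds in the table can be checked from plain counters. The sketch below is a hypothetical helper and not tied to any metrics library. It uses integer arithmetic so that a rate exactly at the threshold does not alert, matching the ">" in the table.

```java
/** Threshold checks for the conflict and contention rates (illustrative only). */
final class IdempotencyAlertsSketch {
    /** True when hash mismatches exceed 1% of all keyed requests. */
    static boolean conflictRateAlert(long hashMismatches, long totalRequests) {
        return hashMismatches * 100 > totalRequests;
    }

    /** True when "in progress" responses exceed 5% of all keyed requests. */
    static boolean lockContentionAlert(long inProgressResponses, long totalRequests) {
        return inProgressResponses * 100 > totalRequests * 5;
    }
}
```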
Technology Choices¶
| Component | Technology Options |
|---|---|
| Primary Store | Redis (recommended), DynamoDB |
| Backup Store | PostgreSQL (for compliance) |
| Hashing | SHA-256 |
Interview Discussion Points¶
- Why is idempotency important for payments?
  - Network failures, timeouts, client retries can cause duplicates
- How do you handle the "exactly-once" guarantee?
  - At-least-once delivery + idempotency = exactly-once semantics
- What if Redis goes down?
  - Fail-open (allow with warning) or fail-closed (reject)
- How do you handle very large response bodies?
  - Compress, store reference to blob storage, or just store key fields
- What about database-level idempotency?
  - Unique constraints, but need application-level for full response caching
- How long should keys be retained?
  - 24-48 hours typical, balance between safety and storage cost
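The fail-open versus fail-closed choice for a store outage can be made explicit in code. Below is a minimal sketch assuming a hypothetical `isDuplicate` check that throws a runtime exception when the store is unreachable. The policy names follow the discussion point above, and everything else is illustrative.

```java
import java.util.function.BooleanSupplier;

/** Explicit outage policy for the idempotency check (illustrative only). */
final class StoreOutagePolicySketch {
    enum Policy { FAIL_OPEN, FAIL_CLOSED }

    /**
     * Returns true if the request should be processed. When the store check
     * throws, FAIL_OPEN lets the request through and FAIL_CLOSED rejects it.
     */
    static boolean shouldProcess(Policy policy, BooleanSupplier isDuplicate) {
        try {
            return !isDuplicate.getAsBoolean();
        } catch (RuntimeException storeFailure) {
            if (policy == Policy.FAIL_OPEN) {
                // Duplicate detection is skipped; callers should log a warning
                return true;
            }
            throw new IllegalStateException("Idempotency store unavailable", storeFailure);
        }
    }
}
```

Which policy fits depends on whether a possible duplicate charge or a rejected request is the costlier failure for the endpoint in question.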