Idempotency Key System

Problem Statement

Design an idempotency key system that prevents duplicate operations in a distributed payment system. The system should ensure that retrying a request (due to network failures, timeouts, or client errors) produces the same result without creating duplicate charges or side effects.


Requirements

Functional Requirements

  • Accept idempotency key with API requests
  • Return cached response for duplicate requests
  • Detect request hash mismatches (same key, different request)
  • Support configurable TTL for keys
  • Handle concurrent requests with same key
  • Provide idempotency status in response

Non-Functional Requirements

  • Latency: < 5ms overhead for idempotency check
  • Consistency: Strong consistency for duplicate detection
  • Availability: 99.99% (in critical payment path)
  • Durability: Keys must persist for TTL duration
  • Scalability: Handle millions of keys

High-Level Architecture

[Figure: Idempotency Key System, High-Level Architecture]


Core Concepts

What is Idempotency?

[Figure: Idempotency Explained]


Data Model

Idempotency Record

CREATE TABLE idempotency_keys (
    -- Composite primary key
    api_key_id          UUID NOT NULL,
    idempotency_key     VARCHAR(255) NOT NULL,

    -- Request fingerprint
    request_path        VARCHAR(255) NOT NULL,
    request_method      VARCHAR(10) NOT NULL,
    request_hash        VARCHAR(64) NOT NULL,       -- SHA-256 hex digest of the request (path + body)

    -- Status
    status              VARCHAR(20) NOT NULL,       -- pending, completed, failed

    -- Response (stored after completion)
    response_code       INT,
    response_body       TEXT,
    response_headers    JSONB,

    -- Locking
    locked_at           TIMESTAMP,
    lock_expires_at     TIMESTAMP,

    -- Timing
    created_at          TIMESTAMP NOT NULL,
    updated_at          TIMESTAMP NOT NULL,
    expires_at          TIMESTAMP NOT NULL,

    PRIMARY KEY (api_key_id, idempotency_key)
);

CREATE INDEX idx_idempotency_expires ON idempotency_keys(expires_at);

Redis Structure (Alternative)

Key: idempotency:{api_key_id}:{idempotency_key}

Value (JSON):
{
    "status": "pending|completed|failed",
    "request_hash": "sha256...",
    "request_path": "/v1/charges",
    "response": {
        "code": 200,
        "body": "{...}",
        "headers": {...}
    },
    "locked_until": 1640000000,
    "created_at": 1639999900,
    "expires_at": 1640086300
}

TTL: Seconds remaining until expires_at (or set an absolute expiry with EXPIREAT expires_at)
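Redis expiry options such as EX take a duration in seconds, so a stored expires_at timestamp has to be converted to the remaining lifetime. A minimal sketch of that conversion follows; the class and method names are illustrative and not part of the original design.

```java
import java.time.Duration;
import java.time.Instant;

/** Illustrative helper: converts an absolute expiry into a relative TTL. */
final class KeyTtl {
    private KeyTtl() {}

    /** Seconds from now until expiresAt, never negative. */
    static long secondsUntil(Instant expiresAt, Instant now) {
        long seconds = Duration.between(now, expiresAt).getSeconds();
        return Math.max(seconds, 0);
    }
}
```

With the timestamps from the example record (created_at 1639999900, expires_at 1640086300) this yields 86400 seconds, i.e. 24 hours. A result of 0 means the key has already expired and should not be written.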

Request Flow

State Machine

[Figure: Idempotency Key States]
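The three statuses in the data model (pending, completed, failed) can be sketched as a small state machine. The version below assumes that pending is the only non-terminal state and that completed and failed are final; the enum and its method names are illustrative.

```java
import java.util.EnumSet;
import java.util.Set;

/** Sketch of the key lifecycle, assuming PENDING is the only non-terminal state. */
enum KeyStatus {
    PENDING, COMPLETED, FAILED;

    /** States this status may move to; terminal states allow none. */
    Set<KeyStatus> allowedNext() {
        return this == PENDING
            ? EnumSet.of(COMPLETED, FAILED)
            : EnumSet.noneOf(KeyStatus.class);
    }

    boolean canTransitionTo(KeyStatus next) {
        return allowedNext().contains(next);
    }

    boolean isTerminal() {
        return allowedNext().isEmpty();
    }
}
```

Re-acquiring an expired lock is not modeled as a transition here, because the record stays in the pending state and only the lock holder changes.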

Request Processing Flow

[Figure: Request Processing Flow]


Implementation

Middleware Implementation

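import java.time.Duration;
import java.util.function.Supplier;

import org.apache.commons.codec.digest.DigestUtils;
import org.springframework.stereotype.Component;

@Component
public class IdempotencyMiddleware {

    private final IdempotencyStore store;
    private final Duration lockTimeout = Duration.ofSeconds(60);
    private final Duration keyTtl = Duration.ofHours(24);

    // Constructor injection: the final store field must be assigned
    public IdempotencyMiddleware(IdempotencyStore store) {
        this.store = store;
    }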

    public Object processRequest(
            String apiKeyId,
            String idempotencyKey,
            String requestPath,
            String requestBody,
            Supplier<Object> businessLogic
    ) {
        // 1. Generate request hash
        String requestHash = hashRequest(requestPath, requestBody);
        String compositeKey = apiKeyId + ":" + idempotencyKey;

        // 2. Try to acquire or find existing record
        IdempotencyRecord record = store.findOrCreate(
            compositeKey,
            requestHash,
            requestPath,
            lockTimeout,
            keyTtl
        );

        // 3. Validate the request hash, then handle based on record status.
        //    The hash is checked for every status, so a failed or in-flight
        //    key cannot be reused with a different request either.
        if (!record.getRequestHash().equals(requestHash)) {
            throw new IdempotencyKeyConflictException(
                "Idempotency key already used with different request"
            );
        }

        switch (record.getStatus()) {
            case COMPLETED:
            case FAILED:
                // Return cached response (success or error)
                return record.getResponse();

            case PENDING:
                if (record.isLockedByOther()) {
                    // Another request is processing
                    throw new IdempotencyKeyInProgressException(
                        "Request with this idempotency key is already in progress"
                    );
                }
                // We acquired the lock, proceed with business logic
                break;
        }

        // 4. Execute business logic
        try {
            Object response = businessLogic.get();

            // 5. Store successful response
            store.complete(compositeKey, response);

            return response;

        } catch (Exception e) {
            // 6. Store failure response
            store.fail(compositeKey, e);
            throw e;
        }
    }

    private String hashRequest(String path, String body) {
        String content = path + "|" + body;
        return DigestUtils.sha256Hex(content);
    }
}
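The hashRequest helper above joins path and body with a "|" separator, so a "|" inside either part could make two different requests produce the same input string. The following sketch shows one way to avoid that by length-prefixing each part, and it also folds in the HTTP method that the data model stores. It uses only the JDK; the class name and the length-prefix encoding are assumptions of this sketch, not part of the original design.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Locale;

/** Illustrative request fingerprint: SHA-256 over method, path and body. */
final class RequestFingerprint {
    private RequestFingerprint() {}

    static String of(String method, String path, String body) {
        try {
            MessageDigest digest = MessageDigest.getInstance("SHA-256");
            String[] parts = {method.toUpperCase(Locale.ROOT), path, body};
            for (String part : parts) {
                byte[] bytes = part.getBytes(StandardCharsets.UTF_8);
                // Length prefix keeps ("/a|b", "c") distinct from ("/a", "b|c")
                digest.update((bytes.length + ":").getBytes(StandardCharsets.UTF_8));
                digest.update(bytes);
            }
            StringBuilder hex = new StringBuilder(64);
            for (byte b : digest.digest()) {
                hex.append(String.format("%02x", b));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            // SHA-256 is a mandatory algorithm on every Java platform
            throw new IllegalStateException(e);
        }
    }
}
```

Note that this hashes the raw body, so two JSON bodies that differ only in whitespace or key order still count as different requests. Whether to canonicalize the body first is a separate design decision.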

Redis Implementation with Lua Script

-- KEYS[1]: idempotency key
-- ARGV[1]: request hash
-- ARGV[2]: request path
-- ARGV[3]: lock timeout (seconds)
-- ARGV[4]: key TTL (seconds)
-- ARGV[5]: current timestamp
-- ARGV[6]: lock token (unique per request)

local key = KEYS[1]
local requestHash = ARGV[1]
local requestPath = ARGV[2]
local lockTimeout = tonumber(ARGV[3])
local keyTtl = tonumber(ARGV[4])
local now = tonumber(ARGV[5])
local lockToken = ARGV[6]

-- Try to get existing record
local existing = redis.call("GET", key)

if existing then
    local record = cjson.decode(existing)

    -- Reject a different request reusing this key, whatever the record's status
    if record.request_hash ~= requestHash then
        return cjson.encode({
            status = "hash_mismatch",
            error = "Idempotency key used with different request"
        })
    end

    -- Completed or failed: return the stored response
    if record.status == "completed" or record.status == "failed" then
        return cjson.encode({
            status = record.status,
            response = record.response
        })
    end

    -- Still pending: check if the lock is still valid
    if record.locked_until and record.locked_until > now then
        return cjson.encode({
            status = "locked",
            locked_until = record.locked_until
        })
    end

    -- Lock expired (the previous holder likely crashed): fall through and re-acquire
end

-- Create or update with new lock
local record = {
    status = "pending",
    request_hash = requestHash,
    request_path = requestPath,
    lock_token = lockToken,
    locked_until = now + lockTimeout,
    created_at = now
}

redis.call("SET", key, cjson.encode(record), "EX", keyTtl)

return cjson.encode({
    status = "acquired",
    lock_token = lockToken
})
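To make the decision logic of the acquire step easier to follow and to unit-test without Redis, it can be modeled in memory. The sketch below is not a replacement for the Lua script: it assumes a single process, omits the lock token, and compares the request hash before inspecting the lock (an ordering chosen for this sketch). All names are illustrative.

```java
import java.util.concurrent.ConcurrentHashMap;

/** In-memory model of the acquire step; single-process only. */
final class InMemoryIdempotencyStore {
    enum Outcome { ACQUIRED, LOCKED, HASH_MISMATCH, COMPLETED }

    private static final class Entry {
        final String requestHash;
        final long lockedUntil; // epoch seconds
        final String response;  // null while pending

        Entry(String requestHash, long lockedUntil, String response) {
            this.requestHash = requestHash;
            this.lockedUntil = lockedUntil;
            this.response = response;
        }
    }

    private final ConcurrentHashMap<String, Entry> entries = new ConcurrentHashMap<>();

    /** compute() runs atomically per key, playing the role of the Lua script. */
    Outcome acquire(String key, String requestHash, long now, long lockTimeoutSeconds) {
        Outcome[] outcome = new Outcome[1];
        entries.compute(key, (k, existing) -> {
            if (existing == null) {
                outcome[0] = Outcome.ACQUIRED;
            } else if (!existing.requestHash.equals(requestHash)) {
                outcome[0] = Outcome.HASH_MISMATCH;
            } else if (existing.response != null) {
                outcome[0] = Outcome.COMPLETED;
            } else if (existing.lockedUntil > now) {
                outcome[0] = Outcome.LOCKED;
            } else {
                outcome[0] = Outcome.ACQUIRED; // lock expired, take over
            }
            return outcome[0] == Outcome.ACQUIRED
                ? new Entry(requestHash, now + lockTimeoutSeconds, null)
                : existing;
        });
        return outcome[0];
    }

    void complete(String key, String response) {
        entries.computeIfPresent(key,
            (k, e) -> new Entry(e.requestHash, e.lockedUntil, response));
    }
}
```

A first call at time 100 with a 60 second timeout acquires the lock, a second call at time 120 sees it as locked, and a call at time 161 takes over because the lock expired at 160.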

Complete Response Storage

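public void complete(String key, Object response) {
    // Token generated when this instance acquired the lock in findOrCreate.
    // heldLockTokens is an assumed in-memory map (key -> token) kept by the store.
    String lockToken = heldLockTokens.remove(key);
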
    String script = """
        local key = KEYS[1]
        local lockToken = ARGV[1]
        local responseJson = ARGV[2]
        local ttl = tonumber(ARGV[3])

        local existing = redis.call("GET", key)
        if not existing then
            return {err = "Key not found"}
        end

        local record = cjson.decode(existing)

        -- Verify we hold the lock
        if record.lock_token ~= lockToken then
            return {err = "Lock not held"}
        end

        -- Update to completed
        record.status = "completed"
        record.response = cjson.decode(responseJson)
        record.completed_at = tonumber(ARGV[4])
        record.lock_token = nil
        record.locked_until = nil

        redis.call("SET", key, cjson.encode(record), "EX", ttl)

        return {ok = "completed"}
    """;

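    String responseJson;
    try {
        responseJson = objectMapper.writeValueAsString(response);
    } catch (JsonProcessingException e) {
        // writeValueAsString declares a checked exception
        throw new IllegalStateException("Could not serialize response", e);
    }

    redis.eval(script, List.of(key),
        lockToken,
        responseJson,
        keyTtl.getSeconds(),
        Instant.now().getEpochSecond()
    );
}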

Handling Edge Cases

1. Request Hash Mismatch

// Same idempotency key, different request body
if (!record.getRequestHash().equals(requestHash)) {
    throw new IdempotencyKeyConflictException(
        "Idempotency key '" + idempotencyKey + "' has already been used " +
        "with a different request. Please use a new idempotency key."
    );
}

// HTTP Response:
// 422 Unprocessable Entity
{
    "error": {
        "type": "idempotency_error",
        "code": "key_in_use",
        "message": "This idempotency key has been used with different request parameters."
    }
}

2. Concurrent Requests

// Two requests arrive simultaneously with same key
// Request A acquires lock, Request B sees "PENDING"

// Response for Request B:
// 409 Conflict
{
    "error": {
        "type": "idempotency_error",
        "code": "in_progress",
        "message": "A request with this idempotency key is currently being processed.",
        "retry_after": 5
    }
}

// Client should retry after delay
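On the client side, handling the 409 response amounts to waiting for the advertised delay and resending the same request with the same idempotency key. The sketch below assumes a simplified Response type and an injected pause function so it can be tested without sleeping; none of these names come from a real HTTP client.

```java
import java.util.function.LongConsumer;
import java.util.function.Supplier;

/** Illustrative client-side retry for "in progress" (409) responses. */
final class RetryingClient {
    static final class Response {
        final int status;
        final long retryAfterSeconds;

        Response(int status, long retryAfterSeconds) {
            this.status = status;
            this.retryAfterSeconds = retryAfterSeconds;
        }
    }

    /**
     * Calls send up to maxAttempts times. send must reuse the same
     * idempotency key and body on every attempt.
     */
    static Response sendWithRetry(Supplier<Response> send,
                                  int maxAttempts,
                                  LongConsumer pauseMillis) {
        Response last = send.get();
        for (int attempt = 1; attempt < maxAttempts && last.status == 409; attempt++) {
            pauseMillis.accept(last.retryAfterSeconds * 1000L);
            last = send.get();
        }
        return last;
    }
}
```

In production code the pause function would typically wrap Thread.sleep. The attempt cap matters because a stuck lock would otherwise keep the client retrying until the lock timeout expires.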

3. Server Crash During Processing

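[Figure: Crash Recovery]

If the server crashes while a record is still pending, nothing releases the lock. Once the lock timeout passes (60 seconds in the middleware above), the next retry with the same key finds the lock expired, re-acquires it, and runs the business logic again, so the business logic must tolerate being re-run.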

4. Partial Failures

// Business logic has multiple side effects
@Transactional
public PaymentResult processPayment(PaymentRequest request) {
    // Step 1: Charge card
    ChargeResult charge = paymentProcessor.charge(request);

    // Step 2: Update order status
    orderService.markPaid(request.getOrderId());

    // Step 3: Send confirmation email
    emailService.sendReceipt(request.getCustomerId(), charge);

    return new PaymentResult(charge);
}

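// Problem: what if Step 3 fails? @Transactional rolls back local database
// changes such as the order update, but the card charge in Step 1 is an
// external side effect and is not undone.
// Options:
// 1. Store the response as soon as Step 1 succeeds, so retries replay it instead of charging again
// 2. Make Step 3 async (decouple it from the main flow)
// 3. Use the saga pattern with compensating actions (e.g. refund the charge)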
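Option 2 can be sketched with a simple outbox: the payment path only records that a receipt is due, and a background worker sends it later and retries on failure. The in-memory queue below is an assumption made to keep the example self-contained; a durable version would more likely be a database table written in the same transaction as the order update. The class name is hypothetical.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.function.Consumer;

/** Hypothetical outbox: receipts are queued in the payment path and sent later. */
final class ReceiptOutbox {
    private final Deque<String> pending = new ArrayDeque<>();

    /** Called from the payment path; never talks to the mail server. */
    synchronized void enqueue(String chargeId) {
        pending.addLast(chargeId);
    }

    /** Called by a background worker; failed sends stay queued for the next run. */
    synchronized int drain(Consumer<String> sender) {
        int sent = 0;
        for (int i = pending.size(); i > 0; i--) {
            String chargeId = pending.pollFirst();
            try {
                sender.accept(chargeId);
                sent++;
            } catch (RuntimeException e) {
                pending.addLast(chargeId); // keep it for the next drain
            }
        }
        return sent;
    }

    synchronized int size() {
        return pending.size();
    }
}
```

With this split, an email outage no longer fails the payment request, so the idempotency record can be completed as soon as the charge and order update succeed.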

API Design

Request Headers

POST /v1/charges HTTP/1.1
Host: api.stripe.com
Authorization: Bearer sk_test_xxx
Idempotency-Key: order_12345_charge_attempt_1
Content-Type: application/json

{
    "amount": 2000,
    "currency": "usd",
    "source": "tok_visa"
}

Response Headers

HTTP/1.1 200 OK
Idempotency-Key: order_12345_charge_attempt_1
Idempotent-Replayed: true
Original-Request-Id: req_abc123

{
    "id": "ch_xxx",
    "amount": 2000,
    "status": "succeeded"
}

Error Responses

# Key conflict (different request)
HTTP/1.1 422 Unprocessable Entity
{
    "error": {
        "type": "idempotency_error",
        "code": "key_in_use",
        "message": "This idempotency key was used with a different request."
    }
}

# Request in progress
HTTP/1.1 409 Conflict
Retry-After: 5
{
    "error": {
        "type": "idempotency_error",
        "code": "in_progress",
        "message": "Request is already being processed."
    }
}
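The two error responses above can be produced from a single mapping so that status code, error code, and Retry-After stay consistent across endpoints. This is a minimal sketch; the enum and method names are illustrative, and the values are the ones shown in the examples above.

```java
/** Maps the two idempotency error outcomes to the HTTP details shown above. */
final class IdempotencyErrors {
    enum Outcome { HASH_MISMATCH, IN_PROGRESS }

    private IdempotencyErrors() {}

    static int httpStatus(Outcome outcome) {
        return outcome == Outcome.HASH_MISMATCH ? 422 : 409;
    }

    static String errorCode(Outcome outcome) {
        return outcome == Outcome.HASH_MISMATCH ? "key_in_use" : "in_progress";
    }

    /** Retry-After only applies to the in-progress case; 0 means "omit the header". */
    static int retryAfterSeconds(Outcome outcome) {
        return outcome == Outcome.IN_PROGRESS ? 5 : 0;
    }
}
```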

Scalability Considerations

Redis Cluster

[Figure: Scaling Strategy]

Cleanup Job

@Scheduled(cron = "0 0 * * * *")  // Every hour
public void cleanupExpiredKeys() {
    // For database implementation
    // Bind as java.sql.Timestamp; not every JDBC driver accepts Instant directly
    int deleted = jdbcTemplate.update(
        "DELETE FROM idempotency_keys WHERE expires_at < ?",
        Timestamp.from(Instant.now())
    );
    log.info("Cleaned up {} expired idempotency keys", deleted);
}

// For Redis: TTL handles cleanup automatically

Monitoring

Key Metrics

Metric            Description                        Alert Threshold
Cache hit rate    % of requests with existing key    Informational
Conflict rate     % of hash mismatches               > 1%
Lock contention   % of "in progress" responses       > 5%
Redis latency     P99 for idempotency check          > 5ms
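The two rate-based thresholds can be checked from raw counters as in the following sketch. The counter names are assumptions; in practice these checks would usually live in the monitoring system's alert rules rather than in application code.

```java
/** Illustrative threshold checks for the rate-based alerts in the table above. */
final class IdempotencyAlerts {
    private IdempotencyAlerts() {}

    /** True when hash mismatches exceed 1% of requests. */
    static boolean conflictRateTooHigh(long hashMismatches, long totalRequests) {
        return totalRequests > 0
            && (double) hashMismatches / totalRequests > 0.01;
    }

    /** True when "in progress" responses exceed 5% of requests. */
    static boolean lockContentionTooHigh(long inProgressResponses, long totalRequests) {
        return totalRequests > 0
            && (double) inProgressResponses / totalRequests > 0.05;
    }
}
```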

Technology Choices

Component        Technology Options
Primary Store    Redis (recommended), DynamoDB
Backup Store     PostgreSQL (for compliance)
Hashing          SHA-256

Interview Discussion Points

  1. Why is idempotency important for payments?
     • Network failures, timeouts, and client retries can cause duplicates

  2. How do you handle the "exactly-once" guarantee?
     • At-least-once delivery + idempotency = exactly-once semantics

  3. What if Redis goes down?
     • Fail-open (allow with warning) or fail-closed (reject)

  4. How do you handle very large response bodies?
     • Compress, store a reference to blob storage, or store only key fields

  5. What about database-level idempotency?
     • Unique constraints help, but application-level handling is needed for full response caching

  6. How long should keys be retained?
     • 24-48 hours is typical; balance safety against storage cost
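The fail-open versus fail-closed choice for a store outage can be made explicit in code. The sketch below assumes the store signals an outage by throwing a RuntimeException from the acquire step; the class, enum, and parameter names are illustrative.

```java
import java.util.function.Supplier;

/** Illustrative policy for what to do when the idempotency store is unreachable. */
final class StoreOutagePolicy {
    enum Mode { FAIL_OPEN, FAIL_CLOSED }

    private StoreOutagePolicy() {}

    static <T> T run(Mode mode, Runnable acquireKey, Supplier<T> businessLogic) {
        try {
            acquireKey.run();
        } catch (RuntimeException storeDown) {
            if (mode == Mode.FAIL_CLOSED) {
                // Reject: no duplicate protection means no processing
                throw new IllegalStateException("idempotency store unavailable", storeDown);
            }
            // FAIL_OPEN: continue without duplicate protection (log a warning here)
        }
        return businessLogic.get();
    }
}
```

For charge-creating endpoints, fail-closed is arguably the safer default, since a rejected request can be retried while a duplicate charge has to be refunded. Fail-open may be acceptable for operations whose duplicates are harmless.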