Payment Processing System¶
Problem Statement¶
Design a payment processing system that can handle credit card transactions at scale, similar to Stripe's core payment infrastructure. The system should process payments reliably, handle failures gracefully, and ensure each payment takes effect exactly once, even when requests are retried.
Requirements¶
Functional Requirements¶
- Process credit/debit card payments
- Support multiple payment methods (cards, bank transfers, wallets)
- Handle authorization, capture, and settlement
- Support refunds and chargebacks
- Multi-currency support
- Provide transaction status and history
- Generate receipts and notifications
Non-Functional Requirements¶
- Availability: 99.99% uptime (payment is critical)
- Latency: < 500ms for authorization
- Throughput: 10,000+ TPS
- Consistency: Exactly-once processing (no duplicate charges)
- Security: PCI-DSS compliant
- Auditability: Complete audit trail
High-Level Architecture¶
Core Components¶
1. API Gateway¶
- Authentication (API keys, OAuth)
- Rate limiting per merchant
- Request validation
- TLS termination
- Request/response logging
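Per-merchant rate limiting is commonly done with a token bucket. The sketch below is one possible in-memory version, not the gateway's actual mechanism; the class name `MerchantRateLimiter`, its parameters, and the explicit `nowNanos` argument (used so the logic can be tested without a real clock) are all assumptions made for illustration.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical per-merchant token bucket: bursts up to `capacity`, refilled at `refillPerSecond`. */
final class MerchantRateLimiter {
    private static final class Bucket {
        double tokens;
        long lastRefillNanos;

        Bucket(double tokens, long nowNanos) {
            this.tokens = tokens;
            this.lastRefillNanos = nowNanos;
        }
    }

    private final double capacity;
    private final double refillPerSecond;
    private final Map<String, Bucket> buckets = new ConcurrentHashMap<>();

    MerchantRateLimiter(double capacity, double refillPerSecond) {
        this.capacity = capacity;
        this.refillPerSecond = refillPerSecond;
    }

    /** Returns true if the request is allowed and consumes one token. */
    boolean tryAcquire(String merchantId, long nowNanos) {
        // Each merchant starts with a full bucket on first use.
        Bucket bucket = buckets.computeIfAbsent(merchantId, id -> new Bucket(capacity, nowNanos));
        synchronized (bucket) {
            double elapsedSeconds = (nowNanos - bucket.lastRefillNanos) / 1_000_000_000.0;
            bucket.tokens = Math.min(capacity, bucket.tokens + elapsedSeconds * refillPerSecond);
            bucket.lastRefillNanos = nowNanos;
            if (bucket.tokens >= 1.0) {
                bucket.tokens -= 1.0;
                return true;
            }
            return false;
        }
    }
}
```

In a multi-node gateway the bucket state would have to live in a shared store rather than in process memory.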
2. Payment Service¶
- Orchestrates payment flow
- Validates payment request
- Checks idempotency
- Routes to appropriate processor
- Handles retries and failures
3. Idempotency Store¶
- Stores idempotency keys with request hash
- TTL-based expiration (24-48 hours)
- Prevents duplicate charges on retries
- Returns cached response for duplicate requests
4. Payment Router¶
- Routes to appropriate payment processor
- Handles processor failover
- Load balances across processors
- Applies routing rules (cost optimization, success rate)
5. Payment Processors (Adapters)¶
- Abstracts different card networks (Visa, Mastercard, Amex)
- Handles protocol differences
- Manages credentials and encryption
- Implements circuit breakers
6. Payment Ledger¶
- Double-entry accounting
- Immutable transaction log
- Supports audit queries
- Handles currency conversion
7. Settlement Service¶
- Batches transactions for settlement
- Reconciles with processor reports
- Handles disputes and chargebacks
- Initiates fund transfers
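Reconciliation against a processor report can be pictured as a three-way diff. The sketch below assumes both sides have been reduced to a map from processor transaction ID to amount in minor units; the `Reconciler` class and its `Result` record are hypothetical names, and real reports carry more fields (fees, currency, settlement date).

```java
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

/** Hypothetical reconciliation pass: compares our ledger view with a processor report. */
final class Reconciler {
    record Result(Set<String> missingAtProcessor,
                  Set<String> missingInLedger,
                  Set<String> amountMismatches) {}

    static Result reconcile(Map<String, Long> ledgerAmounts, Map<String, Long> processorAmounts) {
        // Transactions we recorded that the processor did not report.
        Set<String> missingAtProcessor = new TreeSet<>(ledgerAmounts.keySet());
        missingAtProcessor.removeAll(processorAmounts.keySet());

        // Transactions the processor reported that we have no record of.
        Set<String> missingInLedger = new TreeSet<>(processorAmounts.keySet());
        missingInLedger.removeAll(ledgerAmounts.keySet());

        // Transactions present on both sides with different amounts.
        Set<String> amountMismatches = new TreeSet<>();
        for (Map.Entry<String, Long> entry : ledgerAmounts.entrySet()) {
            Long reported = processorAmounts.get(entry.getKey());
            if (reported != null && !reported.equals(entry.getValue())) {
                amountMismatches.add(entry.getKey());
            }
        }
        return new Result(missingAtProcessor, missingInLedger, amountMismatches);
    }
}
```

Each of the three result sets would typically feed a different follow-up: a status query, a manual investigation, or a ledger adjustment.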
Payment Flow¶
Authorization Flow¶
Payment States¶
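One way to make the payment lifecycle explicit is a small state machine that rejects illegal transitions. The sketch below is an assumption, not the definitive state model: `PENDING`, `AUTHORIZED`, `CAPTURED`, and `FAILED` come from the `status` column of the payments table, while `SETTLED` and `REFUNDED` are inferred from the `settled_at` column and the refund requirement. The exact set of allowed transitions is also a guess.

```java
import java.util.EnumMap;
import java.util.EnumSet;
import java.util.Map;
import java.util.Set;

enum PaymentState { PENDING, AUTHORIZED, CAPTURED, SETTLED, REFUNDED, FAILED }

/** Hypothetical transition table; SETTLED and REFUNDED are assumed states. */
final class PaymentStateMachine {
    private static final Map<PaymentState, Set<PaymentState>> ALLOWED =
            new EnumMap<>(PaymentState.class);

    static {
        ALLOWED.put(PaymentState.PENDING, EnumSet.of(PaymentState.AUTHORIZED, PaymentState.FAILED));
        // AUTHORIZED -> FAILED models an expired or voided hold (an assumption).
        ALLOWED.put(PaymentState.AUTHORIZED, EnumSet.of(PaymentState.CAPTURED, PaymentState.FAILED));
        ALLOWED.put(PaymentState.CAPTURED, EnumSet.of(PaymentState.SETTLED, PaymentState.REFUNDED));
        ALLOWED.put(PaymentState.SETTLED, EnumSet.of(PaymentState.REFUNDED));
        // Terminal states have no outgoing transitions.
        ALLOWED.put(PaymentState.REFUNDED, EnumSet.noneOf(PaymentState.class));
        ALLOWED.put(PaymentState.FAILED, EnumSet.noneOf(PaymentState.class));
    }

    static boolean canTransition(PaymentState from, PaymentState to) {
        return ALLOWED.get(from).contains(to);
    }

    /** Returns the new state, or throws if the transition is not in the table. */
    static PaymentState transition(PaymentState from, PaymentState to) {
        if (!canTransition(from, to)) {
            throw new IllegalStateException("Illegal transition: " + from + " -> " + to);
        }
        return to;
    }
}
```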
Data Models¶
Payment (Primary Record)¶
```sql
CREATE TABLE payments (
    id UUID PRIMARY KEY,
    merchant_id UUID NOT NULL,
    idempotency_key VARCHAR(255),

    -- Amount
    amount BIGINT NOT NULL,           -- In smallest currency unit (cents)
    currency VARCHAR(3) NOT NULL,     -- ISO 4217

    -- Payment method
    payment_method_id UUID,
    payment_method_type VARCHAR(50),  -- card, bank_transfer, wallet

    -- Card details (tokenized)
    card_last_four VARCHAR(4),
    card_brand VARCHAR(20),

    -- Status
    status VARCHAR(20) NOT NULL,      -- pending, authorized, captured, failed
    failure_code VARCHAR(50),
    failure_message TEXT,

    -- Processor
    processor VARCHAR(50),
    processor_tx_id VARCHAR(255),

    -- Timestamps
    created_at TIMESTAMP NOT NULL,
    authorized_at TIMESTAMP,
    captured_at TIMESTAMP,
    settled_at TIMESTAMP,

    -- Metadata
    description TEXT,
    metadata JSONB,

    UNIQUE(merchant_id, idempotency_key)
);

CREATE INDEX idx_payments_merchant ON payments(merchant_id, created_at DESC);
CREATE INDEX idx_payments_status ON payments(status, created_at);
```
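Because `amount` is stored in the smallest currency unit, API-level decimal amounts have to be converted without floating-point rounding. A minimal sketch, assuming the JDK's `java.util.Currency` fraction digits are an acceptable source for the number of decimals per ISO 4217 code; the `MinorUnits` helper is a hypothetical name.

```java
import java.math.BigDecimal;
import java.math.RoundingMode;
import java.util.Currency;

/** Hypothetical helper for converting between decimal amounts and minor units. */
final class MinorUnits {
    /** Converts e.g. 19.99 USD to 1999; throws ArithmeticException if precision would be lost. */
    static long toMinorUnits(BigDecimal amount, String currencyCode) {
        int digits = Currency.getInstance(currencyCode).getDefaultFractionDigits();
        return amount.movePointRight(digits)
                .setScale(0, RoundingMode.UNNECESSARY) // refuse to round silently
                .longValueExact();
    }

    /** Converts e.g. 1999 USD minor units back to 19.99. */
    static BigDecimal fromMinorUnits(long minorUnits, String currencyCode) {
        int digits = Currency.getInstance(currencyCode).getDefaultFractionDigits();
        return BigDecimal.valueOf(minorUnits, digits);
    }
}
```

Zero-decimal currencies such as JPY pass through unchanged, which is why a hard-coded factor of 100 is not safe. Pseudo-currencies for which the JDK reports -1 fraction digits are not handled here.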
Ledger Entry (Double-Entry)¶
```sql
CREATE TABLE ledger_entries (
    id UUID PRIMARY KEY,
    payment_id UUID NOT NULL,
    entry_type VARCHAR(50) NOT NULL,  -- authorization, capture, refund, settlement

    -- Double entry
    debit_account VARCHAR(100) NOT NULL,
    credit_account VARCHAR(100) NOT NULL,
    amount BIGINT NOT NULL,
    currency VARCHAR(3) NOT NULL,

    -- Audit
    created_at TIMESTAMP NOT NULL,
    created_by VARCHAR(100),

    FOREIGN KEY (payment_id) REFERENCES payments(id)
);
```
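The point of double entry is that every row moves the same amount out of one account and into another, so balances always net to zero. The in-memory sketch below illustrates that invariant; the account names and the sign convention (credits positive, debits negative) are assumptions chosen for the example, and a real ledger would persist rows in `ledger_entries`.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical append-only ledger mirroring the debit_account / credit_account columns. */
final class InMemoryLedger {
    record Entry(String paymentId, String entryType,
                 String debitAccount, String creditAccount,
                 long amount, String currency) {}

    private final List<Entry> entries = new ArrayList<>();

    void post(Entry entry) {
        if (entry.amount() <= 0) {
            throw new IllegalArgumentException("amount must be positive");
        }
        if (entry.debitAccount().equals(entry.creditAccount())) {
            throw new IllegalArgumentException("debit and credit accounts must differ");
        }
        entries.add(entry); // Entries are only ever appended, never updated.
    }

    /** Net balance per account for one currency: credits minus debits. */
    Map<String, Long> balances(String currency) {
        Map<String, Long> result = new HashMap<>();
        for (Entry entry : entries) {
            if (!entry.currency().equals(currency)) {
                continue;
            }
            result.merge(entry.debitAccount(), -entry.amount(), Long::sum);
            result.merge(entry.creditAccount(), entry.amount(), Long::sum);
        }
        return result;
    }
}
```

A refund is posted as a new entry with the accounts swapped rather than by editing the capture row, which keeps the log immutable.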
Idempotency Record¶
```
Redis structure (preferred for performance)

Key:   idempotency:{merchant_id}:{idempotency_key}
Value: {
  "request_hash": "sha256...",
  "status": "pending|completed",
  "payment_id": "uuid",
  "response": { ... },
  "created_at": "timestamp"
}
TTL:   24-48 hours
```
Key Design Decisions¶
1. Idempotency¶
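The idempotency check has to claim the key atomically before any charge is attempted. The sketch below shows that flow under simplifying assumptions: a `ConcurrentHashMap` stands in for Redis (where `putIfAbsent` would correspond to `SET` with the `NX` option plus a TTL), responses are plain strings, and the `IdempotencyGuard` class is a hypothetical name.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Supplier;

/** Hypothetical idempotency guard; the map is an in-memory stand-in for Redis. */
final class IdempotencyGuard {
    private record Entry(String requestHash, String status, String response) {}

    private final Map<String, Entry> store = new ConcurrentHashMap<>();

    String execute(String merchantId, String idempotencyKey, String requestHash,
                   Supplier<String> charge) {
        String key = "idempotency:" + merchantId + ":" + idempotencyKey;
        Entry pending = new Entry(requestHash, "pending", null);

        // Atomic claim: only one caller per key sees null here.
        Entry existing = store.putIfAbsent(key, pending);
        if (existing == null) {
            try {
                String response = charge.get();
                store.put(key, new Entry(requestHash, "completed", response));
                return response;
            } catch (RuntimeException e) {
                // Assumes the exception means nothing was charged; an unknown
                // outcome (e.g. a timeout) should keep the key instead.
                store.remove(key, pending);
                throw e;
            }
        }
        if (!existing.requestHash().equals(requestHash)) {
            throw new IllegalArgumentException("Idempotency key reused with a different request");
        }
        if (existing.status().equals("pending")) {
            throw new IllegalStateException("Original request is still in progress");
        }
        return existing.response(); // Replay the cached response, no second charge.
    }
}
```

The `UNIQUE(merchant_id, idempotency_key)` constraint on the payments table acts as a second line of defense if the cache entry has expired or been lost.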
2. Two-Phase Payment (Auth + Capture)¶
Why separate authorization and capture?
Authorization (immediate):
- Validates card
- Checks funds availability
- Places hold on funds
- No money moves yet
Capture (can be delayed):
- Actually transfers funds
- Can be partial capture
- Can auto-expire if not captured
Use cases:
- E-commerce: Auth at checkout, capture at shipment
- Hotels: Auth at booking, capture at checkout
- Gas stations: Auth for max, capture actual amount
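The hold-then-capture relationship can be sketched as a small object that tracks how much of the authorized amount has been captured. This is an illustration only: the `Authorization` class is hypothetical, amounts are in minor units, and whether multiple partial captures are allowed depends on the processor.

```java
/** Hypothetical model of an authorization hold with partial capture. */
final class Authorization {
    private final long authorizedAmount; // Minor units held on the card
    private long capturedAmount = 0;
    private boolean closed = false;      // True once voided or expired

    Authorization(long authorizedAmount) {
        if (authorizedAmount <= 0) {
            throw new IllegalArgumentException("amount must be positive");
        }
        this.authorizedAmount = authorizedAmount;
    }

    /** Captures part of the hold and returns the amount still held. */
    long capture(long amount) {
        if (closed) {
            throw new IllegalStateException("authorization voided or expired");
        }
        if (amount <= 0 || amount > authorizedAmount - capturedAmount) {
            throw new IllegalArgumentException("capture must be positive and within the hold");
        }
        capturedAmount += amount;
        return authorizedAmount - capturedAmount;
    }

    /** Releases whatever is still held; no further captures are possible. */
    void voidRemaining() {
        closed = true;
    }

    long captured() {
        return capturedAmount;
    }
}
```

In the gas station case, the pump would authorize a maximum such as 10000 minor units, capture the actual 4250, and void the remainder.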
3. Processor Failover¶
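```java
import java.util.List;

// Processor, PaymentRequest, AuthorizationResult, and the exception types
// are application classes defined elsewhere.
public class PaymentRouter {
    private final List<Processor> processors; // Ordered by preference

    public PaymentRouter(List<Processor> processors) {
        this.processors = List.copyOf(processors);
    }

    public AuthorizationResult authorize(PaymentRequest request) {
        for (Processor processor : processors) {
            if (!processor.isHealthy()) {
                continue; // Circuit breaker open, skip this processor
            }
            try {
                return processor.authorize(request);
            } catch (ProcessorUnavailableException e) {
                // Safe to fail over only if the request never reached the
                // processor; fall through to the next one in the list.
            } catch (PaymentDeclinedException e) {
                // A decline is a definitive answer; don't retry elsewhere
                throw e;
            }
        }
        throw new AllProcessorsUnavailableException();
    }
}
```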
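The `isHealthy()` check above is typically backed by a circuit breaker. A minimal sketch of one, assuming a consecutive-failure threshold and a fixed cooldown; the `CircuitBreaker` class, its parameters, and the explicit `nowMillis` argument are illustrative choices, and production libraries add sliding windows and a limit on half-open probes.

```java
/** Hypothetical circuit breaker: opens after N consecutive failures, retries after a cooldown. */
final class CircuitBreaker {
    private final int failureThreshold;
    private final long cooldownMillis;
    private int consecutiveFailures = 0;
    private long openedAtMillis = -1; // -1 means the circuit is closed

    CircuitBreaker(int failureThreshold, long cooldownMillis) {
        this.failureThreshold = failureThreshold;
        this.cooldownMillis = cooldownMillis;
    }

    /** True when closed, or when open long enough that trial calls are allowed (half-open). */
    synchronized boolean allowRequest(long nowMillis) {
        return openedAtMillis < 0 || nowMillis - openedAtMillis >= cooldownMillis;
    }

    /** Any success closes the circuit and resets the failure count. */
    synchronized void recordSuccess() {
        consecutiveFailures = 0;
        openedAtMillis = -1;
    }

    /** Opens (or re-opens) the circuit once the threshold is reached. */
    synchronized void recordFailure(long nowMillis) {
        consecutiveFailures++;
        if (consecutiveFailures >= failureThreshold) {
            openedAtMillis = nowMillis;
        }
    }
}
```

A processor adapter would call `recordSuccess` or `recordFailure` after each attempt and report `allowRequest(...)` as its health status.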
4. PCI Compliance¶
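The payments table keeps only `card_last_four` and `card_brand`, never the full card number. As a small illustration of that boundary, the sketch below derives the last four digits and a masked form from a raw number; the `CardMasking` class and the accepted length range of 12 to 19 digits are assumptions, and in practice this step would run inside the tokenization component so the raw number never reaches the payment service.

```java
/** Hypothetical helper that reduces a card number to the fields safe to store or display. */
final class CardMasking {
    /** Strips separators and returns the final four digits. */
    static String lastFour(String cardNumber) {
        String digits = digitsOnly(cardNumber);
        return digits.substring(digits.length() - 4);
    }

    /** Replaces every digit except the last four with '*'. */
    static String masked(String cardNumber) {
        String digits = digitsOnly(cardNumber);
        return "*".repeat(digits.length() - 4) + digits.substring(digits.length() - 4);
    }

    private static String digitsOnly(String cardNumber) {
        String digits = cardNumber.replaceAll("[^0-9]", "");
        // Assumed plausible length range; adjust to the card brands supported.
        if (digits.length() < 12 || digits.length() > 19) {
            throw new IllegalArgumentException("unexpected card number length");
        }
        return digits;
    }
}
```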
Handling Failures¶
Retry Strategy¶
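```java
import java.util.Set;

// PaymentError is an application class defined elsewhere.
public class PaymentRetryPolicy {
    private static final int MAX_RETRIES = 3; // Assumed value; tune per processor

    // Retryable errors (network issues, timeouts).
    // Retrying after TIMEOUT is only safe with an idempotency key or a status query first.
    private static final Set<String> RETRYABLE = Set.of(
        "NETWORK_ERROR",
        "TIMEOUT",
        "PROCESSOR_UNAVAILABLE",
        "RATE_LIMITED"
    );

    // Non-retryable errors (business logic)
    private static final Set<String> NON_RETRYABLE = Set.of(
        "CARD_DECLINED",
        "INSUFFICIENT_FUNDS",
        "INVALID_CARD",
        "FRAUD_DETECTED"
    );

    public boolean shouldRetry(PaymentError error, int attempt) {
        if (NON_RETRYABLE.contains(error.getCode())) {
            return false;
        }
        // Unknown error codes are treated as non-retryable.
        return RETRYABLE.contains(error.getCode()) && attempt < MAX_RETRIES;
    }
}
```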
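Deciding whether to retry is only half of a retry strategy; the other half is how long to wait. A common choice is exponential backoff with full jitter, sketched below. The `Backoff` class and its parameters are assumptions for illustration, with `attempt` counted from 0.

```java
import java.util.Random;

/** Hypothetical delay calculator: exponential backoff with full jitter. */
final class Backoff {
    /**
     * Returns a delay drawn uniformly from [0, min(capMillis, baseMillis * 2^attempt)].
     * The exponent is clamped to the range 0..20 to avoid overflow.
     */
    static long delayMillis(int attempt, long baseMillis, long capMillis, Random random) {
        int exponent = Math.max(0, Math.min(attempt, 20));
        long ceiling = Math.min(capMillis, baseMillis << exponent);
        return (long) (random.nextDouble() * (ceiling + 1));
    }
}
```

Jitter spreads retries from many clients over time, so a briefly unavailable processor is not hit by a synchronized wave of retries when it recovers.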
Timeout Handling¶
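A timeout leaves the outcome unknown: the processor may have authorized the payment before the connection dropped. One way to handle this is to tag every attempt with a unique reference ID and query by that ID before retrying. The sketch below assumes a processor client that supports such a lookup; `ProcessorClient`, `findByReference`, and `ProcessorTimeoutException` are hypothetical names, not a real processor API.

```java
import java.util.Optional;

/** Hypothetical unchecked exception signalling that the outcome is unknown. */
class ProcessorTimeoutException extends RuntimeException {}

/** Hypothetical processor client interface. */
interface ProcessorClient {
    /** Submits an authorization tagged with our reference ID; may time out. */
    String authorize(String referenceId, long amount);

    /** Looks up the result of an earlier attempt by reference ID, if the processor has one. */
    Optional<String> findByReference(String referenceId);
}

final class TimeoutSafeAuthorizer {
    static String authorize(ProcessorClient client, String referenceId, long amount, int maxAttempts) {
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return client.authorize(referenceId, amount);
            } catch (ProcessorTimeoutException e) {
                // Query before retrying: the earlier attempt may have succeeded.
                Optional<String> existing = client.findByReference(referenceId);
                if (existing.isPresent()) {
                    return existing.get();
                }
                // No record at the processor, so sending again cannot double-charge.
            }
        }
        throw new IllegalStateException(
                "Outcome unknown after " + maxAttempts + " attempts; needs reconciliation");
    }
}
```

Reusing the same reference ID across attempts is what lets the processor, and later the reconciliation job, recognize them as one payment.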
Scalability Considerations¶
Horizontal Scaling¶
Database Sharding¶
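Shard by merchant_id:
- Queries are typically scoped to a single merchant, so most requests touch one shard
- Distribution is roughly even across merchants, but large merchants may need dedicated shards
- Cross-merchant queries go to an analytics/reporting DB

Because merchant_id is a UUID, hash it to an integer before taking the modulus:

Shard 0: hash(merchant_id) % 4 == 0
Shard 1: hash(merchant_id) % 4 == 1
Shard 2: hash(merchant_id) % 4 == 2
Shard 3: hash(merchant_id) % 4 == 3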
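Since `merchant_id` is a UUID in the payments table, it has to be reduced to an integer before a modulus can pick a shard. A minimal sketch, assuming that XOR-folding the two 64-bit halves is a good enough hash for randomly generated UUIDs; the `ShardRouter` class is a hypothetical name.

```java
import java.util.UUID;

/** Hypothetical shard selector for UUID merchant IDs. */
final class ShardRouter {
    /** Maps a merchant ID to a shard index in [0, shardCount). */
    static int shardFor(UUID merchantId, int shardCount) {
        long folded = merchantId.getMostSignificantBits() ^ merchantId.getLeastSignificantBits();
        // floorMod keeps the result non-negative even when `folded` is negative.
        return (int) Math.floorMod(folded, (long) shardCount);
    }
}
```

Plain modulo sharding remaps most merchants whenever the shard count changes, so a lookup table or consistent hashing is worth considering if resharding is expected.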
Security Considerations¶
Monitoring & Alerting¶
Key Metrics¶
| Metric | Target | Alert Threshold |
|---|---|---|
| Authorization success rate | > 95% | < 90% |
| P99 latency | < 500ms | > 1s |
| Processor availability | > 99.9% | < 99% |
| Error rate | < 0.1% | > 1% |
| Duplicate charge rate | 0% | > 0 |
Dashboards¶
Technology Choices¶
| Component | Technology Options |
|---|---|
| API Gateway | Kong, AWS API Gateway, Custom |
| Payment Service | Java/Kotlin, Go |
| Idempotency Store | Redis Cluster |
| Primary Database | PostgreSQL, CockroachDB |
| Message Queue | Kafka, SQS |
| Caching | Redis, Memcached |
| Monitoring | Datadog, Prometheus + Grafana |
| Secrets | HashiCorp Vault, AWS Secrets Manager |
Interview Discussion Points¶
- How do you ensure exactly-once processing?
    - Idempotency keys + atomic operations
- How do you handle processor timeouts?
    - Query before retry, use unique reference IDs
- How do you scale to handle Black Friday traffic?
    - Pre-scale, auto-scaling, graceful degradation
- How do you handle multi-currency?
    - Store in original currency, convert at settlement
- What happens if your database goes down?
    - Read replicas, multi-region, queue writes
- How do you prevent fraud?
    - ML models, velocity checks, 3DS, manual review