Multi-Step Processes
The Problem
How do you handle operations that span multiple services, databases, or steps where:
- Each step can fail independently
- The entire process must succeed or be rolled back
- Steps may take varying amounts of time
- External services may be unreliable

Common Scenarios:
- E-commerce order processing (payment → inventory → shipping)
- User registration (create account → send email → setup defaults)
- Travel booking (flight → hotel → car rental)
- Money transfer between banks
- Document approval workflows
Options & Trade-offs
1. Two-Phase Commit (2PC)
Philosophy: "All participants agree before committing"
Implementation:
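import java.util.List;
import java.util.UUID;

// Participant and TransactionData are application-defined types.
public class TwoPhaseCommitCoordinator {

    public boolean executeTransaction(List<Participant> participants,
                                      TransactionData data) {
        String txId = UUID.randomUUID().toString();

        // Phase 1: Prepare. Every participant must vote "yes".
        boolean allPrepared = true;
        for (Participant p : participants) {
            try {
                if (!p.prepare(txId, data)) {
                    allPrepared = false;
                    break;
                }
            } catch (Exception e) {
                // A failed or unreachable participant counts as a "no" vote.
                allPrepared = false;
                break;
            }
        }

        // Phase 2: Commit or Rollback.
        // A production coordinator would durably log the decision before this
        // point so it can finish phase 2 after a crash.
        if (allPrepared) {
            for (Participant p : participants) {
                // Once everyone has voted "yes" the decision is final: a failed
                // commit must be retried, never turned into a rollback.
                p.commit(txId);
            }
            return true;
        } else {
            for (Participant p : participants) {
                try {
                    // rollback must be safe to call on participants that
                    // were never asked to prepare.
                    p.rollback(txId);
                } catch (Exception e) {
                    // Keep going so one failure does not leave the other
                    // participants holding locks; retry this one later.
                }
            }
            return false;
        }
    }
}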
| Pros | Cons |
|---|---|
| Strong consistency (ACID) | Blocking - all wait for slowest |
| Well-understood protocol | Coordinator is SPOF |
| Atomic across participants | Doesn't scale well |
| | Network partition = stuck transactions |
| | Latency = sum of all participants |
When to use:
- Small number of participants
- Same data center (low latency)
- Strong consistency absolutely required
- Traditional database transactions
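The "all wait for slowest" cost can be bounded by collecting votes in parallel and treating a missing vote as "no". The following is a minimal sketch under stated assumptions, not a production coordinator: the simplified `Participant` interface, `InMemoryParticipant`, and `ParallelPrepare` are hypothetical names introduced here, and decision logging and commit retries are left out.

```java
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.TimeUnit;
import java.util.stream.Collectors;

// Hypothetical, simplified participant contract (no transaction payload).
interface Participant {
    boolean prepare(String txId);
    void commit(String txId);
    void rollback(String txId);
}

// In-memory stand-in for a real resource manager, used for illustration only.
final class InMemoryParticipant implements Participant {
    private final boolean voteYes;
    String outcome = "NONE";

    InMemoryParticipant(boolean voteYes) { this.voteYes = voteYes; }

    public boolean prepare(String txId) { return voteYes; }
    public void commit(String txId) { outcome = "COMMITTED"; }
    public void rollback(String txId) { outcome = "ROLLED_BACK"; }
}

final class ParallelPrepare {
    // Asks all participants to prepare concurrently. A vote that does not
    // arrive within timeoutMillis, or that fails, is counted as "no".
    static boolean run(String txId, List<? extends Participant> participants,
                       long timeoutMillis) {
        List<CompletableFuture<Boolean>> votes = participants.stream()
            .map(p -> CompletableFuture.supplyAsync(() -> p.prepare(txId))
                .completeOnTimeout(false, timeoutMillis, TimeUnit.MILLISECONDS)
                .exceptionally(e -> false))
            .collect(Collectors.toList());

        boolean allYes = votes.stream().allMatch(CompletableFuture::join);

        // Phase 2: the same decision is sent to every participant, including
        // slow ones whose prepare may still complete after the timeout.
        for (Participant p : participants) {
            if (allYes) {
                p.commit(txId);
            } else {
                p.rollback(txId);
            }
        }
        return allYes;
    }
}
```

With parallel voting, phase 1 latency is bounded by the timeout rather than by the sum of all prepare calls, at the price of aborting transactions that a slow participant would eventually have accepted.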
2. Saga Pattern
Philosophy: "Execute steps in sequence, compensate on failure"
Two Saga Styles:
Choreography (Event-Driven)
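In a choreographed saga there is no central coordinator: each service reacts to the previous service's event and publishes its own. The sketch below illustrates that idea with a toy in-process bus. `EventBus`, `ChoreographyDemo`, and the event names are assumptions made for this example; a real system would use a message broker and separate services.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Consumer;

// Toy synchronous event bus standing in for a message broker.
final class EventBus {
    private final Map<String, List<Consumer<String>>> handlers = new HashMap<>();
    final List<String> log = new ArrayList<>();

    void subscribe(String eventType, Consumer<String> handler) {
        handlers.computeIfAbsent(eventType, k -> new ArrayList<>()).add(handler);
    }

    void publish(String eventType, String orderId) {
        log.add(eventType); // record the flow so it can be inspected
        for (Consumer<String> handler : handlers.getOrDefault(eventType, List.of())) {
            handler.accept(orderId);
        }
    }
}

final class ChoreographyDemo {
    // Wires four "services" together purely through events and returns the
    // sequence of events that one order produced.
    static List<String> run(boolean inventoryAvailable) {
        EventBus bus = new EventBus();

        // Payment service: charges when an order is created.
        bus.subscribe("OrderCreated", id -> bus.publish("PaymentCompleted", id));

        // Inventory service: reserves stock after payment.
        bus.subscribe("PaymentCompleted", id ->
            bus.publish(inventoryAvailable ? "InventoryReserved" : "InventoryFailed", id));

        // Shipping service: schedules delivery once stock is reserved.
        bus.subscribe("InventoryReserved", id -> bus.publish("ShippingScheduled", id));

        // Payment service again: compensates when inventory fails.
        bus.subscribe("InventoryFailed", id -> bus.publish("PaymentRefunded", id));

        bus.publish("OrderCreated", "order-1");
        return bus.log;
    }
}
```

Note that the overall flow exists only as the sum of the subscriptions, which is why choreography is loosely coupled but harder to trace.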
Orchestration (Centralized)
Orchestration Implementation:
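public class OrderSaga {

    public enum State {
        STARTED, PAYMENT_COMPLETED, INVENTORY_RESERVED,
        SHIPPING_SCHEDULED, COMPLETED, FAILED
    }

    // Collaborators are injected; their types are application-defined.
    private final PaymentService paymentService;
    private final InventoryService inventoryService;
    private final ShippingService shippingService;

    public OrderSaga(PaymentService paymentService,
                     InventoryService inventoryService,
                     ShippingService shippingService) {
        this.paymentService = paymentService;
        this.inventoryService = inventoryService;
        this.shippingService = shippingService;
    }

    // Deliberately not @Transactional: the steps are remote calls that a local
    // transaction cannot undo, and rolling back on failure would also discard
    // the saga state saved below. saveSagaState should commit on its own.
    public void execute(Order order) {
        SagaContext context = new SagaContext(order);
        try {
            // Step 1: Process Payment
            PaymentResult payment = paymentService.charge(order);
            context.setPaymentId(payment.getId());
            context.setState(State.PAYMENT_COMPLETED);
            saveSagaState(context);

            // Step 2: Reserve Inventory
            ReservationResult reservation = inventoryService.reserve(order);
            context.setReservationId(reservation.getId());
            context.setState(State.INVENTORY_RESERVED);
            saveSagaState(context);

            // Step 3: Schedule Shipping
            ShipmentResult shipment = shippingService.schedule(order);
            context.setShipmentId(shipment.getId());
            context.setState(State.SHIPPING_SCHEDULED);
            saveSagaState(context);

            context.setState(State.COMPLETED);
            saveSagaState(context);
        } catch (Exception e) {
            compensate(context);
            throw new SagaFailedException(e);
        }
    }

    private void compensate(SagaContext context) {
        // Compensate in reverse order, starting from the last completed step.
        // Each compensating call should be idempotent so it can be retried.
        switch (context.getState()) {
            case SHIPPING_SCHEDULED:
                shippingService.cancel(context.getShipmentId());
                // fall through
            case INVENTORY_RESERVED:
                inventoryService.release(context.getReservationId());
                // fall through
            case PAYMENT_COMPLETED:
                paymentService.refund(context.getPaymentId());
                break;
            default:
                // STARTED: no step has completed, so there is nothing to undo.
                break;
        }
        context.setState(State.FAILED);
        saveSagaState(context);
    }
}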
Saga State Machine:
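import java.util.Map;

public class SagaStateMachine {

    // State and Event are enums defined elsewhere; State must include COMPENSATING.
    private final Map<State, Map<Event, State>> transitions = Map.of(
        State.STARTED, Map.of(
            Event.PAYMENT_SUCCESS, State.PAYMENT_COMPLETED,
            Event.PAYMENT_FAILED, State.COMPENSATING
        ),
        State.PAYMENT_COMPLETED, Map.of(
            Event.INVENTORY_RESERVED, State.INVENTORY_RESERVED,
            Event.INVENTORY_FAILED, State.COMPENSATING
        )
        // ... more transitions
    );

    public State transition(State current, Event event) {
        // Reject undefined transitions instead of failing with a NullPointerException.
        State next = transitions.getOrDefault(current, Map.of()).get(event);
        if (next == null) {
            throw new IllegalStateException(
                "No transition from " + current + " on " + event);
        }
        return next;
    }
}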
| Choreography | Orchestration |
|---|---|
| Loose coupling | Centralized logic |
| No SPOF | Easier to understand |
| Hard to track overall flow | Single point of failure |
| Complex debugging | Easier testing |
| Good for simple flows | Good for complex flows |
| Pros | Cons |
|---|---|
| No blocking | Eventual consistency only |
| Scales well | Compensations can fail |
| Resilient to failures | Complex error handling |
| Works across services | Harder to debug |
| Async friendly | Temporary inconsistency visible |
When to use:
- Microservices architectures
- Long-running transactions
- Cross-service workflows
- When strong consistency isn't required
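Because compensations can themselves fail, it helps to run them with retries and to surface the ones that never succeed. A minimal sketch of that idea follows; `CompensationStack` and its method names are hypothetical, and a real implementation would add backoff between attempts and persist the pending compensations.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

// Collects undo actions as steps succeed and runs them in reverse order.
final class CompensationStack {
    private static final class Entry {
        final String name;
        final Runnable action;

        Entry(String name, Runnable action) {
            this.name = name;
            this.action = action;
        }
    }

    private final Deque<Entry> entries = new ArrayDeque<>();

    // Call after each successful step with the action that undoes it.
    void register(String name, Runnable undo) {
        entries.push(new Entry(name, undo));
    }

    // Runs every compensation, last registered first. Returns the names of
    // compensations that still failed after maxAttempts tries, so they can
    // be escalated (alert, manual repair queue).
    List<String> compensateAll(int maxAttempts) {
        List<String> failed = new ArrayList<>();
        while (!entries.isEmpty()) {
            Entry entry = entries.pop();
            if (!runWithRetry(entry.action, maxAttempts)) {
                failed.add(entry.name);
            }
        }
        return failed;
    }

    private static boolean runWithRetry(Runnable action, int maxAttempts) {
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                action.run();
                return true;
            } catch (RuntimeException e) {
                // Swallow and retry; the action is assumed to be idempotent.
            }
        }
        return false;
    }
}
```

A failed compensation does not stop the remaining ones from running, which keeps one unavailable service from blocking the rest of the cleanup.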
3. Process Manager / Workflow Engine
Philosophy: "Use a dedicated system to manage complex workflows"
Tools:
- Temporal (recommended)
- Cadence (Uber)
- AWS Step Functions
- Camunda
- Netflix Conductor
Temporal Example:
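import io.temporal.activity.ActivityInterface;
import io.temporal.activity.ActivityOptions;
import io.temporal.workflow.Workflow;
import io.temporal.workflow.WorkflowInterface;
import io.temporal.workflow.WorkflowMethod;
import java.time.Duration;

// Workflow Definition
@WorkflowInterface
public interface OrderWorkflow {
    @WorkflowMethod
    OrderResult processOrder(Order order);
}

// Registered on a worker with
// worker.registerWorkflowImplementationTypes(OrderWorkflowImpl.class).
public class OrderWorkflowImpl implements OrderWorkflow {

    // The timeout value is illustrative.
    private final ActivityOptions activityOptions = ActivityOptions.newBuilder()
        .setStartToCloseTimeout(Duration.ofSeconds(30))
        .build();

    private final PaymentActivity paymentActivity =
        Workflow.newActivityStub(PaymentActivity.class, activityOptions);
    private final InventoryActivity inventoryActivity =
        Workflow.newActivityStub(InventoryActivity.class, activityOptions);
    private final ShippingActivity shippingActivity =
        Workflow.newActivityStub(ShippingActivity.class, activityOptions);

    @Override
    public OrderResult processOrder(Order order) {
        // Temporal handles retries, timeouts, state persistence
        PaymentResult payment = paymentActivity.charge(order);
        InventoryResult inventory = null;
        try {
            inventory = inventoryActivity.reserve(order);
            ShippingResult shipping = shippingActivity.schedule(order);
            return OrderResult.success(payment, inventory, shipping);
        } catch (Exception e) {
            // Compensation, in reverse order of the completed steps.
            if (inventory != null) {
                // release(...) is an assumed method on InventoryActivity.
                inventoryActivity.release(inventory.getId());
            }
            paymentActivity.refund(payment.getId());
            throw e;
        }
    }
}

// Activity Definition
@ActivityInterface
public interface PaymentActivity {
    PaymentResult charge(Order order);
    void refund(String paymentId);
}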
AWS Step Functions Example:
{
  "StartAt": "ProcessPayment",
  "States": {
    "ProcessPayment": {
      "Type": "Task",
      "Resource": "arn:aws:lambda:payment-function",
      "Next": "ReserveInventory",
      "Catch": [{
        "ErrorEquals": ["PaymentFailed"],
        "Next": "PaymentFailedState"
      }]
    },
    "ReserveInventory": {
      "Type": "Task",
      "Resource": "arn:aws:lambda:inventory-function",
      "Next": "ScheduleShipping",
      "Catch": [{
        "ErrorEquals": ["InventoryFailed"],
        "Next": "RefundPayment"
      }]
    },
    "ScheduleShipping": {
      "Type": "Task",
      "Resource": "arn:aws:lambda:shipping-function",
      "End": true
    },
    "RefundPayment": {
      "Type": "Task",
      "Resource": "arn:aws:lambda:refund-function",
      "Next": "FailState"
    },
    "PaymentFailedState": {
      "Type": "Fail",
      "Error": "PaymentFailed"
    },
    "FailState": {
      "Type": "Fail",
      "Error": "OrderProcessingFailed"
    }
  }
}
| Pros | Cons |
|---|---|
| Built-in retries & timeouts | Additional infrastructure |
| Visual workflow monitoring | Learning curve |
| State persistence handled | Vendor lock-in possible |
| Easy to modify workflows | Cost (managed services) |
| Audit trail included | |
When to use:
- Complex multi-step workflows
- Long-running processes (hours/days)
- Need for visibility and monitoring
- Human-in-the-loop processes
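The "state persistence handled" benefit can be pictured as a journal of completed step results that is consulted before a step runs again. The sketch below is a conceptual toy, not how Temporal or Step Functions are implemented: `JournaledWorkflow` is a hypothetical class, and the in-memory map stands in for durable storage.

```java
import java.util.Map;
import java.util.function.Supplier;

// Toy "durable execution": finished steps are journaled, and a re-run after
// a crash reuses the recorded result instead of executing the step again.
final class JournaledWorkflow {
    private final Map<String, String> journal; // stands in for durable storage

    JournaledWorkflow(Map<String, String> journal) {
        this.journal = journal;
    }

    String step(String name, Supplier<String> action) {
        String recorded = journal.get(name);
        if (recorded != null) {
            return recorded; // step already completed in an earlier run
        }
        String result = action.get(); // may throw; nothing is journaled then
        journal.put(name, result);
        return result;
    }
}
```

If a run fails at the second step, a retry with the same journal skips the first step, so the customer is not charged twice. Real engines add timeouts, retry policies, and versioning on top of this basic idea.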
4. Event Sourcing + Process Manager
Philosophy: "Store all events, derive state, react to events"
Implementation:
public class OrderProcessManager {

    // Command gateway supplied by the messaging framework in use
    // (the annotations here follow Axon Framework conventions).
    private final CommandGateway commandGateway;

    public OrderProcessManager(CommandGateway commandGateway) {
        this.commandGateway = commandGateway;
    }

    @EventHandler
    public void on(OrderCreatedEvent event) {
        // Send command to payment service
        commandGateway.send(new ProcessPaymentCommand(
            event.getOrderId(),
            event.getAmount()
        ));
    }

    @EventHandler
    public void on(PaymentSucceededEvent event) {
        // Send command to inventory service
        commandGateway.send(new ReserveInventoryCommand(
            event.getOrderId(),
            event.getItems()
        ));
    }

    @EventHandler
    public void on(InventoryReservedEvent event) {
        // Send command to shipping service
        commandGateway.send(new ScheduleShippingCommand(
            event.getOrderId()
        ));
    }

    @EventHandler
    public void on(PaymentFailedEvent event) {
        // Nothing to compensate yet: mark order as failed
        commandGateway.send(new FailOrderCommand(event.getOrderId()));
    }

    @EventHandler
    public void on(InventoryReservationFailedEvent event) {
        // Compensate: refund payment
        commandGateway.send(new RefundPaymentCommand(event.getOrderId()));
    }
}
| Pros | Cons |
|---|---|
| Complete audit history | Complex to implement |
| Replay/debug capability | Eventual consistency |
| Temporal queries | Storage grows |
| Natural fit for sagas | Learning curve |
When to use:
- Audit requirements
- Complex domain logic
- Need to replay events
- CQRS architecture
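"Derive state" means the current status is never stored directly; it is computed by folding over the event history. A minimal sketch, assuming events are reduced to their type names: `OrderProjection`, its `Status` values, and the `ShippingScheduled` and `PaymentRefunded` event names are illustrative assumptions.

```java
import java.util.List;

// Rebuilds an order's status by replaying its events in order.
final class OrderProjection {
    enum Status { CREATED, PAID, RESERVED, SHIPPED, FAILED, REFUNDED }

    static Status replay(List<String> eventTypes) {
        Status status = Status.CREATED;
        for (String type : eventTypes) {
            status = apply(status, type);
        }
        return status;
    }

    // Pure transition function: current status + event -> next status.
    static Status apply(Status current, String eventType) {
        switch (eventType) {
            case "PaymentSucceeded":  return Status.PAID;
            case "InventoryReserved": return Status.RESERVED;
            case "ShippingScheduled": return Status.SHIPPED;
            case "PaymentFailed":     return Status.FAILED;
            case "PaymentRefunded":   return Status.REFUNDED;
            default:                  return current; // event does not affect status
        }
    }
}
```

Replaying only a prefix of the history answers "what was the status at that point?", which is the temporal query capability listed in the table above.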
5. Outbox Pattern (Reliable Messaging)
Philosophy: "Guarantee message delivery alongside database transaction"
Implementation:
// Within same transaction
@Transactional
public void processOrder(Order order) {
    // 1. Business logic
    orderRepository.save(order);

    // 2. Write to outbox (same transaction), so the order and its event
    //    are committed or rolled back together.
    OutboxEvent event = new OutboxEvent(
        UUID.randomUUID().toString(),
        "OrderCreated",
        JsonUtils.toJson(new OrderCreatedEvent(order)),
        false   // not yet published
    );
    outboxRepository.save(event);
}

// Separate poller (CDC or polling)
@Scheduled(fixedDelay = 1000)
public void publishOutboxEvents() {
    List<OutboxEvent> events = outboxRepository.findUnpublished();
    for (OutboxEvent event : events) {
        try {
            messagePublisher.publish(event.getTopic(), event.getPayload());
            // If the process dies between publish and save, the event is
            // published again on the next poll: delivery is at-least-once.
            event.setPublished(true);
            outboxRepository.save(event);
        } catch (Exception e) {
            // Will retry next poll
            log.error("Failed to publish event", e);
        }
    }
}
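Using Debezium (CDC): instead of polling, a change data capture connector such as Debezium can read new outbox rows from the database's transaction log and publish them to the broker.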
| Pros | Cons |
|---|---|
| Atomic with business logic | Additional table/polling |
| No distributed transaction | Slight delay |
| At-least-once delivery | Need deduplication |
| Simple to implement | |
When to use:
- Reliable event publishing
- When message broker might be down
- Avoiding 2PC between DB and broker
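The relay half of the pattern can be exercised without a database or broker. The following in-memory sketch shows how unpublished rows survive a broker outage and go out on a later pass. `InMemoryOutbox` and `Row` are hypothetical names, and stopping at the first failure is a design choice made here to preserve ordering, not something the pattern requires.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// In-memory stand-in for the outbox table plus its relay loop.
final class InMemoryOutbox {
    static final class Row {
        final String id;
        final String payload;
        boolean published;

        Row(String id, String payload) {
            this.id = id;
            this.payload = payload;
        }
    }

    private final List<Row> rows = new ArrayList<>();

    void add(String id, String payload) {
        rows.add(new Row(id, payload));
    }

    // Publishes unpublished rows in insertion order and returns how many
    // were sent. Stops at the first failure so later events never overtake
    // an earlier one; the remaining rows are picked up by the next call.
    int relayOnce(Consumer<Row> publisher) {
        int sent = 0;
        for (Row row : rows) {
            if (row.published) {
                continue;
            }
            try {
                publisher.accept(row);
            } catch (RuntimeException e) {
                break; // broker unavailable: leave the row unpublished
            }
            row.published = true;
            sent++;
        }
        return sent;
    }
}
```

Because a row is marked only after the publisher returns, a crash between those two steps results in a duplicate send, which is why consumers need deduplication.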
6. Transactional Inbox Pattern
Philosophy: "Deduplicate incoming messages reliably"
@Transactional
public void handleEvent(Event event) {
    // Check if already processed. Two concurrent deliveries can both pass
    // this check, so the inbox table also needs a unique key on the event
    // id; the losing transaction then fails on insert and rolls back.
    if (inboxRepository.existsById(event.getId())) {
        log.info("Duplicate event, skipping: {}", event.getId());
        return;
    }

    // Process the event
    processBusinessLogic(event);

    // Mark as processed (same transaction)
    inboxRepository.save(new InboxEntry(event.getId(), Instant.now()));
}
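The deduplication step can be illustrated in isolation with an atomic "insert if absent" on the event id, which plays the role of a unique constraint on the inbox table. This is a sketch under that assumption; `IdempotentConsumer` is a hypothetical class, and the in-memory set does not give the transactional guarantee that a database inbox does.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Consumer;

// Processes each event id at most once, even under concurrent redelivery.
final class IdempotentConsumer {
    private final Set<String> processedIds = ConcurrentHashMap.newKeySet();
    private final Consumer<String> businessLogic;

    IdempotentConsumer(Consumer<String> businessLogic) {
        this.businessLogic = businessLogic;
    }

    // Returns true if the event was processed, false if it was a duplicate.
    boolean handle(String eventId) {
        // add() is atomic: only one caller per id gets 'true'.
        if (!processedIds.add(eventId)) {
            return false;
        }
        try {
            businessLogic.accept(eventId);
            return true;
        } catch (RuntimeException e) {
            // Mimic a transaction rollback so the event can be redelivered.
            processedIds.remove(eventId);
            throw e;
        }
    }
}
```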
Comparison Matrix
| Approach | Consistency | Complexity | Scalability | Use Case |
|---|---|---|---|---|
| 2PC | Strong | Medium | Poor | Same DC, few services |
| Saga (Choreography) | Eventual | High | Excellent | Simple flows, loose coupling |
| Saga (Orchestration) | Eventual | Medium | Good | Complex flows |
| Workflow Engine | Eventual | Low | Good | Long-running, visibility |
| Event Sourcing | Eventual | High | Excellent | Audit, complex domain |
| Outbox Pattern | Eventual (at-least-once delivery) | Low | Good | Reliable messaging |
Decision Tree
Real-World Examples
E-Commerce Order (Saga with Orchestrator)
Travel Booking (Saga with Choreography)
Document Approval (Workflow Engine)
Key Takeaways
- Avoid 2PC in microservices: use sagas instead.
- Prefer orchestration for complex flows: it is easier to understand and debug.
- Use workflow engines for long-running processes: they handle the hard parts.
- Always have compensation logic: things will fail.
- Use the outbox pattern for reliability: events are written atomically with the business transaction.
- Build in idempotency everywhere: messages will be duplicated.
- Monitor saga states: know where processes are stuck.
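Monitoring saga states can start as a simple query for sagas that are not in a terminal state and have not progressed recently. The sketch below assumes each persisted saga exposes an id, a state name, and a last-updated timestamp; `SagaSnapshot` and `StuckSagaFinder` are hypothetical names, and the state names follow the `OrderSaga.State` enum used earlier.

```java
import java.time.Duration;
import java.time.Instant;
import java.util.List;
import java.util.Set;
import java.util.stream.Collectors;

// Minimal view of a persisted saga row.
final class SagaSnapshot {
    final String sagaId;
    final String state;
    final Instant updatedAt;

    SagaSnapshot(String sagaId, String state, Instant updatedAt) {
        this.sagaId = sagaId;
        this.state = state;
        this.updatedAt = updatedAt;
    }
}

final class StuckSagaFinder {
    private static final Set<String> TERMINAL = Set.of("COMPLETED", "FAILED");

    // Returns ids of sagas that are still in progress and have not changed
    // state for longer than maxIdle.
    static List<String> findStuck(List<SagaSnapshot> sagas, Instant now, Duration maxIdle) {
        return sagas.stream()
            .filter(s -> !TERMINAL.contains(s.state))
            .filter(s -> Duration.between(s.updatedAt, now).compareTo(maxIdle) > 0)
            .map(s -> s.sagaId)
            .collect(Collectors.toList());
    }
}
```

In practice the same filter would usually be a database query feeding an alert, with `maxIdle` tuned per step.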