Multi Step Processes¶

The Problem¶

How do you handle operations that span multiple services, databases, or steps where: - Each step can fail independently - The entire process must succeed or be rolled back - Steps may take varying amounts of time - External services may be unreliable

Common Scenarios: - E-commerce order processing (payment → inventory → shipping) - User registration (create account → send email → setup defaults) - Travel booking (flight → hotel → car rental) - Money transfer between banks - Document approval workflows

Options & Trade-offs¶

1. Two-Phase Commit (2PC)¶

Philosophy: "All participants agree before committing"

Two-Phase Commit

Implementation:

href="#__codelineno-0-1">public class TwoPhaseCommitCoordinator { public boolean executeTransaction(List<Participant> participants, TransactionData data) { String txId = UUID.randomUUID().toString(); // Phase 1: Prepare boolean allPrepared = true; for (Participant p : participants) { try { if (!p.prepare(txId, data)) { allPrepared = false; break; } } catch (Exception e) { allPrepared = false; break; } } // Phase 2: Commit or Rollback if (allPrepared) { for (Participant p : participants) { p.commit(txId); } return true; } else { for (Participant p : participants) { p.rollback(txId); } return false; } } }

Pros	Cons
Strong consistency (ACID)	Blocking - all wait for slowest
Well-understood protocol	Coordinator is SPOF
Atomic across participants	Doesn't scale well
	Network partition = stuck transactions
	Latency = sum of all participants

When to use: - Small number of participants - Same data center (low latency) - Strong consistency absolutely required - Traditional database transactions

2. Saga Pattern¶

Philosophy: "Execute steps in sequence, compensate on failure"

Saga Forward and Compensation Paths

Two Saga Styles:

Choreography (Event-Driven)¶

Saga Choreography

Orchestration (Centralized)¶

Saga Orchestration

Orchestration Implementation:

public class OrderSaga {

    public enum State {
        STARTED, PAYMENT_COMPLETED, INVENTORY_RESERVED,
        SHIPPING_SCHEDULED, COMPLETED, FAILED
    }

    @Transactional
    public void execute(Order order) {
        SagaContext context = new SagaContext(order);

        try {
            // Step 1: Process Payment
            PaymentResult payment = paymentService.charge(order);
            context.setPaymentId(payment.getId());
            context.setState(State.PAYMENT_COMPLETED);
            saveSagaState(context);

            // Step 2: Reserve Inventory
            ReservationResult reservation = inventoryService.reserve(order);
            context.setReservationId(reservation.getId());
            context.setState(State.INVENTORY_RESERVED);
            saveSagaState(context);

            // Step 3: Schedule Shipping
            ShipmentResult shipment = shippingService.schedule(order);
            context.setShipmentId(shipment.getId());
            context.setState(State.SHIPPING_SCHEDULED);
            saveSagaState(context);

            context.setState(State.COMPLETED);
            saveSagaState(context);

        } catch (Exception e) {
            compensate(context);
            throw new SagaFailedException(e);
        }
    }

    private void compensate(SagaContext context) {
        // Compensate in reverse order
        switch (context.getState()) {
            case SHIPPING_SCHEDULED:
                shippingService.cancel(context.getShipmentId());
                // fall through
            case INVENTORY_RESERVED:
                inventoryService.release(context.getReservationId());
                // fall through
            case PAYMENT_COMPLETED:
                paymentService.refund(context.getPaymentId());
                break;
        }
        context.setState(State.FAILED);
        saveSagaState(context);
    }
}

Saga State Machine:

public class SagaStateMachine {
    private final Map<State, Map<Event, State>> transitions = Map.of(
        State.STARTED, Map.of(
            Event.PAYMENT_SUCCESS, State.PAYMENT_COMPLETED,
            Event.PAYMENT_FAILED, State.COMPENSATING
        ),
        State.PAYMENT_COMPLETED, Map.of(
            Event.INVENTORY_RESERVED, State.INVENTORY_RESERVED,
            Event.INVENTORY_FAILED, State.COMPENSATING
        ),
        // ... more transitions
    );

    public State transition(State current, Event event) {
        return transitions.get(current).get(event);
    }
}

Choreography	Orchestration
Loose coupling	Centralized logic
No SPOF	Easier to understand
Hard to track overall flow	Single point of failure
Complex debugging	Easier testing
Good for simple flows	Good for complex flows

Pros	Cons
No blocking	Eventual consistency only
Scales well	Compensations can fail
Resilient to failures	Complex error handling
Works across services	Harder to debug
Async friendly	Temporary inconsistency visible

When to use: - Microservices architectures - Long-running transactions - Cross-service workflows - When strong consistency isn't required

3. Process Manager / Workflow Engine¶

Philosophy: "Use a dedicated system to manage complex workflows"

Tools: - Temporal (recommended) - Cadence (Uber) - AWS Step Functions - Camunda - Netflix Conductor

Temporal Example:

// Workflow Definition
@WorkflowInterface
public interface OrderWorkflow {
    @WorkflowMethod
    OrderResult processOrder(Order order);
}

@WorkflowImpl
public class OrderWorkflowImpl implements OrderWorkflow {

    private final PaymentActivity paymentActivity =
        Workflow.newActivityStub(PaymentActivity.class, activityOptions);
    private final InventoryActivity inventoryActivity =
        Workflow.newActivityStub(InventoryActivity.class, activityOptions);

    @Override
    public OrderResult processOrder(Order order) {
        // Temporal handles retries, timeouts, state persistence

        PaymentResult payment = paymentActivity.charge(order);

        try {
            InventoryResult inventory = inventoryActivity.reserve(order);
            ShippingResult shipping = shippingActivity.schedule(order);
            return OrderResult.success(payment, inventory, shipping);

        } catch (Exception e) {
            // Compensation
            paymentActivity.refund(payment.getId());
            throw e;
        }
    }
}

// Activity Definition
@ActivityInterface
public interface PaymentActivity {
    PaymentResult charge(Order order);
    void refund(String paymentId);
}

AWS Step Functions Example:

{
  "StartAt": "ProcessPayment",
  "States": {
    "ProcessPayment": {
      "Type": "Task",
      "Resource": "arn:aws:lambda:payment-function",
      "Next": "ReserveInventory",
      "Catch": [{
        "ErrorEquals": ["PaymentFailed"],
        "Next": "PaymentFailedState"
      }]
    },
    "ReserveInventory": {
      "Type": "Task",
      "Resource": "arn:aws:lambda:inventory-function",
      "Next": "ScheduleShipping",
      "Catch": [{
        "ErrorEquals": ["InventoryFailed"],
        "Next": "RefundPayment"
      }]
    },
    "ScheduleShipping": {
      "Type": "Task",
      "Resource": "arn:aws:lambda:shipping-function",
      "End": true
    },
    "RefundPayment": {
      "Type": "Task",
      "Resource": "arn:aws:lambda:refund-function",
      "Next": "FailState"
    },
    "FailState": {
      "Type": "Fail",
      "Error": "OrderProcessingFailed"
    }
  }
}

Pros	Cons
Built-in retries & timeouts	Additional infrastructure
Visual workflow monitoring	Learning curve
State persistence handled	Vendor lock-in possible
Easy to modify workflows	Cost (managed services)
Audit trail included

When to use: - Complex multi-step workflows - Long-running processes (hours/days) - Need for visibility and monitoring - Human-in-the-loop processes

4. Event Sourcing + Process Manager¶

Philosophy: "Store all events, derive state, react to events"

Event Sourcing + Process Manager

Implementation:

public class OrderProcessManager {

    @EventHandler
    public void on(OrderCreatedEvent event) {
        // Send command to payment service
        commandGateway.send(new ProcessPaymentCommand(
            event.getOrderId(),
            event.getAmount()
        ));
    }

    @EventHandler
    public void on(PaymentSucceededEvent event) {
        // Send command to inventory service
        commandGateway.send(new ReserveInventoryCommand(
            event.getOrderId(),
            event.getItems()
        ));
    }

    @EventHandler
    public void on(InventoryReservedEvent event) {
        commandGateway.send(new ScheduleShippingCommand(
            event.getOrderId()
        ));
    }

    @EventHandler
    public void on(PaymentFailedEvent event) {
        // Mark order as failed
        commandGateway.send(new FailOrderCommand(event.getOrderId()));
    }

    @EventHandler
    public void on(InventoryReservationFailedEvent event) {
        // Compensate: refund payment
        commandGateway.send(new RefundPaymentCommand(event.getOrderId()));
    }
}

Pros	Cons
Complete audit history	Complex to implement
Replay/debug capability	Eventual consistency
Temporal queries	Storage grows
Natural fit for sagas	Learning curve

When to use: - Audit requirements - Complex domain logic - Need to replay events - CQRS architecture

5. Outbox Pattern (Reliable Messaging)¶

Philosophy: "Guarantee message delivery alongside database transaction"

Outbox Pattern

Implementation:

// Within same transaction
@Transactional
public void processOrder(Order order) {
    // 1. Business logic
    orderRepository.save(order);

    // 2. Write to outbox (same transaction)
    OutboxEvent event = new OutboxEvent(
        UUID.randomUUID().toString(),
        "OrderCreated",
        JsonUtils.toJson(new OrderCreatedEvent(order)),
        false
    );
    outboxRepository.save(event);
}

// Separate poller (CDC or polling)
@Scheduled(fixedDelay = 1000)
public void publishOutboxEvents() {
    List<OutboxEvent> events = outboxRepository.findUnpublished();
    for (OutboxEvent event : events) {
        try {
            messagePublisher.publish(event.getTopic(), event.getPayload());
            event.setPublished(true);
            outboxRepository.save(event);
        } catch (Exception e) {
            // Will retry next poll
            log.error("Failed to publish event", e);
        }
    }
}

Using Debezium (CDC): Debezium CDC

Pros	Cons
Atomic with business logic	Additional table/polling
No distributed transaction	Slight delay
At-least-once delivery	Need deduplication
Simple to implement

When to use: - Reliable event publishing - When message broker might be down - Avoiding 2PC between DB and broker

6. Transactional Inbox Pattern¶

Philosophy: "Deduplicate incoming messages reliably"

@Transactional
public void handleEvent(Event event) {
    // Check if already processed
    if (inboxRepository.existsById(event.getId())) {
        log.info("Duplicate event, skipping: {}", event.getId());
        return;
    }

    // Process the event
    processBusinessLogic(event);

    // Mark as processed (same transaction)
    inboxRepository.save(new InboxEntry(event.getId(), Instant.now()));
}

Comparison Matrix¶

Approach	Consistency	Complexity	Scalability	Use Case
2PC	Strong	Medium	Poor	Same DC, few services
Saga (Choreography)	Eventual	High	Excellent	Simple flows, loose coupling
Saga (Orchestration)	Eventual	Medium	Good	Complex flows
Workflow Engine	Eventual	Low	Good	Long-running, visibility
Event Sourcing	Eventual	High	Excellent	Audit, complex domain
Outbox Pattern	At-least-once	Low	Good	Reliable messaging

Decision Tree¶

Decision Tree

Real-World Examples¶

E-Commerce Order (Saga with Orchestrator)¶

E-Commerce Order Saga

Travel Booking (Saga with Choreography)¶

Travel Booking Choreography

Document Approval (Workflow Engine)¶

Document Approval Workflow

Key Takeaways¶

Avoid 2PC in microservices - Use sagas instead
Prefer orchestration for complex flows - Easier to understand and debug
Use workflow engines for long-running processes - They handle the hard parts
Always have compensation logic - Things will fail
Outbox pattern for reliability - Atomic with business transaction
Idempotency everywhere - Messages will be duplicated
Monitor saga states - Know where processes are stuck