Skip to content

Multi Step Processes

The Problem

How do you handle operations that span multiple services, databases, or steps where: - Each step can fail independently - The entire process must succeed or be rolled back - Steps may take varying amounts of time - External services may be unreliable

Common Scenarios: - E-commerce order processing (payment → inventory → shipping) - User registration (create account → send email → setup defaults) - Travel booking (flight → hotel → car rental) - Money transfer between banks - Document approval workflows


Options & Trade-offs

1. Two-Phase Commit (2PC)

Philosophy: "All participants agree before committing"

Two-Phase Commit

Implementation:

public class TwoPhaseCommitCoordinator {

    public boolean executeTransaction(List<Participant> participants,
                                       TransactionData data) {
        String txId = UUID.randomUUID().toString();

        // Phase 1: Prepare
        boolean allPrepared = true;
        for (Participant p : participants) {
            try {
                if (!p.prepare(txId, data)) {
                    allPrepared = false;
                    break;
                }
            } catch (Exception e) {
                allPrepared = false;
                break;
            }
        }

        // Phase 2: Commit or Rollback
        if (allPrepared) {
            for (Participant p : participants) {
                p.commit(txId);
            }
            return true;
        } else {
            for (Participant p : participants) {
                p.rollback(txId);
            }
            return false;
        }
    }
}

Pros Cons
Strong consistency (ACID) Blocking - all wait for slowest
Well-understood protocol Coordinator is SPOF
Atomic across participants Doesn't scale well
Network partition = stuck transactions
Latency = sum of all participants

When to use: - Small number of participants - Same data center (low latency) - Strong consistency absolutely required - Traditional database transactions


2. Saga Pattern

Philosophy: "Execute steps in sequence, compensate on failure"

Saga Forward and Compensation Paths

Two Saga Styles:

Choreography (Event-Driven)

Saga Choreography

Orchestration (Centralized)

Saga Orchestration

Orchestration Implementation:

public class OrderSaga {

    public enum State {
        STARTED, PAYMENT_COMPLETED, INVENTORY_RESERVED,
        SHIPPING_SCHEDULED, COMPLETED, FAILED
    }

    @Transactional
    public void execute(Order order) {
        SagaContext context = new SagaContext(order);

        try {
            // Step 1: Process Payment
            PaymentResult payment = paymentService.charge(order);
            context.setPaymentId(payment.getId());
            context.setState(State.PAYMENT_COMPLETED);
            saveSagaState(context);

            // Step 2: Reserve Inventory
            ReservationResult reservation = inventoryService.reserve(order);
            context.setReservationId(reservation.getId());
            context.setState(State.INVENTORY_RESERVED);
            saveSagaState(context);

            // Step 3: Schedule Shipping
            ShipmentResult shipment = shippingService.schedule(order);
            context.setShipmentId(shipment.getId());
            context.setState(State.SHIPPING_SCHEDULED);
            saveSagaState(context);

            context.setState(State.COMPLETED);
            saveSagaState(context);

        } catch (Exception e) {
            compensate(context);
            throw new SagaFailedException(e);
        }
    }

    private void compensate(SagaContext context) {
        // Compensate in reverse order
        switch (context.getState()) {
            case SHIPPING_SCHEDULED:
                shippingService.cancel(context.getShipmentId());
                // fall through
            case INVENTORY_RESERVED:
                inventoryService.release(context.getReservationId());
                // fall through
            case PAYMENT_COMPLETED:
                paymentService.refund(context.getPaymentId());
                break;
        }
        context.setState(State.FAILED);
        saveSagaState(context);
    }
}

Saga State Machine:

public class SagaStateMachine {
    private final Map<State, Map<Event, State>> transitions = Map.of(
        State.STARTED, Map.of(
            Event.PAYMENT_SUCCESS, State.PAYMENT_COMPLETED,
            Event.PAYMENT_FAILED, State.COMPENSATING
        ),
        State.PAYMENT_COMPLETED, Map.of(
            Event.INVENTORY_RESERVED, State.INVENTORY_RESERVED,
            Event.INVENTORY_FAILED, State.COMPENSATING
        ),
        // ... more transitions
    );

    public State transition(State current, Event event) {
        return transitions.get(current).get(event);
    }
}

Choreography Orchestration
Loose coupling Centralized logic
No SPOF Easier to understand
Hard to track overall flow Single point of failure
Complex debugging Easier testing
Good for simple flows Good for complex flows
Pros Cons
No blocking Eventual consistency only
Scales well Compensations can fail
Resilient to failures Complex error handling
Works across services Harder to debug
Async friendly Temporary inconsistency visible

When to use: - Microservices architectures - Long-running transactions - Cross-service workflows - When strong consistency isn't required


3. Process Manager / Workflow Engine

Philosophy: "Use a dedicated system to manage complex workflows"

Tools: - Temporal (recommended) - Cadence (Uber) - AWS Step Functions - Camunda - Netflix Conductor

Temporal Example:

// Workflow Definition
@WorkflowInterface
public interface OrderWorkflow {
    @WorkflowMethod
    OrderResult processOrder(Order order);
}

@WorkflowImpl
public class OrderWorkflowImpl implements OrderWorkflow {

    private final PaymentActivity paymentActivity =
        Workflow.newActivityStub(PaymentActivity.class, activityOptions);
    private final InventoryActivity inventoryActivity =
        Workflow.newActivityStub(InventoryActivity.class, activityOptions);

    @Override
    public OrderResult processOrder(Order order) {
        // Temporal handles retries, timeouts, state persistence

        PaymentResult payment = paymentActivity.charge(order);

        try {
            InventoryResult inventory = inventoryActivity.reserve(order);
            ShippingResult shipping = shippingActivity.schedule(order);
            return OrderResult.success(payment, inventory, shipping);

        } catch (Exception e) {
            // Compensation
            paymentActivity.refund(payment.getId());
            throw e;
        }
    }
}

// Activity Definition
@ActivityInterface
public interface PaymentActivity {
    PaymentResult charge(Order order);
    void refund(String paymentId);
}

AWS Step Functions Example:

{
  "StartAt": "ProcessPayment",
  "States": {
    "ProcessPayment": {
      "Type": "Task",
      "Resource": "arn:aws:lambda:payment-function",
      "Next": "ReserveInventory",
      "Catch": [{
        "ErrorEquals": ["PaymentFailed"],
        "Next": "PaymentFailedState"
      }]
    },
    "ReserveInventory": {
      "Type": "Task",
      "Resource": "arn:aws:lambda:inventory-function",
      "Next": "ScheduleShipping",
      "Catch": [{
        "ErrorEquals": ["InventoryFailed"],
        "Next": "RefundPayment"
      }]
    },
    "ScheduleShipping": {
      "Type": "Task",
      "Resource": "arn:aws:lambda:shipping-function",
      "End": true
    },
    "RefundPayment": {
      "Type": "Task",
      "Resource": "arn:aws:lambda:refund-function",
      "Next": "FailState"
    },
    "FailState": {
      "Type": "Fail",
      "Error": "OrderProcessingFailed"
    }
  }
}

Pros Cons
Built-in retries & timeouts Additional infrastructure
Visual workflow monitoring Learning curve
State persistence handled Vendor lock-in possible
Easy to modify workflows Cost (managed services)
Audit trail included

When to use: - Complex multi-step workflows - Long-running processes (hours/days) - Need for visibility and monitoring - Human-in-the-loop processes


4. Event Sourcing + Process Manager

Philosophy: "Store all events, derive state, react to events"

Event Sourcing + Process Manager

Implementation:

public class OrderProcessManager {

    @EventHandler
    public void on(OrderCreatedEvent event) {
        // Send command to payment service
        commandGateway.send(new ProcessPaymentCommand(
            event.getOrderId(),
            event.getAmount()
        ));
    }

    @EventHandler
    public void on(PaymentSucceededEvent event) {
        // Send command to inventory service
        commandGateway.send(new ReserveInventoryCommand(
            event.getOrderId(),
            event.getItems()
        ));
    }

    @EventHandler
    public void on(InventoryReservedEvent event) {
        commandGateway.send(new ScheduleShippingCommand(
            event.getOrderId()
        ));
    }

    @EventHandler
    public void on(PaymentFailedEvent event) {
        // Mark order as failed
        commandGateway.send(new FailOrderCommand(event.getOrderId()));
    }

    @EventHandler
    public void on(InventoryReservationFailedEvent event) {
        // Compensate: refund payment
        commandGateway.send(new RefundPaymentCommand(event.getOrderId()));
    }
}

Pros Cons
Complete audit history Complex to implement
Replay/debug capability Eventual consistency
Temporal queries Storage grows
Natural fit for sagas Learning curve

When to use: - Audit requirements - Complex domain logic - Need to replay events - CQRS architecture


5. Outbox Pattern (Reliable Messaging)

Philosophy: "Guarantee message delivery alongside database transaction"

Outbox Pattern

Implementation:

// Within same transaction
@Transactional
public void processOrder(Order order) {
    // 1. Business logic
    orderRepository.save(order);

    // 2. Write to outbox (same transaction)
    OutboxEvent event = new OutboxEvent(
        UUID.randomUUID().toString(),
        "OrderCreated",
        JsonUtils.toJson(new OrderCreatedEvent(order)),
        false
    );
    outboxRepository.save(event);
}

// Separate poller (CDC or polling)
@Scheduled(fixedDelay = 1000)
public void publishOutboxEvents() {
    List<OutboxEvent> events = outboxRepository.findUnpublished();
    for (OutboxEvent event : events) {
        try {
            messagePublisher.publish(event.getTopic(), event.getPayload());
            event.setPublished(true);
            outboxRepository.save(event);
        } catch (Exception e) {
            // Will retry next poll
            log.error("Failed to publish event", e);
        }
    }
}

Using Debezium (CDC): Debezium CDC

Pros Cons
Atomic with business logic Additional table/polling
No distributed transaction Slight delay
At-least-once delivery Need deduplication
Simple to implement

When to use: - Reliable event publishing - When message broker might be down - Avoiding 2PC between DB and broker


6. Transactional Inbox Pattern

Philosophy: "Deduplicate incoming messages reliably"

@Transactional
public void handleEvent(Event event) {
    // Check if already processed
    if (inboxRepository.existsById(event.getId())) {
        log.info("Duplicate event, skipping: {}", event.getId());
        return;
    }

    // Process the event
    processBusinessLogic(event);

    // Mark as processed (same transaction)
    inboxRepository.save(new InboxEntry(event.getId(), Instant.now()));
}

Comparison Matrix

Approach Consistency Complexity Scalability Use Case
2PC Strong Medium Poor Same DC, few services
Saga (Choreography) Eventual High Excellent Simple flows, loose coupling
Saga (Orchestration) Eventual Medium Good Complex flows
Workflow Engine Eventual Low Good Long-running, visibility
Event Sourcing Eventual High Excellent Audit, complex domain
Outbox Pattern At-least-once Low Good Reliable messaging

Decision Tree

Decision Tree


Real-World Examples

E-Commerce Order (Saga with Orchestrator)

E-Commerce Order Saga

Travel Booking (Saga with Choreography)

Travel Booking Choreography

Document Approval (Workflow Engine)

Document Approval Workflow


Key Takeaways

  1. Avoid 2PC in microservices - Use sagas instead
  2. Prefer orchestration for complex flows - Easier to understand and debug
  3. Use workflow engines for long-running processes - They handle the hard parts
  4. Always have compensation logic - Things will fail
  5. Outbox pattern for reliability - Atomic with business transaction
  6. Idempotency everywhere - Messages will be duplicated
  7. Monitor saga states - Know where processes are stuck