
Part IX: The Saga Pattern

"Distributed transactions are a lie. Sagas are the truth."

Every non-trivial application eventually reaches a moment where a single operation spans multiple resources. A customer places an order: the system must reserve inventory in the warehouse service, charge the customer's credit card through a payment gateway, and schedule a shipment through a logistics API. Three steps. Three external systems. Three things that can fail independently. The question is not whether one of them will fail -- it is what happens to the other two when it does.

The traditional answer is a distributed transaction. Two-phase commit. A coordinator asks every participant "can you commit?" and, if they all say yes, tells them "commit." If any participant says no, the coordinator tells everyone to roll back. This works in theory and in textbooks. It works in practice inside a single database cluster. It does not work when the participants are a PostgreSQL database, a Stripe payment API, and a FedEx shipping endpoint. Stripe does not implement the XA protocol. FedEx does not support two-phase commit. And even if they did, the coordinator is a single point of failure that holds locks across all participants for the duration of the prepare phase -- a duration that includes network round trips to external services with unpredictable latency.

Two-phase commit does not work at scale. It does not work across organizational boundaries. It does not work when participants are HTTP APIs rather than database connections. And it does not work when you need the system to remain available while a long-running operation is in progress.

The saga pattern is the alternative. Instead of one atomic transaction that spans all participants, a saga breaks the operation into a sequence of local transactions -- steps -- each of which can be individually committed and individually compensated. If step three fails, the saga does not roll back a distributed transaction. Instead, it runs compensating actions for steps two and one, in reverse order, undoing whatever each step did. The compensation is semantic, not transactional: if step two charged a credit card, the compensation issues a refund. If step one reserved inventory, the compensation releases the reservation. Each compensation is its own local transaction, independently committed.
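The core mechanic (commit each step locally, remember how to undo it, and unwind in reverse on failure) fits in a few lines. The following is only a minimal sketch: MiniSaga and its tuple-based step shape are hypothetical illustrations and are not part of the FrenchExDev.Net.Saga API described later in this chapter.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper illustrating "commit locally, undo semantically, in reverse".
public static class MiniSaga
{
    // Each step is a (name, action, undo) triple. Returns a log of what happened.
    public static List<string> Run(
        IEnumerable<(string Name, Func<bool> Do, Action Undo)> steps)
    {
        var log = new List<string>();
        var completed = new Stack<(string Name, Action Undo)>();

        foreach (var step in steps)
        {
            if (step.Do())
            {
                log.Add($"did {step.Name}");
                completed.Push((step.Name, step.Undo));
                continue;
            }

            log.Add($"failed {step.Name}");

            // A stack pops last-in-first-out, which yields reverse-order compensation.
            while (completed.Count > 0)
            {
                var done = completed.Pop();
                done.Undo();
                log.Add($"undid {done.Name}");
            }
            break;
        }

        return log;
    }
}
```

With three steps where the third returns false, the log reads: did reserve, did charge, failed ship, undid charge, undid reserve. The failed step is never undone, and the completed ones are undone newest first.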

This is not a new idea. Hector Garcia-Molina and Kenneth Salem published "Sagas" in 1987. The pattern has since been implemented or supported by many distributed systems frameworks and services: AWS Step Functions, Azure Durable Functions, Temporal, NServiceBus, MassTransit, Axon. What FrenchExDev's implementation adds is not a new concept but a sharp, minimal API: five types, one orchestrator, one test double, and Result<T> integration throughout. No message bus required. No persistence framework required. No ceremony. Define your steps. Define your compensations. Let the orchestrator handle the state machine.

This chapter covers the complete FrenchExDev.Net.Saga package: the SagaContext abstract base class and the SagaState enum, the ISagaStep<TContext> interface with its execute-and-compensate contract, the SagaOrchestrator<TContext> sealed class that runs the state machine, the SagaInstance persistence entity and ISagaStore interface, the InMemorySagaStore test double, a real-world order fulfillment example with three steps and three compensations, composition with every other FrenchExDev pattern, comprehensive testing strategies, DI registration, and an honest comparison with other approaches to distributed coordination.


The Problem: Multi-Step Operations That Must Roll Back

Consider the order fulfillment scenario in detail. The customer clicks "Place Order" and the backend must:

  1. Reserve inventory -- decrement stock counts for each line item in the warehouse database.
  2. Charge payment -- create a charge through the payment gateway (Stripe, Adyen, whatever) for the order total.
  3. Schedule shipping -- create a shipment with the logistics provider (FedEx, UPS, a local courier API).

Each step talks to a different system. Each system has its own database. Each system confirms success independently. There is no shared transaction manager. Now consider the failure modes:

Step 1 succeeds, step 2 fails. The payment gateway declines the card. The inventory is reserved but will never be shipped. Without compensation, the warehouse database thinks those items are committed to an order that will never be fulfilled. Other customers cannot buy those items. Revenue is lost.

Steps 1 and 2 succeed, step 3 fails. The shipping API is down. The customer has been charged. The inventory is reserved. But nothing will ship. Without compensation, you have a customer who paid for something they will not receive, and inventory that is locked.

Steps 1 and 2 succeed, step 3 fails, and the refund fails. The shipping API is down, and when you try to refund the payment, the payment gateway is also down. Now you have a partially compensated saga. The inventory was released, but the refund did not go through. You need to retry the refund later, or escalate to human intervention.

Each of these failure modes requires a different response. The saga pattern formalizes these responses as compensating actions -- one per step, run in reverse order -- and the orchestrator manages the state machine that tracks which steps have executed, which need compensation, and whether the compensation itself succeeded or failed.

Why Not Just Use try/catch?

The obvious first attempt is a method with nested try-catch blocks:

public async Task<Result> PlaceOrderAsync(OrderRequest request, CancellationToken ct)
{
    Guid? reservationId = null;
    Guid? paymentId = null;

    try
    {
        // Step 1: Reserve inventory
        reservationId = await _inventory.ReserveAsync(
            request.Items, ct);

        // Step 2: Charge payment
        paymentId = await _payments.ChargeAsync(
            request.CustomerId, request.Total, ct);

        // Step 3: Schedule shipping
        await _shipping.ScheduleAsync(
            request.ShippingAddress, request.Items, ct);

        return Result.Result.Success();
    }
    catch (Exception ex)
    {
        // Compensate step 2
        if (paymentId.HasValue)
        {
            try
            {
                await _payments.RefundAsync(paymentId.Value, ct);
            }
            catch (Exception refundEx)
            {
                _logger.LogError(refundEx, "Refund failed for {PaymentId}",
                    paymentId.Value);
                // Now what? The refund failed. Do we retry? Log and move on?
                // This is where the try-catch approach falls apart.
            }
        }

        // Compensate step 1
        if (reservationId.HasValue)
        {
            try
            {
                await _inventory.ReleaseAsync(reservationId.Value, ct);
            }
            catch (Exception releaseEx)
            {
                _logger.LogError(releaseEx, "Release failed for {ReservationId}",
                    reservationId.Value);
            }
        }

        return Result.Result.Failure(ex.Message);
    }
}

This works for three steps. It becomes unmaintainable at five. It becomes a liability at ten. The problems:

  1. State tracking is manual. The reservationId and paymentId nullable variables are manual state. If you forget to set one, the compensation is skipped. If you check the wrong one, you compensate a step that never ran.

  2. Compensation order is implicit. The compensations run in whatever order you wrote them. If you accidentally compensate step 1 before step 2, you might release inventory before refunding the payment, creating a window where the customer is charged for items that are available for other customers to buy.

  3. Compensation failure is unstructured. Each catch block logs and continues. There is no aggregated report of what failed. There is no retry mechanism. There is no way to resume a partially compensated operation after the application restarts.

  4. No persistence. If the application crashes after step 2 succeeds and before step 3 executes, the inventory is reserved, the payment is charged, and no one knows. There is no record of the in-progress saga. There is no way to resume or compensate after restart.

  5. Adding a step requires modifying the method. Every new step adds another try-catch layer, another nullable variable, and another compensation block. The method grows linearly with the number of steps, and each addition risks breaking the existing compensation logic.

The saga pattern solves all five problems by separating each step into its own class with an explicit execute and compensate pair, tracking state in a dedicated context object, running compensations in guaranteed reverse order, persisting saga state for crash recovery, and allowing new steps to be added without modifying existing code.


The Saga State Machine

A saga is a state machine with six states and five transitions. The SagaState enum captures these states:

public enum SagaState
{
    Pending = 0,
    Running = 1,
    Completed = 2,
    Compensating = 3,
    Compensated = 4,
    Failed = 5
}

The state transitions are deterministic and unidirectional. There is no way to go from Completed back to Running, or from Compensated back to Compensating. The state machine has exactly one happy path and two failure paths:

Diagram: Six states, five transitions. The saga lifecycle has one happy path and two distinct failure paths, with Compensated and Failed mapping to different operational responses.
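The five transitions can also be written down as data. The sketch below is an assumption about how one might encode them for a guard or a unit test; SagaTransitions is a hypothetical helper, and the package itself may enforce transitions differently (the chapter only says the orchestrator manages them).

```csharp
using System.Collections.Generic;

public enum SagaState
{
    Pending = 0, Running = 1, Completed = 2,
    Compensating = 3, Compensated = 4, Failed = 5
}

// Hypothetical helper: the five legal transitions as a lookup table.
public static class SagaTransitions
{
    private static readonly HashSet<(SagaState, SagaState)> Allowed = new()
    {
        (SagaState.Pending, SagaState.Running),
        (SagaState.Running, SagaState.Completed),        // happy path
        (SagaState.Running, SagaState.Compensating),     // a step failed
        (SagaState.Compensating, SagaState.Compensated), // clean rollback
        (SagaState.Compensating, SagaState.Failed),      // a compensation failed
    };

    public static int Count => Allowed.Count;

    public static bool IsAllowed(SagaState current, SagaState next) =>
        Allowed.Contains((current, next));

    public static bool IsTerminal(SagaState state) =>
        state is SagaState.Completed or SagaState.Compensated or SagaState.Failed;
}
```

Because no pair starts from a terminal state, IsAllowed(SagaState.Completed, SagaState.Running) is false, which is the "unidirectional" property stated above.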

State Descriptions

Pending is the initial state. The saga has been created but execution has not started. The SagaContext is initialized with SagaId, all domain-specific properties are set, but no step has run. This is the state the saga is in between construction and the call to SagaOrchestrator.ExecuteAsync.

Running means the orchestrator is actively executing steps in forward order. The CurrentStepIndex property on the context tracks which step is currently executing. Each step's ExecuteAsync is called sequentially, and the index advances after each success. The saga stays in the Running state as long as steps succeed.

Completed is the terminal happy state. Every step in the saga executed successfully. The CompletedAt timestamp is set. No compensation is needed. The operation is done.

Compensating means a step failed and the orchestrator is running compensating actions in reverse order. If the step at index 3 of a five-step saga fails, the orchestrator compensates steps 2, then 1, then 0 (step 3 itself is not compensated: it failed and never completed, so there is nothing to undo). The LastError property contains the error message from the failed step.

Compensated is a terminal state meaning that all compensations ran successfully. The original operation failed, but the system is back to a consistent state. The inventory was released. The payment was refunded. The saga is over, but the operation did not succeed. This is distinct from Completed -- the caller should treat Compensated as a failure that was cleanly rolled back.

Failed is the terminal state that requires human intervention. A compensation step itself failed. The system is in an inconsistent state. The saga stores the LastError from the failed compensation so that operators can investigate and manually resolve the inconsistency. This state is the "alert the on-call engineer" state.

Why Six States and Not Two?

A simpler model might have just Success and Failure. But that model loses critical information. Was the failure clean (compensated) or dirty (compensation failed)? Is the system consistent or inconsistent? Can the operation be safely retried, or does a human need to intervene first?

The six-state model answers these questions. Completed means the operation succeeded. Compensated means the operation failed but the system is consistent -- you can safely retry if the failure was transient. Failed means the system is inconsistent and retrying without manual intervention could make things worse. These three terminal states map to three different operational responses: do nothing, retry, and escalate.
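A caller can make that three-way mapping explicit. The following sketch assumes a hypothetical OperationalResponse enum and SagaOutcome helper; neither is part of the package, and whether a Compensated saga should actually be retried still depends on whether the original failure was transient.

```csharp
using System;

public enum SagaState
{
    Pending = 0, Running = 1, Completed = 2,
    Compensating = 3, Compensated = 4, Failed = 5
}

// Hypothetical names for the three operational responses described in the text.
public enum OperationalResponse { None, Retry, Escalate }

public static class SagaOutcome
{
    // Maps a terminal saga state to the response an operator or caller would take.
    public static OperationalResponse For(SagaState state) => state switch
    {
        SagaState.Completed => OperationalResponse.None,      // succeeded: do nothing
        SagaState.Compensated => OperationalResponse.Retry,   // consistent: safe to retry
        SagaState.Failed => OperationalResponse.Escalate,     // inconsistent: page a human
        _ => throw new InvalidOperationException(
            $"Saga is still in progress ({state}); no terminal outcome yet.")
    };
}
```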


SagaContext: The State Carrier

The SagaContext abstract base class carries the saga's operational state -- identity, lifecycle state, step progress, error information, and timestamps:

public abstract class SagaContext
{
    public Guid SagaId { get; set; } = Guid.NewGuid();
    public SagaState State { get; set; } = SagaState.Pending;
    public int CurrentStepIndex { get; set; }
    public string? LastError { get; set; }
    public DateTimeOffset StartedAt { get; set; }
    public DateTimeOffset? CompletedAt { get; set; }
}

The properties are straightforward:

  • SagaId uniquely identifies this saga instance. Defaulted to a new GUID on construction. Used as the key for persistence in ISagaStore.
  • State tracks the current lifecycle state. Starts as Pending. Transitions are managed by the orchestrator, not by consumer code.
  • CurrentStepIndex indicates which step is currently executing (during Running) or which step failed (during Compensating). Zero-indexed.
  • LastError contains the error message from the most recent failure -- either a step failure or a compensation failure. null when no error has occurred.
  • StartedAt records when ExecuteAsync was called. Set by the orchestrator at the beginning of execution.
  • CompletedAt records when the saga reached a terminal state (Completed, Compensated, or Failed). null while the saga is still in progress.

Why Abstract?

SagaContext is abstract because it carries only orchestration state. Domain state -- the order ID, the payment ID, the shipment tracking number, the dollar amount -- belongs to the consumer's derived class. The base class does not know or care what domain data the saga carries. It only knows the lifecycle.

Here is a typical derived context for an order fulfillment saga:

public sealed class OrderSagaContext : SagaContext
{
    public Guid OrderId { get; set; }
    public Guid? PaymentId { get; set; }
    public Guid? ShipmentId { get; set; }
    public decimal Amount { get; set; }
    public string CustomerEmail { get; set; } = string.Empty;
    public IReadOnlyList<OrderLineItem> Items { get; set; } = [];
}

Notice that PaymentId and ShipmentId are nullable. This is intentional. When the saga starts, no payment has been charged and no shipment has been scheduled. These properties are populated by the steps as they execute. Step 2 (ChargePaymentStep) sets PaymentId after a successful charge. Step 3 (ScheduleShippingStep) sets ShipmentId after a successful scheduling. If step 3 fails and the saga compensates, the ChargePaymentStep.CompensateAsync method reads PaymentId from the context to know which payment to refund.

The context is the shared memory of the saga. Steps write to it during execution. Other steps read from it during their execution or compensation. The orchestrator does not interpret the domain properties at all -- it just passes the context through.

Context Design Guidelines

A few guidelines for designing saga contexts:

Make step outputs nullable. If a step creates a resource and stores its ID in the context, that ID should be nullable. The compensation code can check nullability to determine whether the step actually completed before attempting to undo it.

Avoid putting services in the context. The context carries data, not behavior. Services like IPaymentGateway or IInventoryService are injected into step constructors via DI, not passed through the context.

Keep it serializable. If you use ISagaStore for persistence, the context will be serialized to JSON (via SagaInstance.ContextJson). Avoid putting non-serializable types -- streams, database connections, HTTP clients -- in the context.

Use simple, serialization-friendly types for domain data. Guid, decimal, string, and read-only collections such as IReadOnlyList<T> are safe. Mutable collections and entity objects with navigation properties are not.

Here is a more complex example -- a subscription renewal saga with five steps:

public sealed class RenewalSagaContext : SagaContext
{
    public Guid SubscriptionId { get; set; }
    public Guid CustomerId { get; set; }
    public string PlanCode { get; set; } = string.Empty;
    public decimal RenewalAmount { get; set; }
    public DateTimeOffset CurrentPeriodEnd { get; set; }
    public DateTimeOffset? NewPeriodEnd { get; set; }
    public Guid? InvoiceId { get; set; }
    public Guid? PaymentId { get; set; }
    public bool EntitlementsExtended { get; set; }
    public bool NotificationSent { get; set; }
}

Five steps: create invoice, charge payment, extend entitlements, update subscription period, send notification. Five compensating actions: void invoice, refund payment, revoke entitlements, revert subscription period, send cancellation notification. Each step reads what it needs from the context and writes its output for subsequent steps and compensations.


ISagaStep: Execute and Compensate

The ISagaStep<TContext> interface is the core abstraction. Every step in a saga implements it:

public interface ISagaStep<in TContext> where TContext : SagaContext
{
    Task<Result.Result> ExecuteAsync(TContext context, CancellationToken ct = default);
    Task<Result.Result> CompensateAsync(TContext context, CancellationToken ct = default);
}

Two methods. No more, no less.

ExecuteAsync is the forward action. It performs the step's work -- reserve inventory, charge a payment, schedule a shipment -- and returns Result.Success() on success or Result.Failure(errorMessage) on failure. When it succeeds, the orchestrator moves to the next step. When it fails, the orchestrator transitions to the Compensating state and begins reverse-order compensation.

CompensateAsync is the reverse action. It undoes whatever ExecuteAsync did -- release the reservation, refund the payment, cancel the shipment. It also returns Result, because compensation can fail too. When compensation fails, the orchestrator transitions to the Failed state.

The Contravariant Type Parameter

The in keyword on TContext makes the interface contravariant. This means an ISagaStep<SagaContext> can be used as an ISagaStep<OrderSagaContext>, because a step that can operate on any SagaContext can certainly operate on the more specific OrderSagaContext. This is useful for generic steps like logging or metrics that do not need domain-specific context:

public sealed class LoggingStep<TContext> : ISagaStep<TContext>
    where TContext : SagaContext
{
    private readonly ILogger _logger;

    public LoggingStep(ILogger<LoggingStep<TContext>> logger) => _logger = logger;

    public Task<Result.Result> ExecuteAsync(TContext context, CancellationToken ct)
    {
        _logger.LogInformation("Saga {SagaId} step executing", context.SagaId);
        return Task.FromResult(Result.Result.Success());
    }

    public Task<Result.Result> CompensateAsync(TContext context, CancellationToken ct)
    {
        _logger.LogWarning("Saga {SagaId} step compensating", context.SagaId);
        return Task.FromResult(Result.Result.Success());
    }
}

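This step works with any SagaContext-derived type because it only reads properties from the base class. Strictly speaking, the generic LoggingStep<TContext> above gets that flexibility from its type parameter, not from variance. Contravariance comes into play when a step is declared once against the base type, as ISagaStep<SagaContext>, and is then placed in a list of ISagaStep<OrderSagaContext>: the conversion is type-safe and needs no cast.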
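To see the in modifier doing real work, consider a step declared once against the base context and reused in an order-specific step list. This is a sketch under assumptions: AuditStep and OrderSteps are hypothetical, and Result, SagaContext, and OrderSagaContext are simplified stand-ins for the types shown in this chapter.

```csharp
using System;
using System.Collections.Generic;
using System.Threading;
using System.Threading.Tasks;

// Simplified stand-ins (assumed shapes, not the package's real types).
public sealed record Result(bool IsSuccess, string? Error = null)
{
    public static Result Success() => new(true);
    public static Result Failure(string error) => new(false, error);
}

public abstract class SagaContext
{
    public Guid SagaId { get; set; } = Guid.NewGuid();
}

public sealed class OrderSagaContext : SagaContext
{
    public Guid OrderId { get; set; }
}

public interface ISagaStep<in TContext> where TContext : SagaContext
{
    Task<Result> ExecuteAsync(TContext context, CancellationToken ct = default);
    Task<Result> CompensateAsync(TContext context, CancellationToken ct = default);
}

// Hypothetical step written once against the base context.
public sealed class AuditStep : ISagaStep<SagaContext>
{
    public List<Guid> Seen { get; } = new();

    public Task<Result> ExecuteAsync(SagaContext context, CancellationToken ct = default)
    {
        Seen.Add(context.SagaId);
        return Task.FromResult(Result.Success());
    }

    public Task<Result> CompensateAsync(SagaContext context, CancellationToken ct = default) =>
        Task.FromResult(Result.Success());
}

public static class OrderSteps
{
    public static IReadOnlyList<ISagaStep<OrderSagaContext>> WithAudit(AuditStep audit)
    {
        // Implicit conversion ISagaStep<SagaContext> -> ISagaStep<OrderSagaContext>.
        // This line compiles only because TContext is declared with `in`.
        ISagaStep<OrderSagaContext> orderStep = audit;
        return new[] { orderStep };
    }
}
```

Removing the in keyword from the interface turns the assignment inside WithAudit into a compile-time error, which is a quick way to confirm that variance, not generics, is what permits the reuse.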

Idempotency: The Non-Negotiable Requirement

Compensation must be idempotent. This is not a nice-to-have. It is a hard requirement.

Consider: step 2 charges a payment. Step 3 fails. The orchestrator calls ChargePaymentStep.CompensateAsync, which issues a refund. The refund call times out. Did it succeed? The payment gateway processed the request but the response was lost in transit. The saga records a compensation failure and transitions to Failed. An operator investigates, sees the Failed state, and triggers a retry of the compensation. The CompensateAsync is called again. If the compensation is not idempotent, the customer gets refunded twice.

Idempotent compensation means that calling CompensateAsync multiple times with the same context produces the same result as calling it once. Techniques for achieving idempotency:

  1. Check before acting. Before issuing a refund, check the payment status. If it is already refunded, return Success without calling the gateway.
public async Task<Result.Result> CompensateAsync(
    OrderSagaContext context, CancellationToken ct)
{
    if (!context.PaymentId.HasValue)
        return Result.Result.Success(); // Step never executed; nothing to undo.

    var status = await _gateway.GetStatusAsync(context.PaymentId.Value, ct);
    if (status == PaymentStatus.Refunded)
        return Result.Result.Success(); // Already compensated; idempotent.

    await _gateway.RefundAsync(context.PaymentId.Value, ct);
    return Result.Result.Success();
}

  2. Use idempotency keys. Pass a key derived from the SagaId as an idempotency key to external APIs. Stripe, for example, supports idempotency keys on refund requests. Two refund requests with the same key produce only one refund. If a saga makes several calls to the same API, give each call its own key (for example, the SagaId combined with the step name) so that the keys do not collide.

  3. Guard on nullable context properties. If PaymentId is null, the charge step never ran, and there is nothing to compensate. Return Success immediately.
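The idempotency-key technique can be illustrated without a real payment provider. In the sketch below, FakeRefundGateway and IdempotencyKeys are hypothetical: the fake only mimics the "same key, same outcome" contract that providers with idempotency-key support offer, and the key format is an assumption rather than anything the package prescribes.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical key builder: one key per (saga, step) pair.
public static class IdempotencyKeys
{
    public static string ForStep(Guid sagaId, string stepName) => $"{sagaId:N}:{stepName}";
}

// Hypothetical in-memory gateway that deduplicates refunds by idempotency key.
public sealed class FakeRefundGateway
{
    private readonly Dictionary<string, Guid> _refundsByKey = new();

    public int RefundsIssued { get; private set; }

    public Guid Refund(Guid paymentId, string idempotencyKey)
    {
        // A repeated key returns the original refund instead of issuing a new one.
        if (_refundsByKey.TryGetValue(idempotencyKey, out var existing))
            return existing;

        var refundId = Guid.NewGuid();
        _refundsByKey[idempotencyKey] = refundId;
        RefundsIssued++;
        return refundId;
    }
}
```

Calling Refund twice with the key for the same saga and step returns the same refund ID and leaves RefundsIssued at 1, so a retried compensation after a lost response cannot refund the customer twice.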

Result Integration

Both methods return Task<Result.Result>, not Task or Task<bool>. This is consistent with every other FrenchExDev pattern. The Result type carries either success or a failure message. The orchestrator inspects the result to decide whether to continue forward (on success), begin compensation (on step failure), or transition to Failed (on compensation failure).

This avoids the exception-as-control-flow anti-pattern. A payment decline is not exceptional -- it is a normal business outcome. Returning Result.Failure("Card declined") is clearer than throwing a PaymentDeclinedException that the orchestrator catches.

Steps can still throw for genuinely unexpected situations -- null reference exceptions, network errors, out-of-memory conditions. The orchestrator treats unhandled exceptions the same as a Result.Failure: it captures the exception message and begins compensation. But the preferred path is explicit Result returns for expected failure modes.
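One plausible way to fold unexpected exceptions into the same Result flow is a small wrapper around each step call. This is a sketch of the idea, not the package's implementation: StepGuard is a hypothetical name, Result is a simplified stand-in, and the sketch converts every exception (including cancellation) into a failure, which the real orchestrator may or may not do.

```csharp
using System;
using System.Threading.Tasks;

// Simplified stand-in for the package's Result type (assumed shape).
public sealed record Result(bool IsSuccess, string? Error = null)
{
    public static Result Success() => new(true);
    public static Result Failure(string error) => new(false, error);
}

// Hypothetical helper: runs a step call and turns a thrown exception into a failure.
public static class StepGuard
{
    public static async Task<Result> RunAsync(Func<Task<Result>> call)
    {
        try
        {
            return await call();
        }
        catch (Exception ex)
        {
            // Only the message is kept, matching the chapter's description.
            return Result.Failure(ex.Message);
        }
    }
}
```

With such a wrapper the rest of the orchestration logic only ever inspects IsSuccess, whether the step returned Result.Failure("Card declined") or threw an InvalidOperationException.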


The Class Diagram

Before diving into the orchestrator, here is the complete type diagram showing how the five public types relate to each other:

Diagram: The five public types behind the Saga package (orchestrator, step interface, abstract context, persisted instance, and store) are the entire contract a consumer has to learn.

Five types. One orchestrator (SagaOrchestrator<TContext>). One step interface (ISagaStep<TContext>). One context base class (SagaContext). One persistence entity (SagaInstance). One store interface (ISagaStore). The orchestrator depends on the step interface and the context. The store depends on the persistence entity. The orchestrator does not depend on the store -- persistence is an optional layer that consumers add when they need crash recovery.


SagaOrchestrator: The Orchestration Engine

The SagaOrchestrator<TContext> is a sealed class that takes an ordered list of steps and runs them:

public sealed class SagaOrchestrator<TContext> where TContext : SagaContext
{
    public SagaOrchestrator(IReadOnlyList<ISagaStep<TContext>> steps);
    public async Task<Result.Result> ExecuteAsync(
        TContext context, CancellationToken ct = default);
}

One constructor. One public method. The simplicity is deliberate. The orchestrator does three things: run steps forward, compensate steps backward on failure, and track state transitions. It does not persist. It does not retry. It does not schedule. It does not log. Each of those concerns belongs to a different layer -- persistence in ISagaStore, retries in a behavior wrapper, scheduling in a background service, logging in the steps themselves.

The Happy Path

When all steps succeed, the orchestrator follows this sequence:

  1. Set context.State = SagaState.Running.
  2. Set context.StartedAt = DateTimeOffset.UtcNow.
  3. If there are zero steps, set State = Completed, set CompletedAt, and return Result.Success().
  4. For each step i from 0 to N-1:
     a. Set context.CurrentStepIndex = i.
     b. Call steps[i].ExecuteAsync(context, ct).
     c. If the result is a failure, branch to compensation (described below).
  5. Set context.State = SagaState.Completed.
  6. Set context.CompletedAt = DateTimeOffset.UtcNow.
  7. Return Result.Success().

The zero-step edge case is handled explicitly. An orchestrator with no steps is valid -- it represents a no-op saga that immediately completes. This is useful in testing and in dynamically composed sagas where the step list might be conditionally empty.

The Compensation Path

When step i fails, the orchestrator compensates steps i-1 down to 0:

  1. Set context.LastError to the failure message from step i.
  2. Set context.State = SagaState.Compensating.
  3. For each step j from i-1 down to 0:
     a. Call steps[j].CompensateAsync(context, ct).
     b. If the compensation fails, branch to the failure path (described below).
  4. Set context.State = SagaState.Compensated.
  5. Set context.CompletedAt = DateTimeOffset.UtcNow.
  6. Return Result.Failure(context.LastError) -- the original step error.

Note that step i itself is not compensated. It failed, which means it did not complete its work. There is nothing to undo for a step that did not succeed. Only steps 0 through i-1 -- the steps that successfully executed -- need compensation.

The compensation runs in reverse order: i-1, then i-2, then i-3, down to 0. This is not arbitrary. Steps often depend on the output of previous steps. Step 2 (charge payment) might depend on the reservation created by step 1 (reserve inventory). Compensating step 2 before step 1 ensures that the payment is refunded before the inventory reservation is released. If you compensated in forward order, you might release inventory for which a charge is still pending.

The Failure Path

When compensation itself fails at step j, the orchestrator gives up:

  1. Set context.LastError to the compensation failure message.
  2. Set context.State = SagaState.Failed.
  3. Set context.CompletedAt = DateTimeOffset.UtcNow.
  4. Return Result.Failure(context.LastError).

The Failed state means the saga could not clean up after itself. Steps j+1 through i-1 were compensated successfully. Step j's compensation failed. Steps 0 through j-1 have not been compensated at all, because the reverse iteration stops at the first compensation failure. The system is in an inconsistent state that requires human intervention.

This is the correct behavior. The orchestrator does not retry. It does not silently swallow the compensation error. It marks the saga as failed and returns the error. The caller -- or an operator monitoring the ISagaStore -- can decide whether to retry the compensation manually, escalate to engineering, or accept the inconsistency and reconcile later.
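The three paths above (happy, compensation, failure) can be condensed into one small class. The code below is a hypothetical re-implementation written from the numbered steps in this section, not the package's source: SketchOrchestrator, DemoContext, and ScriptedStep are invented names, Result is a simplified stand-in, and exception handling is left out for brevity.

```csharp
using System;
using System.Collections.Generic;
using System.Threading;
using System.Threading.Tasks;

public enum SagaState { Pending, Running, Completed, Compensating, Compensated, Failed }

// Stand-in for the package's Result type (assumed shape).
public sealed record Result(bool IsSuccess, string? Error = null)
{
    public static Result Success() => new(true);
    public static Result Failure(string error) => new(false, error);
}

public abstract class SagaContext
{
    public SagaState State { get; set; } = SagaState.Pending;
    public int CurrentStepIndex { get; set; }
    public string? LastError { get; set; }
    public DateTimeOffset StartedAt { get; set; }
    public DateTimeOffset? CompletedAt { get; set; }
}

public interface ISagaStep<in TContext> where TContext : SagaContext
{
    Task<Result> ExecuteAsync(TContext context, CancellationToken ct = default);
    Task<Result> CompensateAsync(TContext context, CancellationToken ct = default);
}

// Hypothetical re-implementation for illustration; not the package's source.
public sealed class SketchOrchestrator<TContext> where TContext : SagaContext
{
    private readonly IReadOnlyList<ISagaStep<TContext>> _steps;
    public SketchOrchestrator(IReadOnlyList<ISagaStep<TContext>> steps) => _steps = steps;

    public async Task<Result> ExecuteAsync(TContext context, CancellationToken ct = default)
    {
        context.State = SagaState.Running;
        context.StartedAt = DateTimeOffset.UtcNow;
        // Happy path; a zero-step saga skips the loop and completes immediately.
        for (var i = 0; i < _steps.Count; i++)
        {
            context.CurrentStepIndex = i;
            var result = await _steps[i].ExecuteAsync(context, ct);
            if (!result.IsSuccess)
                return await CompensateAsync(context, i, result.Error ?? "step failed", ct);
        }
        return Finish(context, SagaState.Completed, Result.Success());
    }

    private async Task<Result> CompensateAsync(
        TContext context, int failedIndex, string error, CancellationToken ct)
    {
        context.LastError = error;
        context.State = SagaState.Compensating;
        // The failed step is skipped; steps failedIndex-1 down to 0 are undone.
        for (var j = failedIndex - 1; j >= 0; j--)
        {
            var undo = await _steps[j].CompensateAsync(context, ct);
            if (undo.IsSuccess) continue;
            // Failure path: stop here; steps 0..j-1 are left uncompensated.
            context.LastError = undo.Error ?? "compensation failed";
            return Finish(context, SagaState.Failed, Result.Failure(context.LastError));
        }
        return Finish(context, SagaState.Compensated, Result.Failure(error));
    }

    private static Result Finish(TContext context, SagaState state, Result result)
    {
        context.State = state;
        context.CompletedAt = DateTimeOffset.UtcNow;
        return result;
    }
}

public sealed class DemoContext : SagaContext { public List<string> Log { get; } = new(); }

// Demo step with scripted outcomes; records each call in the context log.
public sealed class ScriptedStep : ISagaStep<DemoContext>
{
    private readonly string _name;
    private readonly bool _ok, _undoOk;
    public ScriptedStep(string name, bool ok = true, bool undoOk = true) =>
        (_name, _ok, _undoOk) = (name, ok, undoOk);

    public Task<Result> ExecuteAsync(DemoContext c, CancellationToken ct = default)
    {
        c.Log.Add($"do {_name}");
        return Task.FromResult(_ok ? Result.Success() : Result.Failure($"{_name} failed"));
    }

    public Task<Result> CompensateAsync(DemoContext c, CancellationToken ct = default)
    {
        c.Log.Add($"undo {_name}");
        return Task.FromResult(_undoOk ? Result.Success() : Result.Failure($"undo {_name} failed"));
    }
}
```

Running three ScriptedStep instances named reserve, charge, and ship, with ship scripted to fail, ends in Compensated with the log: do reserve, do charge, do ship, undo charge, undo reserve. Scripting the charge compensation to fail as well ends in Failed after "undo charge", and "undo reserve" is never attempted.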

The Sequence Diagram

Here is the complete orchestration sequence for a three-step saga where step 3 fails:

Diagram: Step 3 fails and the orchestrator walks back, refunding the payment before releasing the inventory. This shows why reverse-order compensation is the invariant that keeps the system consistent.

Step 1 executes and succeeds (inventory reserved). Step 2 executes and succeeds (payment charged). Step 3 executes and fails (shipping API down). The orchestrator sets the state to Compensating and reverses: step 2 compensates (payment refunded), step 1 compensates (inventory released). Final state: Compensated. The caller receives Result.Failure with the shipping error message. The system is consistent.

Why Sequential, Not Parallel?

The steps execute sequentially, not in parallel. This is a deliberate design decision. There are three reasons:

Reason 1: Step dependencies. Steps often depend on the output of previous steps. ChargePaymentStep needs the reservation amount calculated by ReserveInventoryStep. ScheduleShippingStep needs the confirmed items from ReserveInventoryStep and the receipt from ChargePaymentStep. Parallel execution would require explicit dependency declarations and a topological sort -- complexity that does not belong in a minimal saga implementation.

Reason 2: Compensation ordering. If steps execute in parallel, the compensation order is ambiguous. Which parallel branch do you compensate first? The reverse-order guarantee that makes compensation safe depends on sequential execution.

Reason 3: Debugging. When a saga fails, the CurrentStepIndex tells you exactly which step failed. With parallel execution, multiple steps could fail simultaneously, and the error reporting becomes ambiguous.

If you need parallel execution within a saga, model the parallel work as a single step that internally uses Task.WhenAll:

public sealed class ParallelNotificationStep : ISagaStep<OrderSagaContext>
{
    private readonly IEmailService _email;
    private readonly ISmsService _sms;
    private readonly IPushService _push;

    public ParallelNotificationStep(
        IEmailService email, ISmsService sms, IPushService push)
    {
        _email = email;
        _sms = sms;
        _push = push;
    }

    public async Task<Result.Result> ExecuteAsync(
        OrderSagaContext context, CancellationToken ct)
    {
        await Task.WhenAll(
            _email.SendOrderConfirmationAsync(context.OrderId, ct),
            _sms.SendOrderSmsAsync(context.OrderId, ct),
            _push.SendOrderPushAsync(context.OrderId, ct));

        return Result.Result.Success();
    }

    public Task<Result.Result> CompensateAsync(
        OrderSagaContext context, CancellationToken ct)
    {
        // Notifications do not need compensation -- they are fire-and-forget.
        return Task.FromResult(Result.Result.Success());
    }
}

The saga sees one step. Inside that step, three operations run in parallel. The compensation is a no-op because sent notifications cannot be unsent. This pattern preserves sequential orchestration while allowing parallelism where it makes sense.


Persistence: ISagaStore and SagaInstance

The SagaOrchestrator runs sagas in memory. When the process exits -- gracefully or by crashing -- all in-progress saga state is lost. For sagas that take milliseconds (validate input, call one API, return), this is fine. For sagas that take minutes or hours (multi-step approval workflows, payment processing with webhook callbacks), losing state on crash is not acceptable.

ISagaStore and SagaInstance provide the persistence layer.

SagaInstance: The Persistence Entity

SagaInstance is a flat, serializable entity suitable for database storage:

public sealed class SagaInstance
{
    public Guid Id { get; set; }
    public string SagaType { get; set; } = string.Empty;
    public string ContextJson { get; set; } = string.Empty;
    public SagaState State { get; set; }
    public int CurrentStepIndex { get; set; }
    public DateTimeOffset CreatedAt { get; set; }
    public DateTimeOffset? CompletedAt { get; set; }
    public string? LastError { get; set; }
}

The properties map directly to the saga's operational state:

  • Id corresponds to SagaContext.SagaId.
  • SagaType is a discriminator string that identifies which saga type this instance belongs to (e.g., "OrderFulfillment", "SubscriptionRenewal"). This is needed because ISagaStore stores all saga types in the same table/collection.
  • ContextJson is the serialized SagaContext-derived object. The entire domain state -- order ID, payment ID, amount, items -- is serialized to JSON and stored in this field. On recovery, it is deserialized back to the concrete context type.
  • State, CurrentStepIndex, CompletedAt, and LastError mirror the corresponding SagaContext properties.
  • CreatedAt records when the saga instance was first persisted.
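
The ContextJson round trip described above can be sketched as follows. This is a minimal sketch assuming System.Text.Json; SagaContextSerializer and its registry are hypothetical helpers rather than part of FrenchExDev.Net.Saga, and SagaContext and OrderSagaContext are trimmed stand-ins for the types used in this article.

```csharp
#nullable enable
using System;
using System.Collections.Generic;
using System.Text.Json;

// Trimmed stand-ins for the article's types, so the sketch compiles on its own.
public abstract class SagaContext
{
    public Guid SagaId { get; set; } = Guid.NewGuid();
}

public sealed class OrderSagaContext : SagaContext
{
    public Guid OrderId { get; set; }
    public decimal Amount { get; set; }
}

// Hypothetical helper: maps the SagaType discriminator to a concrete context type.
public sealed class SagaContextSerializer
{
    private readonly Dictionary<string, Type> _types = new(StringComparer.Ordinal);

    public void Register<TContext>(string sagaType) where TContext : SagaContext
        => _types[sagaType] = typeof(TContext);

    // Serialize with the runtime type so properties of the derived context are included.
    public string Serialize(SagaContext context)
        => JsonSerializer.Serialize(context, context.GetType());

    // Look up the concrete type from the discriminator, then deserialize into it.
    public SagaContext Deserialize(string sagaType, string contextJson)
    {
        if (!_types.TryGetValue(sagaType, out var type))
            throw new InvalidOperationException($"Unknown saga type '{sagaType}'.");

        return (SagaContext?)JsonSerializer.Deserialize(contextJson, type)
            ?? throw new InvalidOperationException("ContextJson deserialized to null.");
    }
}
```

A recovery service would register each saga type once at startup and call Deserialize(instance.SagaType, instance.ContextJson) for every pending instance it finds.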

ISagaStore: The Persistence Interface

public interface ISagaStore
{
    Task SaveAsync(SagaInstance instance, CancellationToken ct = default);
    Task<SagaInstance?> FindAsync(Guid id, CancellationToken ct = default);
    Task<IReadOnlyList<SagaInstance>> FindPendingAsync(CancellationToken ct = default);
    Task UpdateAsync(SagaInstance instance, CancellationToken ct = default);
}

Four methods:

SaveAsync persists a new saga instance. Called once when the saga starts, before the first step executes.

FindAsync retrieves a saga instance by its ID. Used for monitoring, debugging, and manual compensation.

FindPendingAsync returns all saga instances that are not in a terminal state (Completed, Compensated, or Failed). Used by a recovery background service that resumes in-progress sagas after an application restart.

UpdateAsync updates an existing saga instance. Called after each step executes (to persist progress), after compensation (to record the new state), and when the saga reaches a terminal state (to record completion).

The Persistence Lifecycle

Here is how persistence integrates with the orchestrator in a production scenario:

public sealed class PersistentOrderSagaRunner
{
    private readonly SagaOrchestrator<OrderSagaContext> _orchestrator;
    private readonly ISagaStore _store;

    public PersistentOrderSagaRunner(
        SagaOrchestrator<OrderSagaContext> orchestrator,
        ISagaStore store)
    {
        _orchestrator = orchestrator;
        _store = store;
    }

    public async Task<Result.Result> RunAsync(
        OrderSagaContext context, CancellationToken ct)
    {
        // 1. Persist the initial state
        var instance = new SagaInstance
        {
            Id = context.SagaId,
            SagaType = "OrderFulfillment",
            ContextJson = JsonSerializer.Serialize(context),
            State = SagaState.Pending,
            CurrentStepIndex = 0,
            CreatedAt = DateTimeOffset.UtcNow
        };
        await _store.SaveAsync(instance, ct);

        // 2. Execute the saga
        var result = await _orchestrator.ExecuteAsync(context, ct);

        // 3. Persist the final state
        instance.ContextJson = JsonSerializer.Serialize(context);
        instance.State = context.State;
        instance.CurrentStepIndex = context.CurrentStepIndex;
        instance.CompletedAt = context.CompletedAt;
        instance.LastError = context.LastError;
        await _store.UpdateAsync(instance, ct);

        return result;
    }
}

This wrapper persists the saga before execution and after completion. For maximum durability, you could also persist after each step by wrapping the orchestrator with a step-level callback -- but for most use cases, persisting at the start and end is sufficient. The trade-off is that with only two writes, a saga interrupted by a crash is still stored as Pending with CurrentStepIndex 0, so the store cannot tell how far it got. The FindPendingAsync method enables a background service to discover sagas that had not reached a terminal state when the process crashed, and resume or re-compensate them.
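
One way to get step-level persistence without changing the orchestrator is to decorate each step. The following is a minimal sketch, not the library's mechanism: PersistingStep is a hypothetical class, the Result, SagaContext, SagaInstance, ISagaStore, and ISagaStep types are trimmed stand-ins, and the meaning given to CurrentStepIndex (next step to run, or the lowest index already compensated) is an assumption.

```csharp
#nullable enable
using System;
using System.Text.Json;
using System.Threading;
using System.Threading.Tasks;

// Trimmed stand-ins for the article's types, so the sketch compiles on its own.
public enum SagaState { Pending, Running, Compensating, Completed, Compensated, Failed }

public sealed record Result(bool IsSuccess, string? Error = null)
{
    public static Result Success() => new(true);
    public static Result Failure(string error) => new(false, error);
}

public abstract class SagaContext
{
    public Guid SagaId { get; set; } = Guid.NewGuid();
}

public sealed class SagaInstance
{
    public Guid Id { get; set; }
    public string ContextJson { get; set; } = string.Empty;
    public SagaState State { get; set; }
    public int CurrentStepIndex { get; set; }
}

public interface ISagaStore
{
    Task UpdateAsync(SagaInstance instance, CancellationToken ct = default);
}

public interface ISagaStep<TContext> where TContext : SagaContext
{
    Task<Result> ExecuteAsync(TContext context, CancellationToken ct);
    Task<Result> CompensateAsync(TContext context, CancellationToken ct);
}

// Hypothetical decorator: writes a checkpoint after every successful step.
public sealed class PersistingStep<TContext> : ISagaStep<TContext>
    where TContext : SagaContext
{
    private readonly ISagaStep<TContext> _inner;
    private readonly ISagaStore _store;
    private readonly SagaInstance _instance;
    private readonly int _stepIndex;

    public PersistingStep(ISagaStep<TContext> inner, ISagaStore store,
        SagaInstance instance, int stepIndex)
    {
        _inner = inner;
        _store = store;
        _instance = instance;
        _stepIndex = stepIndex;
    }

    public async Task<Result> ExecuteAsync(TContext context, CancellationToken ct)
    {
        var result = await _inner.ExecuteAsync(context, ct);
        if (result.IsSuccess)
            await CheckpointAsync(context, SagaState.Running, _stepIndex + 1);
        return result;
    }

    public async Task<Result> CompensateAsync(TContext context, CancellationToken ct)
    {
        var result = await _inner.CompensateAsync(context, ct);
        if (result.IsSuccess)
            await CheckpointAsync(context, SagaState.Compensating, _stepIndex);
        return result;
    }

    private Task CheckpointAsync(TContext context, SagaState state, int index)
    {
        _instance.ContextJson = JsonSerializer.Serialize(context, context.GetType());
        _instance.State = state;
        _instance.CurrentStepIndex = index;
        // CancellationToken.None: progress that already happened should be
        // recorded even if the caller has cancelled in the meantime.
        return _store.UpdateAsync(_instance, CancellationToken.None);
    }
}
```

The runner would wrap each step as it builds the list, for example with steps.Select((step, i) => new PersistingStep<OrderSagaContext>(step, store, instance, i)). The cost is one store write per step, which is why this is worth doing only for sagas where losing progress is expensive.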

Why Persistence Is Separate from the Orchestrator

The SagaOrchestrator does not know about ISagaStore. This is intentional. Persistence is an orthogonal concern. Some sagas are short-lived and do not need persistence -- a three-step saga that takes 200 milliseconds and runs inside an HTTP request does not benefit from database storage. Other sagas are long-lived and absolutely need persistence -- a multi-day approval workflow that spans multiple user interactions must survive application restarts.

By keeping persistence separate, the orchestrator stays simple: it runs steps and manages state transitions. Consumers who need persistence wrap the orchestrator with a thin layer that calls ISagaStore before and after execution. Consumers who do not need persistence use the orchestrator directly.


Real-World Example: Order Fulfillment Saga

Let us build the complete order fulfillment saga with three steps. This is the same scenario described in the introduction, now implemented with the FrenchExDev.Net.Saga types.

The Context

public sealed class OrderSagaContext : SagaContext
{
    public Guid OrderId { get; set; }
    public Guid CustomerId { get; set; }
    public decimal Amount { get; set; }
    public string ShippingAddress { get; set; } = string.Empty;
    public IReadOnlyList<OrderLineItem> Items { get; set; } = [];

    // Step outputs (set during execution, read during compensation)
    public Guid? ReservationId { get; set; }
    public Guid? PaymentId { get; set; }
    public Guid? ShipmentId { get; set; }
}

public sealed record OrderLineItem(string Sku, int Quantity);

The context carries both input data (OrderId, CustomerId, Amount, ShippingAddress, Items) and step outputs (ReservationId, PaymentId, ShipmentId). The outputs are nullable because they are populated by steps during execution.

Step 1: Reserve Inventory

[Injectable(Lifetime.Transient)]
public sealed class ReserveInventoryStep : ISagaStep<OrderSagaContext>
{
    private readonly IInventoryService _inventory;

    public ReserveInventoryStep(IInventoryService inventory)
    {
        _inventory = inventory;
    }

    public async Task<Result.Result> ExecuteAsync(
        OrderSagaContext context, CancellationToken ct)
    {
        var reservationResult = await _inventory.ReserveAsync(
            context.Items.Select(i => (i.Sku, i.Quantity)).ToList(),
            ct);

        if (reservationResult.IsFailure)
            return Result.Result.Failure(
                $"Inventory reservation failed: {reservationResult.Error}");

        context.ReservationId = reservationResult.Value;
        return Result.Result.Success();
    }

    public async Task<Result.Result> CompensateAsync(
        OrderSagaContext context, CancellationToken ct)
    {
        if (!context.ReservationId.HasValue)
            return Result.Result.Success(); // Nothing to compensate.

        var releaseResult = await _inventory.ReleaseAsync(
            context.ReservationId.Value, ct);

        return releaseResult.IsSuccess
            ? Result.Result.Success()
            : Result.Result.Failure(
                $"Inventory release failed: {releaseResult.Error}");
    }
}

ExecuteAsync calls the inventory service to reserve items. On success, it stores the reservation ID in the context. On failure, it returns a Result.Failure with a descriptive message -- the orchestrator will not call CompensateAsync on this step because it failed, but it will compensate all previously succeeded steps.

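CompensateAsync checks whether ReservationId is set. If the step never stored a reservation, the ID is null and there is nothing to undo. The guard is defensive: the orchestrator only compensates steps that succeeded, but the check keeps the compensation safe to call in any state. If the ID is set, it calls the inventory service to release the reservation.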

Step 2: Charge Payment

[Injectable(Lifetime.Transient)]
public sealed class ChargePaymentStep : ISagaStep<OrderSagaContext>
{
    private readonly IPaymentGateway _gateway;

    public ChargePaymentStep(IPaymentGateway gateway)
    {
        _gateway = gateway;
    }

    public async Task<Result.Result> ExecuteAsync(
        OrderSagaContext context, CancellationToken ct)
    {
        var chargeResult = await _gateway.ChargeAsync(
            context.CustomerId,
            context.Amount,
            idempotencyKey: context.SagaId.ToString(),
            ct);

        if (chargeResult.IsFailure)
            return Result.Result.Failure(
                $"Payment charge failed: {chargeResult.Error}");

        context.PaymentId = chargeResult.Value;
        return Result.Result.Success();
    }

    public async Task<Result.Result> CompensateAsync(
        OrderSagaContext context, CancellationToken ct)
    {
        if (!context.PaymentId.HasValue)
            return Result.Result.Success();

        // Check if already refunded (idempotency)
        var status = await _gateway.GetStatusAsync(
            context.PaymentId.Value, ct);

        if (status == PaymentStatus.Refunded)
            return Result.Result.Success();

        var refundResult = await _gateway.RefundAsync(
            context.PaymentId.Value,
            idempotencyKey: $"{context.SagaId}-refund",
            ct);

        return refundResult.IsSuccess
            ? Result.Result.Success()
            : Result.Result.Failure(
                $"Payment refund failed: {refundResult.Error}");
    }
}

Note the idempotency key on both the charge and the refund. The SagaId ensures that if the charge is retried (due to a timeout with an ambiguous response), the payment gateway processes it only once. The refund uses a derived key (SagaId-refund) to ensure refund idempotency separately.
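
To make the gateway-side effect of an idempotency key concrete, here is a minimal in-memory sketch. InMemoryPaymentLedger is a hypothetical illustration, not a real payment API: it shows only the deduplication rule that a retried charge with the same key returns the original payment instead of creating a second one.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical illustration of how a gateway deduplicates by idempotency key.
public sealed class InMemoryPaymentLedger
{
    private readonly object _gate = new object();
    private readonly Dictionary<string, Guid> _paymentsByKey =
        new Dictionary<string, Guid>(StringComparer.Ordinal);

    // Number of charges that actually moved money.
    public int ChargesProcessed { get; private set; }

    public Guid Charge(Guid customerId, decimal amount, string idempotencyKey)
    {
        lock (_gate)
        {
            // A retry with a known key returns the original payment untouched.
            if (_paymentsByKey.TryGetValue(idempotencyKey, out var existing))
                return existing;

            var paymentId = Guid.NewGuid();
            _paymentsByKey[idempotencyKey] = paymentId;
            ChargesProcessed++;
            return paymentId;
        }
    }
}
```

Calling Charge twice with context.SagaId.ToString() as the key yields the same payment ID and a ChargesProcessed count of one, which is the behavior the step relies on when it retries after an ambiguous timeout.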

The compensation also checks the payment status before attempting a refund. If the payment is already refunded (because a previous compensation attempt succeeded but the response was lost), the compensation returns Success without calling the gateway again. This is the "check before acting" idempotency pattern described earlier.

Step 3: Schedule Shipping

[Injectable(Lifetime.Transient)]
public sealed class ScheduleShippingStep : ISagaStep<OrderSagaContext>
{
    private readonly IShippingService _shipping;

    public ScheduleShippingStep(IShippingService shipping)
    {
        _shipping = shipping;
    }

    public async Task<Result.Result> ExecuteAsync(
        OrderSagaContext context, CancellationToken ct)
    {
        var shipResult = await _shipping.ScheduleAsync(
            context.ShippingAddress,
            context.Items.Select(i => (i.Sku, i.Quantity)).ToList(),
            ct);

        if (shipResult.IsFailure)
            return Result.Result.Failure(
                $"Shipping scheduling failed: {shipResult.Error}");

        context.ShipmentId = shipResult.Value;
        return Result.Result.Success();
    }

    public async Task<Result.Result> CompensateAsync(
        OrderSagaContext context, CancellationToken ct)
    {
        if (!context.ShipmentId.HasValue)
            return Result.Result.Success();

        var cancelResult = await _shipping.CancelAsync(
            context.ShipmentId.Value, ct);

        return cancelResult.IsSuccess
            ? Result.Result.Success()
            : Result.Result.Failure(
                $"Shipping cancellation failed: {cancelResult.Error}");
    }
}

The pattern is identical: execute the forward action, store the output, compensate by undoing the action, guard on the nullable output. Every step follows this structure. The variation is in the domain logic, not in the saga plumbing.

Assembling and Running the Orchestrator

// In the composition root or a factory:
var steps = new List<ISagaStep<OrderSagaContext>>
{
    serviceProvider.GetRequiredService<ReserveInventoryStep>(),
    serviceProvider.GetRequiredService<ChargePaymentStep>(),
    serviceProvider.GetRequiredService<ScheduleShippingStep>()
};

var orchestrator = new SagaOrchestrator<OrderSagaContext>(steps);

// In the handler:
var context = new OrderSagaContext
{
    OrderId = Guid.NewGuid(),
    CustomerId = command.CustomerId,
    Amount = command.Total,
    ShippingAddress = command.ShippingAddress,
    Items = command.Items.Select(i => new OrderLineItem(i.Sku, i.Quantity)).ToList()
};

Result.Result result = await orchestrator.ExecuteAsync(context, ct);

if (result.IsSuccess)
{
    // context.State == SagaState.Completed
    // context.PaymentId, context.ShipmentId, context.ReservationId are all set
    _logger.LogInformation("Order {OrderId} fulfilled", context.OrderId);
}
else if (context.State == SagaState.Compensated)
{
    // All compensations succeeded -- system is consistent
    _logger.LogWarning("Order {OrderId} failed and was compensated: {Error}",
        context.OrderId, context.LastError);
}
else if (context.State == SagaState.Failed)
{
    // Compensation failed -- system is inconsistent
    _logger.LogError("Order {OrderId} failed and compensation failed: {Error}",
        context.OrderId, context.LastError);
    // Alert on-call. Persist to dead letter. Manual intervention required.
}

The caller inspects both the Result and the context.State to determine the outcome. Result.IsSuccess means the saga completed. Result.IsFailure with State == Compensated means clean rollback. Result.IsFailure with State == Failed means dirty failure.


Composing with Other Patterns

The saga pattern does not exist in isolation. Every other FrenchExDev pattern can compose with it.

Guard: Parameter Validation in Step Constructors

Step constructors use Guard.Against.Null for dependency validation:

public sealed class ChargePaymentStep : ISagaStep<OrderSagaContext>
{
    private readonly IPaymentGateway _gateway;

    public ChargePaymentStep(IPaymentGateway gateway)
    {
        _gateway = Guard.Against.Null(gateway);
    }

    // ...
}

And Guard.Ensure for context invariants at the beginning of ExecuteAsync:

public async Task<Result.Result> ExecuteAsync(
    OrderSagaContext context, CancellationToken ct)
{
    Guard.Ensure.That(context.Amount > 0, "Amount must be positive");
    Guard.Ensure.That(context.CustomerId != Guid.Empty, "CustomerId must be set");

    // Proceed with charge...
}

Result: The Common Language

Every step returns Result. The orchestrator consumes Result. The caller receives Result. There is no impedance mismatch between the saga infrastructure and the consuming code. If a step internally uses a service that returns Result<T>, the result flows naturally:

var chargeResult = await _gateway.ChargeAsync(
    context.CustomerId,
    context.Amount,
    idempotencyKey: context.SagaId.ToString(),
    ct);

// chargeResult is Result<Guid>
// Convert to Result (non-generic) for the saga contract:
if (chargeResult.IsFailure)
    return Result.Result.Failure(chargeResult.Error);

context.PaymentId = chargeResult.Value;
return Result.Result.Success();

Clock: Deterministic Timestamps

The SagaContext.StartedAt and CompletedAt properties use DateTimeOffset.UtcNow inside the orchestrator. For testing with deterministic timestamps, you can inject IClock and set the timestamps in the context before and after orchestration:

public sealed class TimedSagaRunner<TContext> where TContext : SagaContext
{
    private readonly SagaOrchestrator<TContext> _orchestrator;
    private readonly IClock _clock;

    public TimedSagaRunner(
        SagaOrchestrator<TContext> orchestrator, IClock clock)
    {
        _orchestrator = orchestrator;
        _clock = clock;
    }

    public async Task<Result.Result> RunAsync(
        TContext context, CancellationToken ct)
    {
        context.StartedAt = _clock.UtcNow;
        var result = await _orchestrator.ExecuteAsync(context, ct);
        context.CompletedAt = _clock.UtcNow;
        return result;
    }
}

In tests, FakeClock gives you deterministic timestamps:

var clock = new FakeClock(new DateTimeOffset(2026, 4, 3, 12, 0, 0, TimeSpan.Zero));
// After execution:
Assert.Equal(clock.UtcNow, context.CompletedAt);

Mediator: Triggering Sagas from Command Handlers

A mediator command handler is the natural entry point for starting a saga:

public sealed class PlaceOrderCommandHandler
    : IRequestHandler<PlaceOrderCommand, Result.Result>
{
    private readonly SagaOrchestrator<OrderSagaContext> _orchestrator;

    public PlaceOrderCommandHandler(
        SagaOrchestrator<OrderSagaContext> orchestrator)
    {
        _orchestrator = orchestrator;
    }

    public async Task<Result.Result> HandleAsync(
        PlaceOrderCommand command, CancellationToken ct)
    {
        var context = new OrderSagaContext
        {
            OrderId = Guid.NewGuid(),
            CustomerId = command.CustomerId,
            Amount = command.Total,
            ShippingAddress = command.ShippingAddress,
            Items = command.Items
                .Select(i => new OrderLineItem(i.Sku, i.Quantity))
                .ToList()
        };

        return await _orchestrator.ExecuteAsync(context, ct);
    }
}

The controller sends a command. The mediator routes it to the handler. The handler starts the saga. The saga orchestrates the multi-step operation. The result flows back through the mediator pipeline (where behaviors can add logging, metrics, etc.) to the controller.

Outbox: Publishing Saga Completion Events

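When a saga completes, you often need to publish a domain event. The outbox pattern records the event in the same transaction as the state change, so the event is not lost if the process crashes after the change is committed but before the event is sent. Delivery from the outbox is at-least-once, so consumers should handle duplicates idempotently: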

public async Task<Result.Result> HandleAsync(
    PlaceOrderCommand command, CancellationToken ct)
{
    var context = new OrderSagaContext { /* ... */ };
    var result = await _orchestrator.ExecuteAsync(context, ct);

    if (result.IsSuccess)
    {
        // The order aggregate emits an OrderFulfilled domain event.
        // The OutboxInterceptor captures it in the same transaction.
        var order = await _orders.GetByIdAsync(context.OrderId, ct);
        order.MarkFulfilled(context.PaymentId!.Value, context.ShipmentId!.Value);
        await _db.SaveChangesAsync(ct);
        // The OutboxInterceptor persists the OrderFulfilled event
        // in the outbox table within the same transaction.
    }

    return result;
}

Reactive: Streaming Saga State Changes

If you need real-time visibility into saga progress -- a dashboard showing which sagas are running, which are compensating, which have failed -- you can publish saga state transitions to an IEventStream<SagaStateChanged>:

public sealed record SagaStateChanged(
    Guid SagaId,
    string SagaType,
    SagaState PreviousState,
    SagaState NewState,
    DateTimeOffset Timestamp);

Subscribers can filter, throttle, and aggregate these events using the IEventStream<T> operators from Part VIII.


Testing Sagas

Sagas are inherently complex -- multiple steps, multiple failure modes, state transitions, compensation logic. Testing is not optional. The FrenchExDev.Net.Saga package makes testing straightforward by providing InMemorySagaStore and by keeping the orchestrator free of static dependencies.

InMemorySagaStore

public sealed class InMemorySagaStore : ISagaStore

Thread-safe via ConcurrentDictionary<Guid, SagaInstance>. All four methods -- SaveAsync, FindAsync, FindPendingAsync, UpdateAsync -- operate on the in-memory dictionary. No database. No file system. No configuration. Just new InMemorySagaStore().

var store = new InMemorySagaStore();

// Save
var instance = new SagaInstance
{
    Id = Guid.NewGuid(),
    SagaType = "OrderFulfillment",
    ContextJson = "{}",
    State = SagaState.Running,
    CurrentStepIndex = 0,
    CreatedAt = DateTimeOffset.UtcNow
};
await store.SaveAsync(instance);

// Find
var found = await store.FindAsync(instance.Id);
Assert.NotNull(found);
Assert.Equal(SagaState.Running, found!.State);

// Update
instance.State = SagaState.Completed;
instance.CompletedAt = DateTimeOffset.UtcNow;
await store.UpdateAsync(instance);

// FindPending (should be empty now -- Completed is terminal)
var pending = await store.FindPendingAsync();
Assert.Empty(pending);
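
For readers who want to see what such a store involves, here is a minimal sketch built on ConcurrentDictionary. It is not the package's InMemorySagaStore: the class name InMemorySagaStoreSketch is hypothetical, SagaInstance is a trimmed stand-in, and the choices to reject duplicate saves and to treat updates as upserts are assumptions.

```csharp
#nullable enable
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

// Trimmed stand-ins for the article's types, so the sketch compiles on its own.
public enum SagaState { Pending, Running, Compensating, Completed, Compensated, Failed }

public sealed class SagaInstance
{
    public Guid Id { get; set; }
    public SagaState State { get; set; }
}

public interface ISagaStore
{
    Task SaveAsync(SagaInstance instance, CancellationToken ct = default);
    Task<SagaInstance?> FindAsync(Guid id, CancellationToken ct = default);
    Task<IReadOnlyList<SagaInstance>> FindPendingAsync(CancellationToken ct = default);
    Task UpdateAsync(SagaInstance instance, CancellationToken ct = default);
}

// Hypothetical sketch of a dictionary-backed store.
public sealed class InMemorySagaStoreSketch : ISagaStore
{
    private readonly ConcurrentDictionary<Guid, SagaInstance> _instances = new();

    public Task SaveAsync(SagaInstance instance, CancellationToken ct = default)
    {
        // Assumption: saving the same ID twice is a programming error.
        if (!_instances.TryAdd(instance.Id, instance))
            throw new InvalidOperationException($"Saga {instance.Id} already exists.");
        return Task.CompletedTask;
    }

    public Task<SagaInstance?> FindAsync(Guid id, CancellationToken ct = default)
        => Task.FromResult<SagaInstance?>(
            _instances.TryGetValue(id, out var found) ? found : null);

    public Task<IReadOnlyList<SagaInstance>> FindPendingAsync(CancellationToken ct = default)
    {
        // Pending means "not in a terminal state".
        IReadOnlyList<SagaInstance> pending = _instances.Values
            .Where(i => i.State is not (SagaState.Completed
                or SagaState.Compensated or SagaState.Failed))
            .ToList();
        return Task.FromResult(pending);
    }

    public Task UpdateAsync(SagaInstance instance, CancellationToken ct = default)
    {
        // Assumption: update behaves as an upsert.
        _instances[instance.Id] = instance;
        return Task.CompletedTask;
    }
}
```

This sketch stores references rather than copies, so mutating an instance after saving it is visible to later reads. That is convenient in tests but differs from a database-backed store, where only an explicit UpdateAsync changes what is persisted.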

Test Steps: Configurable Success and Failure

For unit testing the orchestrator, you need steps that can be configured to succeed or fail:

public sealed class TestStep : ISagaStep<OrderSagaContext>
{
    private readonly bool _executeShouldSucceed;
    private readonly bool _compensateShouldSucceed;

    public bool ExecuteCalled { get; private set; }
    public bool CompensateCalled { get; private set; }
    public int ExecuteCallCount { get; private set; }
    public int CompensateCallCount { get; private set; }

    public TestStep(
        bool executeShouldSucceed = true,
        bool compensateShouldSucceed = true)
    {
        _executeShouldSucceed = executeShouldSucceed;
        _compensateShouldSucceed = compensateShouldSucceed;
    }

    public Task<Result.Result> ExecuteAsync(
        OrderSagaContext context, CancellationToken ct)
    {
        ExecuteCalled = true;
        ExecuteCallCount++;
        return Task.FromResult(_executeShouldSucceed
            ? Result.Result.Success()
            : Result.Result.Failure("Step execution failed"));
    }

    public Task<Result.Result> CompensateAsync(
        OrderSagaContext context, CancellationToken ct)
    {
        CompensateCalled = true;
        CompensateCallCount++;
        return Task.FromResult(_compensateShouldSucceed
            ? Result.Result.Success()
            : Result.Result.Failure("Step compensation failed"));
    }
}

This test double records whether each method was called and how many times. The constructor parameters control whether each method succeeds or fails. This is enough to test every path through the orchestrator.

Test 1: Happy Path -- All Steps Succeed

[Fact]
public async Task ExecuteAsync_AllStepsSucceed_CompletesSuccessfully()
{
    // Arrange
    var step1 = new TestStep();
    var step2 = new TestStep();
    var step3 = new TestStep();
    var orchestrator = new SagaOrchestrator<OrderSagaContext>(
        new List<ISagaStep<OrderSagaContext>> { step1, step2, step3 });
    var context = new OrderSagaContext
    {
        OrderId = Guid.NewGuid(),
        Amount = 99.99m
    };

    // Act
    var result = await orchestrator.ExecuteAsync(context);

    // Assert
    Assert.True(result.IsSuccess);
    Assert.Equal(SagaState.Completed, context.State);
    Assert.NotNull(context.CompletedAt);
    Assert.Null(context.LastError);

    // All steps executed
    Assert.True(step1.ExecuteCalled);
    Assert.True(step2.ExecuteCalled);
    Assert.True(step3.ExecuteCalled);

    // No steps compensated
    Assert.False(step1.CompensateCalled);
    Assert.False(step2.CompensateCalled);
    Assert.False(step3.CompensateCalled);
}

Test 2: Compensation Path -- Step 3 Fails

[Fact]
public async Task ExecuteAsync_Step3Fails_CompensatesSteps2And1()
{
    // Arrange
    var step1 = new TestStep();
    var step2 = new TestStep();
    var step3 = new TestStep(executeShouldSucceed: false);
    var orchestrator = new SagaOrchestrator<OrderSagaContext>(
        new List<ISagaStep<OrderSagaContext>> { step1, step2, step3 });
    var context = new OrderSagaContext();

    // Act
    var result = await orchestrator.ExecuteAsync(context);

    // Assert
    Assert.True(result.IsFailure);
    Assert.Equal(SagaState.Compensated, context.State);
    Assert.NotNull(context.CompletedAt);
    Assert.Equal("Step execution failed", context.LastError);

    // Steps 1 and 2 executed; step 3 executed but failed
    Assert.True(step1.ExecuteCalled);
    Assert.True(step2.ExecuteCalled);
    Assert.True(step3.ExecuteCalled);

    // Steps 1 and 2 compensated; step 3 not compensated (it failed)
    Assert.True(step1.CompensateCalled);
    Assert.True(step2.CompensateCalled);
    Assert.False(step3.CompensateCalled);
}

Test 3: Failure Path -- Compensation Fails

[Fact]
public async Task ExecuteAsync_CompensationFails_TransitionsToFailed()
{
    // Arrange
    var step1 = new TestStep();
    var step2 = new TestStep(compensateShouldSucceed: false);
    var step3 = new TestStep(executeShouldSucceed: false);
    var orchestrator = new SagaOrchestrator<OrderSagaContext>(
        new List<ISagaStep<OrderSagaContext>> { step1, step2, step3 });
    var context = new OrderSagaContext();

    // Act
    var result = await orchestrator.ExecuteAsync(context);

    // Assert
    Assert.True(result.IsFailure);
    Assert.Equal(SagaState.Failed, context.State);
    Assert.Equal("Step compensation failed", context.LastError);

    // Step 3 failed during execution
    Assert.True(step3.ExecuteCalled);
    Assert.False(step3.CompensateCalled);

    // Step 2 compensation was attempted and failed
    Assert.True(step2.CompensateCalled);

    // Step 1 compensation was NOT reached
    // (because step 2 compensation failed first)
    Assert.False(step1.CompensateCalled);
}

This test verifies a critical detail: when compensation fails at step 2, step 1 is not compensated. The orchestrator stops compensating on the first compensation failure. Steps that ran before the one whose compensation failed remain in their executed state. This is the inconsistency that the Failed terminal state represents.

Test 4: Empty Steps -- Edge Case

[Fact]
public async Task ExecuteAsync_NoSteps_CompletesImmediately()
{
    // Arrange
    var orchestrator = new SagaOrchestrator<OrderSagaContext>(
        new List<ISagaStep<OrderSagaContext>>());
    var context = new OrderSagaContext();

    // Act
    var result = await orchestrator.ExecuteAsync(context);

    // Assert
    Assert.True(result.IsSuccess);
    Assert.Equal(SagaState.Completed, context.State);
}

Test 5: First Step Fails -- No Compensation Needed

[Fact]
public async Task ExecuteAsync_FirstStepFails_NoCompensation()
{
    // Arrange
    var step1 = new TestStep(executeShouldSucceed: false);
    var step2 = new TestStep();
    var orchestrator = new SagaOrchestrator<OrderSagaContext>(
        new List<ISagaStep<OrderSagaContext>> { step1, step2 });
    var context = new OrderSagaContext();

    // Act
    var result = await orchestrator.ExecuteAsync(context);

    // Assert
    Assert.True(result.IsFailure);
    Assert.Equal(SagaState.Compensated, context.State);

    // Step 1 executed (and failed)
    Assert.True(step1.ExecuteCalled);
    // Step 2 never executed
    Assert.False(step2.ExecuteCalled);

    // No compensation needed (no steps succeeded before step 1)
    Assert.False(step1.CompensateCalled);
    Assert.False(step2.CompensateCalled);
}

When the first step fails, fromIndex is -1 (because i - 1 = 0 - 1 = -1), and the compensation loop does not execute. The saga transitions to Compensated with no compensations run. This is correct: no steps succeeded, so there is nothing to undo.
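
The loop bounds behind this behavior can be sketched in isolation. This is a simplified, synchronous illustration and not the orchestrator's code: CompensationLoopSketch and CompensateFrom are hypothetical names, and each compensation is reduced to a delegate that returns true on success.

```csharp
using System;
using System.Collections.Generic;

public static class CompensationLoopSketch
{
    // Runs compensations from fromIndex down to 0 and stops at the first failure.
    // Returns the number of compensations that were attempted.
    public static int CompensateFrom(
        IReadOnlyList<Func<bool>> compensations, int fromIndex)
    {
        var attempted = 0;

        // With fromIndex == -1 the condition is false immediately,
        // so nothing is compensated.
        for (var i = fromIndex; i >= 0; i--)
        {
            attempted++;
            if (!compensations[i]())
                break; // lower-indexed steps stay in their executed state
        }

        return attempted;
    }
}
```

With three steps and a failure at index 0, the call is CompensateFrom(steps, -1) and the result is zero attempts. With a failure at index 2 and a failing compensation at index 1, the loop attempts index 1 only and never reaches index 0, matching Test 3.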

Test 6: Cancellation Token

[Fact]
public async Task ExecuteAsync_CancellationRequested_Throws()
{
    // Arrange
    var step1 = new TestStep();
    var orchestrator = new SagaOrchestrator<OrderSagaContext>(
        new List<ISagaStep<OrderSagaContext>> { step1 });
    var context = new OrderSagaContext();
    using var cts = new CancellationTokenSource();
    cts.Cancel();

    // Act & Assert
    // TestStep never inspects the token, so this test passes only if
    // the orchestrator itself observes cancellation (for example, by
    // calling ct.ThrowIfCancellationRequested() before each step).
    // To verify that the token is forwarded, use a step that calls
    // ct.ThrowIfCancellationRequested() instead.
    await Assert.ThrowsAsync<OperationCanceledException>(
        () => orchestrator.ExecuteAsync(context, cts.Token));
}

Test 7: Persistence with InMemorySagaStore

[Fact]
public async Task InMemorySagaStore_FullLifecycle()
{
    // Arrange
    var store = new InMemorySagaStore();
    var sagaId = Guid.NewGuid();

    // Save
    var instance = new SagaInstance
    {
        Id = sagaId,
        SagaType = "OrderFulfillment",
        ContextJson = "{\"OrderId\":\"abc\"}",
        State = SagaState.Running,
        CurrentStepIndex = 0,
        CreatedAt = DateTimeOffset.UtcNow
    };
    await store.SaveAsync(instance);

    // FindPending (Running is not terminal)
    var pending = await store.FindPendingAsync();
    Assert.Single(pending);
    Assert.Equal(sagaId, pending[0].Id);

    // Update to Completed
    instance.State = SagaState.Completed;
    instance.CompletedAt = DateTimeOffset.UtcNow;
    await store.UpdateAsync(instance);

    // FindPending (Completed is terminal)
    pending = await store.FindPendingAsync();
    Assert.Empty(pending);

    // FindAsync
    var found = await store.FindAsync(sagaId);
    Assert.NotNull(found);
    Assert.Equal(SagaState.Completed, found!.State);

    // FindAsync for non-existent ID
    var notFound = await store.FindAsync(Guid.NewGuid());
    Assert.Null(notFound);
}

Testing Real Steps with Fake Services

The test double steps shown above are useful for testing the orchestrator itself. For testing real steps, you inject fake services:

[Fact]
public async Task ReserveInventoryStep_Success_SetsReservationId()
{
    // Arrange
    var fakeInventory = new FakeInventoryService(
        reserveResult: Result<Guid>.Success(Guid.NewGuid()));
    var step = new ReserveInventoryStep(fakeInventory);
    var context = new OrderSagaContext
    {
        Items = new[]
        {
            new OrderLineItem("SKU-001", 2),
            new OrderLineItem("SKU-002", 1)
        }
    };

    // Act
    var result = await step.ExecuteAsync(context, CancellationToken.None);

    // Assert
    Assert.True(result.IsSuccess);
    Assert.NotNull(context.ReservationId);
}

[Fact]
public async Task ReserveInventoryStep_Compensate_WhenNoReservation_Succeeds()
{
    // Arrange
    var fakeInventory = new FakeInventoryService();
    var step = new ReserveInventoryStep(fakeInventory);
    var context = new OrderSagaContext
    {
        ReservationId = null // Step never executed
    };

    // Act
    var result = await step.CompensateAsync(context, CancellationToken.None);

    // Assert
    Assert.True(result.IsSuccess);
    Assert.Equal(0, fakeInventory.ReleaseCallCount); // Not called
}

[Fact]
public async Task ChargePaymentStep_Compensate_AlreadyRefunded_Succeeds()
{
    // Arrange
    var fakeGateway = new FakePaymentGateway(
        getStatusResult: PaymentStatus.Refunded);
    var step = new ChargePaymentStep(fakeGateway);
    var context = new OrderSagaContext
    {
        PaymentId = Guid.NewGuid()
    };

    // Act
    var result = await step.CompensateAsync(context, CancellationToken.None);

    // Assert
    Assert.True(result.IsSuccess);
    Assert.Equal(0, fakeGateway.RefundCallCount); // Idempotent: skip refund
}

These tests verify individual step behavior in isolation. The orchestrator tests verify the state machine. Together, they cover the full saga behavior.
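
The fakes used in these tests are not shown in this article, so here is a minimal sketch of what FakeInventoryService could look like. The IInventoryService signatures are inferred from how ReserveInventoryStep calls the service, and the Result and Result<T> records are simplified stand-ins for the Result package; treat all of them as assumptions.

```csharp
#nullable enable
using System;
using System.Collections.Generic;
using System.Threading;
using System.Threading.Tasks;

// Simplified stand-ins for the Result types used throughout the article.
public sealed record Result(bool IsSuccess, string? Error = null)
{
    public bool IsFailure => !IsSuccess;
    public static Result Success() => new(true);
    public static Result Failure(string error) => new(false, error);
}

public sealed record Result<T>(bool IsSuccess, T? Value = default, string? Error = null)
{
    public bool IsFailure => !IsSuccess;
    public static Result<T> Success(T value) => new(true, value);
    public static Result<T> Failure(string error) => new(false, default, error);
}

// Assumed shape of the service, inferred from the step's calls.
public interface IInventoryService
{
    Task<Result<Guid>> ReserveAsync(
        IReadOnlyList<(string Sku, int Quantity)> items, CancellationToken ct);
    Task<Result> ReleaseAsync(Guid reservationId, CancellationToken ct);
}

// Hand-written fake: returns canned results and records how it was called.
public sealed class FakeInventoryService : IInventoryService
{
    private readonly Result<Guid> _reserveResult;
    private readonly Result _releaseResult;

    public FakeInventoryService(
        Result<Guid>? reserveResult = null, Result? releaseResult = null)
    {
        _reserveResult = reserveResult ?? Result<Guid>.Success(Guid.NewGuid());
        _releaseResult = releaseResult ?? Result.Success();
    }

    public int ReserveCallCount { get; private set; }
    public int ReleaseCallCount { get; private set; }
    public Guid? LastReleasedReservationId { get; private set; }

    public Task<Result<Guid>> ReserveAsync(
        IReadOnlyList<(string Sku, int Quantity)> items, CancellationToken ct)
    {
        ReserveCallCount++;
        return Task.FromResult(_reserveResult);
    }

    public Task<Result> ReleaseAsync(Guid reservationId, CancellationToken ct)
    {
        ReleaseCallCount++;
        LastReleasedReservationId = reservationId;
        return Task.FromResult(_releaseResult);
    }
}
```

A hand-written fake like this keeps the tests free of mocking-library setup, and the call counters are what make assertions such as "release was not called" possible.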


DI Registration

Steps are registered as transient services using the [Injectable] attribute:

[Injectable(Lifetime.Transient)]
public sealed class ReserveInventoryStep : ISagaStep<OrderSagaContext>
{
    // ...
}

[Injectable(Lifetime.Transient)]
public sealed class ChargePaymentStep : ISagaStep<OrderSagaContext>
{
    // ...
}

[Injectable(Lifetime.Transient)]
public sealed class ScheduleShippingStep : ISagaStep<OrderSagaContext>
{
    // ...
}

The source generator emits registration code for each step. The orchestrator itself is assembled in the composition root because the step order matters and cannot be inferred by the DI container:

public static class SagaRegistration
{
    public static IServiceCollection AddOrderSaga(
        this IServiceCollection services)
    {
        // Steps are already registered by [Injectable] source generation.
        // The orchestrator is registered with explicit step ordering.
        services.AddTransient(sp =>
        {
            var steps = new List<ISagaStep<OrderSagaContext>>
            {
                sp.GetRequiredService<ReserveInventoryStep>(),
                sp.GetRequiredService<ChargePaymentStep>(),
                sp.GetRequiredService<ScheduleShippingStep>()
            };
            return new SagaOrchestrator<OrderSagaContext>(steps);
        });

        return services;
    }
}

The orchestrator is transient because it holds no state -- the state lives in the SagaContext instance, which is created per-operation. Multiple concurrent sagas each get their own context and their own orchestrator instance.

Why Not Register the Orchestrator Automatically?

The source generator could, in theory, scan for all ISagaStep<TContext> implementations and register a SagaOrchestrator<TContext> with all of them. But the step order matters, and there is no compile-time way to determine the correct order from the type metadata alone. Should ReserveInventoryStep run before or after ChargePaymentStep? The answer is domain knowledge, not type information.

By requiring explicit registration in the composition root, the step order is visible, auditable, and changeable. If you need to insert a new step between steps 2 and 3, you add one line to the registration code.

Alternative: Factory Pattern

For sagas with conditional steps, a factory can build the orchestrator dynamically:

public sealed class OrderSagaFactory
{
    private readonly IServiceProvider _sp;

    public OrderSagaFactory(IServiceProvider sp) => _sp = sp;

    public SagaOrchestrator<OrderSagaContext> Create(
        OrderSagaContext context)
    {
        var steps = new List<ISagaStep<OrderSagaContext>>
        {
            _sp.GetRequiredService<ReserveInventoryStep>(),
            _sp.GetRequiredService<ChargePaymentStep>()
        };

        // Only add shipping step if the order has physical items
        if (context.Items.Any(
                i => !i.Sku.StartsWith("DIGITAL-", StringComparison.Ordinal)))
        {
            steps.Add(_sp.GetRequiredService<ScheduleShippingStep>());
        }

        // Add notification step for all orders
        steps.Add(_sp.GetRequiredService<SendConfirmationStep>());

        return new SagaOrchestrator<OrderSagaContext>(steps);
    }
}

The factory examines the context and builds a step list tailored to the specific order. Digital-only orders skip the shipping step. The orchestrator receives whatever steps the factory provides and runs them in order. The factory owns the domain logic of which steps to include. The orchestrator owns the execution logic.


Saga Recovery: Resuming After Crash

For long-running sagas that must survive application restarts, a background service monitors ISagaStore for non-terminal sagas and resumes them:

public sealed class SagaRecoveryService : BackgroundService
{
    private readonly ISagaStore _store;
    private readonly IServiceProvider _sp;
    private readonly ILogger<SagaRecoveryService> _logger;
    private readonly TimeSpan _pollInterval = TimeSpan.FromMinutes(1);

    public SagaRecoveryService(
        ISagaStore store, IServiceProvider sp,
        ILogger<SagaRecoveryService> logger)
    {
        _store = store;
        _sp = sp;
        _logger = logger;
    }

    protected override async Task ExecuteAsync(CancellationToken ct)
    {
        while (!ct.IsCancellationRequested)
        {
            var pending = await _store.FindPendingAsync(ct);

            foreach (var instance in pending)
            {
                _logger.LogInformation(
                    "Recovering saga {SagaId} of type {SagaType} " +
                    "in state {State} at step {Step}",
                    instance.Id, instance.SagaType,
                    instance.State, instance.CurrentStepIndex);

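                try
                {
                    await RecoverAsync(instance, ct);
                }
                catch (Exception ex) when (ex is not OperationCanceledException)
                {
                    // One failing saga must not stop recovery of the others.
                    _logger.LogError(ex,
                        "Recovery of saga {SagaId} failed", instance.Id);
                }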
            }

            await Task.Delay(_pollInterval, ct);
        }
    }

    private async Task RecoverAsync(
        SagaInstance instance, CancellationToken ct)
    {
        // Resolve the orchestrator and context type based on SagaType.
        // Deserialize ContextJson to the concrete context type.
        // Re-execute or re-compensate based on instance.State.
        // Update the store with the new state.

        // Implementation depends on how you map SagaType strings
        // to concrete types. A dictionary-based registry works well:
        // Dictionary<string, Func<SagaInstance, Task>> _recoverers;
    }
}

The recovery service polls the store periodically for non-terminal sagas. When it finds one, it deserializes the context, resolves the orchestrator, and resumes execution. The exact recovery strategy depends on the saga state:

  • Running at step N: re-execute from step N forward. The step must be idempotent for this to be safe.
  • Compensating at step N: re-compensate from step N backward. Compensation must be idempotent (which we already require).

Idempotency is what makes crash recovery possible. Without idempotent steps, resuming a partially executed saga could double-charge, double-ship, or double-refund.
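
The dictionary-based registry mentioned in the RecoverAsync comments could be sketched roughly as follows. This is a minimal illustration under assumptions: PersistedSaga, SagaRecoveryRegistry, Register, and TryRecoverAsync are hypothetical names introduced here, and PersistedSaga is a trimmed stand-in for SagaInstance, not the real type.

```csharp
using System;
using System.Collections.Generic;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical stand-in for the persisted saga record; only the
// fields this sketch needs are included.
public sealed record PersistedSaga(Guid Id, string SagaType, string ContextJson);

// Hypothetical registry mapping SagaType strings to recovery delegates.
public sealed class SagaRecoveryRegistry
{
    private readonly Dictionary<string, Func<PersistedSaga, CancellationToken, Task>>
        _recoverers = new(StringComparer.Ordinal);

    // Registers (or replaces) the delegate that knows how to deserialize
    // the context and resume a saga of the given type.
    public void Register(
        string sagaType, Func<PersistedSaga, CancellationToken, Task> recoverer)
        => _recoverers[sagaType] = recoverer;

    // Returns false when no recoverer is registered, so the caller can
    // log the unknown type instead of throwing inside the polling loop.
    public async Task<bool> TryRecoverAsync(PersistedSaga saga, CancellationToken ct)
    {
        if (!_recoverers.TryGetValue(saga.SagaType, out var recoverer))
            return false;

        await recoverer(saga, ct);
        return true;
    }
}
```

Each saga type would register one delegate at startup, and RecoverAsync would reduce to a single TryRecoverAsync call plus a warning when it returns false.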


Pitfall 1: Non-Idempotent Compensation

The most dangerous pitfall. If CompensateAsync is not idempotent, crash recovery and retry logic will produce incorrect results. Always check the current state before acting:

// BAD: always refunds, even if already refunded
public async Task<Result.Result> CompensateAsync(
    OrderSagaContext context, CancellationToken ct)
{
    await _gateway.RefundAsync(context.PaymentId!.Value, ct);
    return Result.Result.Success();
}

// GOOD: checks before refunding
public async Task<Result.Result> CompensateAsync(
    OrderSagaContext context, CancellationToken ct)
{
    if (!context.PaymentId.HasValue)
        return Result.Result.Success();

    var status = await _gateway.GetStatusAsync(context.PaymentId.Value, ct);
    if (status == PaymentStatus.Refunded)
        return Result.Result.Success();

    await _gateway.RefundAsync(context.PaymentId.Value, ct);
    return Result.Result.Success();
}
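
One way to check the property is to run the compensation twice against a fake gateway and count the refunds. The sketch below is self-contained and uses hypothetical stand-ins (FakeGateway, FakePaymentStatus, IdempotentRefund) rather than the real IPaymentGateway and step types. Note that the check-then-act sequence is not atomic, so this covers sequential retries only, not two recoveries racing each other.

```csharp
using System;
using System.Collections.Generic;
using System.Threading.Tasks;

public enum FakePaymentStatus { Charged, Refunded }

// Hypothetical in-memory gateway that counts refund calls.
public sealed class FakeGateway
{
    private readonly Dictionary<Guid, FakePaymentStatus> _payments = new();

    public int RefundCalls { get; private set; }

    public Guid Charge()
    {
        var id = Guid.NewGuid();
        _payments[id] = FakePaymentStatus.Charged;
        return id;
    }

    public Task<FakePaymentStatus> GetStatusAsync(Guid id)
        => Task.FromResult(_payments[id]);

    public Task RefundAsync(Guid id)
    {
        RefundCalls++;
        _payments[id] = FakePaymentStatus.Refunded;
        return Task.CompletedTask;
    }
}

public static class IdempotentRefund
{
    // Same check-then-act shape as the GOOD example above.
    public static async Task CompensateAsync(FakeGateway gateway, Guid? paymentId)
    {
        if (!paymentId.HasValue)
            return; // Nothing was charged, so there is nothing to undo.

        var status = await gateway.GetStatusAsync(paymentId.Value);
        if (status == FakePaymentStatus.Refunded)
            return; // Already compensated by an earlier attempt.

        await gateway.RefundAsync(paymentId.Value);
    }
}
```

Calling IdempotentRefund.CompensateAsync twice with the same payment id leaves RefundCalls at 1, and calling it with a null id leaves it at 0.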

Pitfall 2: Storing Non-Serializable Data in Context

If you use ISagaStore, the context must be serializable to JSON. Avoid putting services, HTTP clients, database connections, or entity objects with circular navigation properties in the context:

// BAD: IPaymentGateway is not serializable
public sealed class OrderSagaContext : SagaContext
{
    public IPaymentGateway Gateway { get; set; } // Will fail to serialize
}

// GOOD: only serializable data
public sealed class OrderSagaContext : SagaContext
{
    public Guid? PaymentId { get; set; }
    public decimal Amount { get; set; }
}

Services belong in step constructors (injected via DI), not in the context.
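
A cheap guard against this pitfall is a round-trip test on the context. The sketch below assumes System.Text.Json as the serializer, which the store implementation may or may not use, and OrderContextData is a hypothetical copy of the GOOD context that does not derive from SagaContext so that the example stays self-contained.

```csharp
using System;
using System.Text.Json;

// Hypothetical context shape mirroring the GOOD example.
public sealed class OrderContextData
{
    public Guid? PaymentId { get; set; }
    public decimal Amount { get; set; }
}

public static class ContextRoundTrip
{
    // Serializes and deserializes the context, as a store would do
    // between a crash and a recovery pass.
    public static OrderContextData RoundTrip(OrderContextData context)
    {
        string json = JsonSerializer.Serialize(context);
        return JsonSerializer.Deserialize<OrderContextData>(json)
            ?? throw new InvalidOperationException("Context deserialized to null.");
    }
}
```

If every property survives the round trip with the value it started with, the context is safe to persist. A property holding a service or a cyclic object graph would typically fail at the Serialize call.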

Pitfall 3: Steps With Side Effects That Cannot Be Compensated

Some actions are genuinely irreversible. You cannot unsend an email. You cannot undo a physical shipment that has already left the warehouse. You cannot retract a push notification.

Place irreversible steps last. If the email step fails, you want to compensate the payment and inventory, not the other way around. If the email step succeeds, there is nothing after it that can fail (because it is last), so compensation for the email step is never triggered.

// Correct ordering: reversible steps first, irreversible steps last
var steps = new List<ISagaStep<OrderSagaContext>>
{
    sp.GetRequiredService<ReserveInventoryStep>(),   // Reversible
    sp.GetRequiredService<ChargePaymentStep>(),      // Reversible
    sp.GetRequiredService<ScheduleShippingStep>(),   // Reversible
    sp.GetRequiredService<SendConfirmationStep>()    // Irreversible (last)
};

Pitfall 4: Fat Steps

A step should do one thing. If a step reserves inventory, charges a payment, and sends an email, its compensation needs to undo all three -- which is exactly the problem the saga pattern was supposed to solve. Break it into three steps:

// BAD: one step does three things
public async Task<Result.Result> ExecuteAsync(
    OrderSagaContext context, CancellationToken ct)
{
    await _inventory.ReserveAsync(context.Items, ct);
    await _payments.ChargeAsync(context.Amount, ct);
    await _email.SendAsync(context.CustomerEmail, ct);
    return Result.Result.Success();
}

// GOOD: three steps, each does one thing
// ReserveInventoryStep -> ChargePaymentStep -> SendConfirmationStep

Pitfall 5: Using Sagas for Local Operations

Not every multi-step operation needs a saga. If all steps operate on the same database, a local database transaction is simpler, faster, and provides stronger consistency guarantees (ACID) than a saga (which provides eventual consistency through compensation). Use sagas when steps cross system boundaries -- different databases, external APIs, third-party services. Use transactions when steps are local to one database.


Saga vs Other Approaches

| Dimension | Two-Phase Commit (2PC) | Saga (Orchestration) | Compensating Transaction | Eventual Consistency |
|---|---|---|---|---|
| Atomicity | All-or-nothing (ACID) | Semantic undo via compensation | Manual undo logic | No atomicity guarantee |
| Isolation | Locks held during prepare | No isolation between steps | No isolation | No isolation |
| Participants | Must implement XA/2PC protocol | Any service with execute/compensate | Any service | Any service |
| Failure handling | Coordinator rolls back | Orchestrator compensates in reverse | Manual rollback code | Reconciliation jobs |
| Scalability | Low (locks, coordinator bottleneck) | High (no locks, no coordinator) | Medium | High |
| Complexity | Low (if supported) | Medium (step + compensate per action) | Low (but error-prone) | Low (but eventual) |
| Consistency model | Strong (immediate) | Eventual (compensated) | Eventual (manual) | Eventual |
| Cross-boundary | Requires protocol support | Works with any API | Works with any API | Works with any API |
| Crash recovery | Coordinator recovery | ISagaStore persistence | Manual (no framework) | Reconciliation |
| Testing | Requires 2PC infrastructure | InMemorySagaStore, test steps | Manual mocking | Reconciliation testing |

When to Use 2PC

When all participants are databases that support the XA protocol, when the operation is short-lived (milliseconds), and when you need strong isolation (no dirty reads of intermediate state). Example: transferring funds between two accounts in the same database cluster.

When to Use Sagas

When participants are external services, when the operation is long-lived, when you cannot hold locks across participants, or when participants do not support distributed transaction protocols. Example: order fulfillment spanning inventory, payment, and shipping services.

When to Use Compensating Transactions

When you have a simple two-step operation and the overhead of a saga framework is not justified. The saga pattern is a formalization of compensating transactions -- it adds the state machine, the reverse-order guarantee, and the persistence layer. For one or two steps, a manual try-catch with a compensating call is fine. For three or more, use a saga.
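
For the two-step case, the manual try-catch can be as small as the sketch below. ManualCompensation and its delegate parameters are hypothetical names chosen for illustration. The sketch reports failure through exceptions rather than Result, to stay self-contained.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

public static class ManualCompensation
{
    // Runs two operations; if the second throws, undoes the first and
    // rethrows so the caller still sees the original failure.
    public static async Task RunAsync(
        Func<CancellationToken, Task> first,
        Func<CancellationToken, Task> undoFirst,
        Func<CancellationToken, Task> second,
        CancellationToken ct)
    {
        await first(ct);
        try
        {
            await second(ct);
        }
        catch
        {
            // Compensate with CancellationToken.None so that a cancelled
            // request does not also cancel the undo.
            await undoFirst(CancellationToken.None);
            throw;
        }
    }
}
```

This works for one compensating call. It has no persisted state, so a crash between the two operations leaves the first one applied with nothing to resume from, which is the gap the saga's state machine and store are meant to close.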

When to Use Eventual Consistency

When compensation is not feasible or not necessary, and the system can tolerate temporary inconsistency. Example: updating a read model from an event stream. If the read model is temporarily stale, the user sees slightly outdated data, which is acceptable in many scenarios.


Pattern 1: Step Middleware

Just as the mediator pipeline wraps handlers with behaviors, you can wrap saga steps with middleware for cross-cutting concerns:

public sealed class LoggingStepDecorator<TContext> : ISagaStep<TContext>
    where TContext : SagaContext
{
    private readonly ISagaStep<TContext> _inner;
    private readonly ILogger _logger;

    public LoggingStepDecorator(ISagaStep<TContext> inner, ILogger logger)
    {
        _inner = inner;
        _logger = logger;
    }

    public async Task<Result.Result> ExecuteAsync(
        TContext context, CancellationToken ct)
    {
        var stepName = _inner.GetType().Name;
        _logger.LogInformation(
            "Saga {SagaId}: Executing {Step}", context.SagaId, stepName);

        var sw = Stopwatch.StartNew();
        var result = await _inner.ExecuteAsync(context, ct);
        sw.Stop();

        if (result.IsSuccess)
            _logger.LogInformation(
                "Saga {SagaId}: {Step} succeeded in {Elapsed}ms",
                context.SagaId, stepName, sw.ElapsedMilliseconds);
        else
            _logger.LogWarning(
                "Saga {SagaId}: {Step} failed in {Elapsed}ms: {Error}",
                context.SagaId, stepName, sw.ElapsedMilliseconds,
                result.Error);

        return result;
    }

    public async Task<Result.Result> CompensateAsync(
        TContext context, CancellationToken ct)
    {
        var stepName = _inner.GetType().Name;
        _logger.LogInformation(
            "Saga {SagaId}: Compensating {Step}", context.SagaId, stepName);

        var result = await _inner.CompensateAsync(context, ct);

        if (result.IsSuccess)
            _logger.LogInformation(
                "Saga {SagaId}: {Step} compensation succeeded",
                context.SagaId, stepName);
        else
            _logger.LogError(
                "Saga {SagaId}: {Step} compensation failed: {Error}",
                context.SagaId, stepName, result.Error);

        return result;
    }
}

Apply the decorator when building the step list:

var steps = new List<ISagaStep<OrderSagaContext>>
{
    new LoggingStepDecorator<OrderSagaContext>(
        sp.GetRequiredService<ReserveInventoryStep>(), logger),
    new LoggingStepDecorator<OrderSagaContext>(
        sp.GetRequiredService<ChargePaymentStep>(), logger),
    new LoggingStepDecorator<OrderSagaContext>(
        sp.GetRequiredService<ScheduleShippingStep>(), logger)
};

Every step now logs its execution and compensation with timing, without modifying any step code.

Pattern 2: Retry Decorator

For transient failures (network timeouts, rate limiting), a retry decorator wraps each step with retry logic:

public sealed class RetryStepDecorator<TContext> : ISagaStep<TContext>
    where TContext : SagaContext
{
    private readonly ISagaStep<TContext> _inner;
    private readonly int _maxRetries;
    private readonly TimeSpan _delay;

    public RetryStepDecorator(
        ISagaStep<TContext> inner, int maxRetries = 3,
        TimeSpan? delay = null)
    {
        _inner = inner;
        _maxRetries = maxRetries;
        _delay = delay ?? TimeSpan.FromMilliseconds(500);
    }

    public async Task<Result.Result> ExecuteAsync(
        TContext context, CancellationToken ct)
    {
        Result.Result result = Result.Result.Failure("Not executed");
        for (int attempt = 0; attempt <= _maxRetries; attempt++)
        {
            result = await _inner.ExecuteAsync(context, ct);
            if (result.IsSuccess)
                return result;

            if (attempt < _maxRetries)
                await Task.Delay(_delay, ct);
        }
        return result;
    }

    public Task<Result.Result> CompensateAsync(
        TContext context, CancellationToken ct)
    {
        // Compensation is delegated without retry in this sketch. Because
        // compensation must be idempotent, the same loop could be applied here.
        return _inner.CompensateAsync(context, ct);
    }
}

This is a simple fixed-delay retry, and it retries any failed Result, not only transient ones. In production, you would retry only failures known to be transient and use exponential backoff with jitter. The key observation is that retrying execution is safe when steps are idempotent, and retrying compensation is safe by the idempotency requirement we already established.
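
The delay calculation for exponential backoff with jitter might look like the sketch below. It uses the variant sometimes called full jitter, where the delay is drawn uniformly between zero and an exponentially growing ceiling. Backoff, NextDelay, and the cap parameter are hypothetical names, not part of the retry decorator above.

```csharp
using System;

public static class Backoff
{
    // Full-jitter delay: a random value in [0, ceiling), where
    // ceiling = min(cap, baseDelay * 2^attempt).
    public static TimeSpan NextDelay(
        int attempt, TimeSpan baseDelay, TimeSpan cap, Random random)
    {
        if (attempt < 0)
            throw new ArgumentOutOfRangeException(nameof(attempt));

        // Clamp the exponent so the multiplication stays in a sane range.
        double factor = Math.Pow(2, Math.Min(attempt, 30));
        double ceilingMs = Math.Min(
            cap.TotalMilliseconds, baseDelay.TotalMilliseconds * factor);

        return TimeSpan.FromMilliseconds(random.NextDouble() * ceilingMs);
    }
}
```

In the retry loop, the fixed Task.Delay(_delay, ct) call would become Task.Delay(Backoff.NextDelay(attempt, _delay, cap, random), ct). The randomness spreads out retries from many sagas that failed at the same moment, so they do not hit the recovering service in lockstep.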

Pattern 3: Timeout Decorator

For steps that call external services with unpredictable latency:

public sealed class TimeoutStepDecorator<TContext> : ISagaStep<TContext>
    where TContext : SagaContext
{
    private readonly ISagaStep<TContext> _inner;
    private readonly TimeSpan _timeout;

    public TimeoutStepDecorator(ISagaStep<TContext> inner, TimeSpan timeout)
    {
        _inner = inner;
        _timeout = timeout;
    }

    public async Task<Result.Result> ExecuteAsync(
        TContext context, CancellationToken ct)
    {
        using var cts = CancellationTokenSource.CreateLinkedTokenSource(ct);
        cts.CancelAfter(_timeout);

        try
        {
            return await _inner.ExecuteAsync(context, cts.Token);
        }
        catch (OperationCanceledException) when (!ct.IsCancellationRequested)
        {
            return Result.Result.Failure(
                $"Step {_inner.GetType().Name} timed out after {_timeout}");
        }
    }

    public Task<Result.Result> CompensateAsync(
        TContext context, CancellationToken ct)
    {
        return _inner.CompensateAsync(context, ct);
    }
}

Decorators compose. You can wrap a step in retry, then in timeout, then in logging:

ISagaStep<OrderSagaContext> step = new ChargePaymentStep(gateway);
step = new RetryStepDecorator<OrderSagaContext>(step, maxRetries: 3);
step = new TimeoutStepDecorator<OrderSagaContext>(step, TimeSpan.FromSeconds(30));
step = new LoggingStepDecorator<OrderSagaContext>(step, logger);

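The outermost decorator runs first: logging wraps timeout wraps retry wraps the actual step. Because the timeout wraps the retry, the 30-second budget covers all attempts combined, not each attempt individually. Swap the two decorators if you want a per-attempt timeout instead.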
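
The nesting order can be made visible with a trace. The sketch below uses a deliberately simplified, hypothetical contract (ITraceStep, CoreStep, NamedDecorator) in place of ISagaStep<TContext> and the real decorators; only the wrapping order mirrors the snippet above.

```csharp
using System;
using System.Collections.Generic;
using System.Threading.Tasks;

// Hypothetical, simplified step contract used only for this trace.
public interface ITraceStep
{
    Task ExecuteAsync(List<string> trace);
}

public sealed class CoreStep : ITraceStep
{
    public Task ExecuteAsync(List<string> trace)
    {
        trace.Add("step");
        return Task.CompletedTask;
    }
}

// Records when it is entered and left, then delegates inward.
public sealed class NamedDecorator : ITraceStep
{
    private readonly ITraceStep _inner;
    private readonly string _name;

    public NamedDecorator(ITraceStep inner, string name)
    {
        _inner = inner;
        _name = name;
    }

    public async Task ExecuteAsync(List<string> trace)
    {
        trace.Add($"{_name}:enter");
        await _inner.ExecuteAsync(trace);
        trace.Add($"{_name}:exit");
    }
}

public static class DecoratorOrderDemo
{
    public static async Task<List<string>> RunAsync()
    {
        ITraceStep step = new CoreStep();
        step = new NamedDecorator(step, "retry");    // applied first: innermost
        step = new NamedDecorator(step, "timeout");
        step = new NamedDecorator(step, "logging");  // applied last: outermost

        var trace = new List<string>();
        await step.ExecuteAsync(trace);
        return trace;
    }
}
```

The resulting trace enters logging, then timeout, then retry, runs the step, and unwinds in the opposite order. The decorator applied last is the first one to see the call.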


Package Structure

The Saga pattern follows the same package structure as every other FrenchExDev pattern:

FrenchExDev.Net.Saga/
    ISagaStep.cs             -- ISagaStep<in TContext> interface
    SagaContext.cs           -- SagaContext abstract class
    SagaState.cs             -- SagaState enum (6 values)
    SagaOrchestrator.cs      -- SagaOrchestrator<TContext> sealed class
    SagaInstance.cs          -- SagaInstance sealed class (persistence entity)
    ISagaStore.cs            -- ISagaStore interface

FrenchExDev.Net.Saga.Testing/
    InMemorySagaStore.cs     -- InMemorySagaStore sealed class

Two packages. One for production. One for testing. The production package depends on FrenchExDev.Net.Result -- because ISagaStep.ExecuteAsync and CompensateAsync return Result. The testing package depends on the production package and nothing else.

Restored FrenchExDev.Net.Saga (1 dependency)
  └── FrenchExDev.Net.Result

One dependency. The shallow dependency graph holds.


Summary

The Saga pattern replaces distributed transactions with a sequence of local transactions, each paired with a compensating action that undoes its work on failure. The SagaOrchestrator<TContext> manages the state machine. ISagaStep<TContext> defines the execute-and-compensate contract. SagaContext carries the lifecycle state. SagaInstance and ISagaStore provide optional persistence for crash recovery. InMemorySagaStore provides a thread-safe test double.

Here is the complete API surface:

| Type | Purpose | Cardinality |
|---|---|---|
| SagaContext | Abstract base for saga state | 1 abstract class, N derived |
| SagaState | Lifecycle enum: Pending, Running, Completed, Compensating, Compensated, Failed | 1 enum (6 values) |
| ISagaStep<TContext> | Step contract with ExecuteAsync and CompensateAsync | N per saga |
| SagaOrchestrator<TContext> | Runs steps forward, compensates backward | 1 per saga type |
| SagaInstance | Persistence entity for saga state | 1 sealed class |
| ISagaStore | Persistence interface | 1 interface, N implementations |
| InMemorySagaStore | Test double (ConcurrentDictionary-backed) | 1 (in .Testing) |

The key design decisions are:

  1. Six-state machine distinguishes clean rollback (Compensated) from dirty failure (Failed), enabling operators to know whether manual intervention is needed.
  2. Reverse-order compensation ensures that dependent steps are undone in the correct order -- payment refunded before inventory released.
  3. Result integration makes success and failure explicit without exception-based control flow.
  4. Contravariant step interface enables generic cross-cutting steps that work with any context type.
  5. Persistence is optional -- short-lived sagas skip ISagaStore; long-lived sagas add it as a wrapper.
  6. Decorator composition enables logging, retry, and timeout without modifying step code.

Six types in the main package. One type in the testing package. One state machine. One orchestration engine. Idempotent compensation as the fundamental contract. And the entire order fulfillment scenario -- three steps, three compensations, persistence, recovery, and logging -- fits in fewer lines than the try-catch version that only handled two failure modes.

Next in the series: Part X: The Outbox Pattern, where we solve the dual-write problem with OutboxMessage and OutboxInterceptor, capture domain events from IHasDomainEvents entities within the same EF Core transaction, process them with IOutboxProcessor in a background service, and test everything with InMemoryOutbox.
