Answer-first: Step-by-step Go code for Orchestrated Saga using Dapr Workflow: durable state, compensating transactions, and banking-grade consistency.

Most Go developers building microservices know the Choreography Saga pattern: service A emits an event, service B reacts, service C reacts to B, and so on. If step C fails, services emit “compensation” events in reverse order. The pattern works elegantly for simple flows, but breaks down as the number of steps grows: debugging a failed saga requires tracing events across five message broker topics, and implementing compensation logic requires every service to understand the full saga’s state.

Dapr Workflow offers a different model: Orchestrated Saga. A single orchestrator function owns the entire transaction lifecycle, calls each step explicitly, and manages compensation in a single, readable Go function. The orchestrator is durable — it survives process restarts without losing its position in the flow.

This post walks through a complete Go implementation of an Orchestrated Saga using Dapr Workflow, using a fund transfer across three financial microservices as the example.

For the broader financial microservices context, see Financial Microservices Architecture: Saga & Ledger and the Core Banking Developer Learning Path. The modern core banking architecture patterns, including event sourcing and ledger design, are covered in Part 4: Modern Core Banking Architecture.


Choreography vs. Orchestration: When Dapr Workflow Becomes the Better Choice

Both patterns implement the Saga distributed transaction model. The choice between them is architectural:

DimensionChoreography (Pub/Sub events)Orchestration (Dapr Workflow)
CouplingServices are decoupled from each otherServices coupled to orchestrator API
Flow visibilityDistributed across event logsCentralized in one orchestrator function
Compensation logicEach service implements its ownOrchestrator manages compensation in sequence
DebuggingRequires tracing across multiple topicsSingle workflow history log
Step count sweet spot2–4 steps4+ steps
Failure recoveryEvent replays, DLQ handling per serviceBuilt-in replay and durable state

When to choose Dapr Workflow Orchestration:

  • The saga has 4+ ordered steps that must execute in strict sequence
  • Compensation logic is complex and order-dependent
  • You need a human-readable, inspectable record of every transaction’s state
  • The business domain requires strong consistency guarantees (financial transactions, order fulfillment)

When Choreography is better:

  • Services genuinely need to be decoupled and independently deployable without knowledge of the overall flow
  • Steps can execute concurrently with no ordering dependency
  • The event stream itself provides the audit trail you need

Dapr Workflow Internals: How Durable Execution Works Under the Hood

Dapr Workflow is built on the Durable Task Framework. Understanding its execution model is critical to writing correct orchestrators.

When an orchestrator function executes, it does not run as a normal Go function. It runs as a replay-based state machine:

  1. First execution: The orchestrator runs step by step, scheduling each activity (external service call) as an async task.
  2. After a process restart or crash: When the workflow runtime restarts the orchestrator, it replays the entire history of recorded events from the backend state store. Completed activities are not re-executed — their recorded results are returned immediately.
  3. Determinism requirement: Because the orchestrator is replayed, it must be deterministic. Any non-deterministic operation (random numbers, time.Now(), reading environment variables) will produce different results on replay and corrupt the workflow state.
graph TD
    START[Workflow Started] --> REPLAY{First Run or Replay?}
    REPLAY -->|First Run| STEP1[Execute Activity: DebitSource]
    STEP1 --> PERSIST[Persist Event: DebitCompleted]
    PERSIST --> STEP2[Execute Activity: CreditTarget]
    STEP2 --> PERSIST2[Persist Event: CreditCompleted]
    PERSIST2 --> DONE[Workflow Complete]
    
    REPLAY -->|Replay| REPLAY1[Replay Event: DebitCompleted - instant]
    REPLAY1 --> REPLAY2[Replay Event: CreditCompleted - instant]
    REPLAY2 --> STEP3[Execute Next Pending Activity]

This means your orchestrator Go code must never call time.Now() directly. Use ctx.CurrentUTCDateTime() instead — Dapr Workflow provides this method to return the deterministic time recorded when the step first executed.


Setting Up Dapr Workflow in a Go Microservices Project

First, add the Dapr Go SDK:

go get github.com/dapr/go-sdk@v1.11.0

Your Go service needs the Dapr Workflow runtime initialized alongside the Dapr client:

package main

import (
    "context"
    "log"

    dapr "github.com/dapr/go-sdk/client"
    "github.com/dapr/go-sdk/workflow"
)

func main() {
    // Initialize Dapr workflow worker
    w, err := workflow.NewWorker()
    if err != nil {
        log.Fatalf("failed to create workflow worker: %v", err)
    }

    // Register the orchestrator and all activity functions
    if err := w.RegisterWorkflow(FundTransferWorkflow); err != nil {
        log.Fatalf("failed to register workflow: %v", err)
    }
    if err := w.RegisterActivity(DebitSourceAccount); err != nil {
        log.Fatalf("failed to register activity: %v", err)
    }
    if err := w.RegisterActivity(CreditTargetAccount); err != nil {
        log.Fatalf("failed to register activity: %v", err)
    }
    if err := w.RegisterActivity(RecordLedgerEntry); err != nil {
        log.Fatalf("failed to register activity: %v", err)
    }
    if err := w.RegisterActivity(CompensateDebitSourceAccount); err != nil {
        log.Fatalf("failed to register activity: %v", err)
    }

    // Start the worker (connects to Dapr sidecar)
    if err := w.Start(); err != nil {
        log.Fatalf("failed to start workflow worker: %v", err)
    }
    defer w.Shutdown()

    // ... start your HTTP/gRPC server
}

Step 1: Defining the Workflow Orchestrator Function

The orchestrator function is the heart of the Saga. It defines the execution order, handles errors, and triggers compensation.

// FundTransferInput carries the saga's input parameters
type FundTransferInput struct {
    TransactionID string  `json:"transaction_id"`
    SourceAccount string  `json:"source_account"`
    TargetAccount string  `json:"target_account"`
    Amount        float64 `json:"amount"`
    Currency      string  `json:"currency"`
}

// FundTransferWorkflow is the Saga orchestrator.
// It MUST be deterministic: no time.Now(), no rand, no env vars.
func FundTransferWorkflow(ctx *workflow.WorkflowContext) (any, error) {
    var input FundTransferInput
    if err := ctx.GetInput(&input); err != nil {
        return nil, fmt.Errorf("failed to get workflow input: %w", err)
    }

    // --- Step 1: Debit the source account ---
    var debitResult DebitResult
    if err := ctx.CallActivity(DebitSourceAccount, workflow.ActivityInput(input)).Await(&debitResult); err != nil {
        // Debit failed — no compensation needed (nothing was charged yet)
        return nil, fmt.Errorf("debit failed: %w", err)
    }

    // --- Step 2: Credit the target account ---
    var creditResult CreditResult
    if err := ctx.CallActivity(CreditTargetAccount, workflow.ActivityInput(input)).Await(&creditResult); err != nil {
        // Credit failed — must compensate the debit
        ctx.GetLogger().Warn("CreditTargetAccount failed, compensating debit", "error", err)
        
        compensateInput := CompensateInput{
            TransactionID: input.TransactionID,
            Account:       input.SourceAccount,
            Amount:        input.Amount,
            Reason:        fmt.Sprintf("credit_failed: %v", err),
        }
        var compensateResult CompensateResult
        if compErr := ctx.CallActivity(CompensateDebitSourceAccount, 
            workflow.ActivityInput(compensateInput)).Await(&compensateResult); compErr != nil {
            // Compensation itself failed — this is a critical alert scenario
            return nil, fmt.Errorf("CRITICAL: compensation failed after credit failure: debit_err=%v, comp_err=%w", err, compErr)
        }
        return nil, fmt.Errorf("transfer failed and compensated: %w", err)
    }

    // --- Step 3: Record the ledger entry (audit trail) ---
    ledgerInput := LedgerInput{
        TransactionID: input.TransactionID,
        DebitResult:   debitResult,
        CreditResult:  creditResult,
        Timestamp:     ctx.CurrentUTCDateTime(), // deterministic time
    }
    var ledgerResult LedgerResult
    if err := ctx.CallActivity(RecordLedgerEntry, workflow.ActivityInput(ledgerInput)).Await(&ledgerResult); err != nil {
        // Ledger recording failed — this is a consistency issue but money moved correctly.
        // Trigger a reconciliation alert rather than compensating the transfer.
        ctx.GetLogger().Error("LedgerEntry failed after successful transfer — reconciliation required",
            "transaction_id", input.TransactionID,
            "error", err)
        // Return partial success to indicate the transfer completed but audit trail needs repair
    }

    return &FundTransferResult{
        TransactionID:  input.TransactionID,
        Status:         "completed",
        LedgerRecorded: err == nil,
    }, nil
}

Step 2: Implementing Activity Functions (the Individual Saga Steps)

Each activity is an isolated, idempotent function that performs a single step. Activities can have side effects (database writes, API calls) — unlike the orchestrator, they do not need to be deterministic.

// DebitSourceAccount atomically debits the source account.
// This function MUST be idempotent: if called twice with the same TransactionID, 
// the second call must be a no-op (not a double debit).
func DebitSourceAccount(ctx context.Context, input FundTransferInput) (DebitResult, error) {
    db := dbFromContext(ctx) // retrieve GORM *gorm.DB from context

    var result DebitResult
    err := db.WithContext(ctx).Transaction(func(tx *gorm.DB) error {
        // Idempotency check: has this transaction already been processed?
        var existing LedgerTransaction
        if err := tx.Where("external_id = ? AND type = ?", 
            input.TransactionID, "debit").First(&existing).Error; err == nil {
            // Already processed — return the cached result
            result = DebitResult{
                DebitID:   existing.ID,
                NewBalance: existing.PostBalance,
                Idempotent: true,
            }
            return nil
        } else if !errors.Is(err, gorm.ErrRecordNotFound) {
            return fmt.Errorf("idempotency check failed: %w", err)
        }

        // Lock the account row and check balance
        var account Account
        if err := tx.Set("gorm:query_option", "FOR UPDATE").
            Where("account_number = ?", input.SourceAccount).
            First(&account).Error; err != nil {
            return fmt.Errorf("account not found: %w", err)
        }
        if account.Balance < input.Amount {
            return ErrInsufficientFunds
        }

        // Debit
        newBalance := account.Balance - input.Amount
        if err := tx.Model(&account).Update("balance", newBalance).Error; err != nil {
            return fmt.Errorf("balance update failed: %w", err)
        }

        // Record the ledger entry
        ledger := LedgerTransaction{
            ExternalID:  input.TransactionID,
            Type:        "debit",
            AccountNo:   input.SourceAccount,
            Amount:      input.Amount,
            PostBalance: newBalance,
            CreatedAt:   time.Now(),
        }
        if err := tx.Create(&ledger).Error; err != nil {
            return fmt.Errorf("ledger insert failed: %w", err)
        }

        result = DebitResult{DebitID: ledger.ID, NewBalance: newBalance}
        return nil
    })

    return result, err
}

The key pattern here is idempotency via ExternalID: before executing the debit, the function checks whether a ledger entry with the same TransactionID already exists. If it does, it returns the previously computed result without repeating the operation. This is critical because Dapr Workflow may retry activity calls on transient failures, and without idempotency a retry would produce a double debit.

This idempotency pattern, combined with the Go transaction handling demonstrated here, mirrors the patterns described in MySQL Database Scaling: Vitess & GORM Sharding for high-throughput financial writes.


Step 3: Designing and Triggering Compensating Transactions on Failure

The compensation for a debit is a credit back to the source account — but it must be recorded as a reversal, not simply deleted, to maintain audit trail integrity.

func CompensateDebitSourceAccount(ctx context.Context, input CompensateInput) (CompensateResult, error) {
    db := dbFromContext(ctx)

    var result CompensateResult
    err := db.WithContext(ctx).Transaction(func(tx *gorm.DB) error {
        // Idempotency check for the compensation itself
        var existing LedgerTransaction
        compensateID := input.TransactionID + ":compensate"
        if err := tx.Where("external_id = ? AND type = ?", 
            compensateID, "compensation").First(&existing).Error; err == nil {
            result = CompensateResult{CompensationID: existing.ID, Idempotent: true}
            return nil
        }

        // Find the original debit entry
        var originalDebit LedgerTransaction
        if err := tx.Where("external_id = ? AND type = ?", 
            input.TransactionID, "debit").First(&originalDebit).Error; err != nil {
            return fmt.Errorf("original debit not found for compensation: %w", err)
        }

        // Reverse the balance change
        var account Account
        if err := tx.Set("gorm:query_option", "FOR UPDATE").
            Where("account_number = ?", input.Account).
            First(&account).Error; err != nil {
            return fmt.Errorf("account not found during compensation: %w", err)
        }

        newBalance := account.Balance + input.Amount
        if err := tx.Model(&account).Update("balance", newBalance).Error; err != nil {
            return fmt.Errorf("compensation balance update failed: %w", err)
        }

        // Record the compensation as an explicit ledger entry
        compensation := LedgerTransaction{
            ExternalID:       compensateID,
            Type:             "compensation",
            AccountNo:        input.Account,
            Amount:           input.Amount,
            PostBalance:      newBalance,
            LinkedExternalID: input.TransactionID,
            Reason:           input.Reason,
            CreatedAt:        time.Now(),
        }
        if err := tx.Create(&compensation).Error; err != nil {
            return fmt.Errorf("compensation ledger insert failed: %w", err)
        }

        result = CompensateResult{CompensationID: compensation.ID}
        return nil
    })
    return result, err
}

Step 4: Observing Saga State — Querying Workflow Status and History

One of Dapr Workflow’s strongest features is built-in state visibility. Every workflow instance stores its full execution history in the configured backend (Redis or a SQL database).

// Query workflow status from any service using the Dapr client
func GetTransactionStatus(ctx context.Context, transactionID string) (*WorkflowStatus, error) {
    client, err := dapr.NewClient()
    if err != nil {
        return nil, err
    }
    defer client.Close()

    resp, err := client.GetWorkflowBeta1(ctx, &dapr.GetWorkflowRequest{
        InstanceID:        transactionID,
        WorkflowComponent: "dapr",
    })
    if err != nil {
        return nil, fmt.Errorf("failed to get workflow state: %w", err)
    }

    return &WorkflowStatus{
        InstanceID:     resp.InstanceID,
        RuntimeStatus:  resp.RuntimeStatus,    // RUNNING, COMPLETED, FAILED
        CreatedAt:      resp.CreatedAt,
        LastUpdated:    resp.LastUpdatedAt,
        FailureDetails: resp.FailureDetails,
    }, nil
}

This endpoint can power a real-time transaction status API for your banking application — consumers (mobile apps, ops dashboards) can poll or subscribe to transaction state without needing access to internal message queues.


Production Patterns: Idempotency, Timeouts, and DLQ for Dapr Workflows

Activity Timeouts

Every activity should have a timeout to prevent the orchestrator from waiting indefinitely for a hung downstream service:

// Requires: github.com/dapr/go-sdk v1.9+ (workflow.ActivityOptions.RetryPolicy added in v1.9)
// import "github.com/dapr/go-sdk/workflow"

// Set a 30-second timeout per activity call
opts := workflow.ActivityOptions{
    StartToCloseTimeout: 30 * time.Second,
    RetryPolicy: &workflow.RetryPolicy{
        MaxAttempts:        3,
        InitialInterval:    time.Second,
        BackoffCoefficient: 2.0,
        MaxInterval:        10 * time.Second,
    },
}
if err := ctx.CallActivity(CreditTargetAccount, 
    workflow.ActivityInput(input), 
    workflow.WithActivityOptions(opts)).Await(&creditResult); err != nil {
    // Handle timeout or retry exhaustion
}

Workflow-Level Timeout

Set a maximum total duration for the entire saga:

// Start workflow with a 5-minute total timeout
resp, err := client.StartWorkflowBeta1(ctx, &dapr.StartWorkflowRequest{
    WorkflowComponent: "dapr",
    WorkflowName:      "FundTransferWorkflow",
    InstanceID:        transferRequest.TransactionID,
    Options: map[string]string{
        "workflow-timeout": "300s", // 5 minutes
    },
    Input: inputBytes,
})

Dead Letter Handling

Workflows that fail after all retries are exhausted transition to FAILED state with a FailureDetails payload. Build a reconciliation worker that queries for failed workflow instances and routes them to a DLQ or a human review queue:

func ReconcilationWorker(ctx context.Context, client dapr.Client) {
    ticker := time.NewTicker(1 * time.Minute)
    for {
        select {
        case <-ctx.Done():
            return
        case <-ticker.C:
            // List workflows in FAILED state (implementation depends on your state backend)
            failedInstances := queryFailedWorkflows(ctx)
            for _, instance := range failedInstances {
                routeToDLQ(ctx, instance)
                alertOpsTeam(ctx, instance)
            }
        }
    }
}

Real-World Example: A Fund Transfer Saga Across 3 Financial Microservices

Putting it all together, the complete fund transfer saga flow:

sequenceDiagram
    participant API as Transfer API
    participant DAPR as Dapr Workflow
    participant DEBIT as Account Service (Debit)
    participant CREDIT as Account Service (Credit)
    participant LEDGER as Ledger Service

    API->>DAPR: StartWorkflow(FundTransferWorkflow)
    DAPR->>DEBIT: Activity: DebitSourceAccount
    DEBIT-->>DAPR: DebitResult (newBalance, debitID)
    
    DAPR->>CREDIT: Activity: CreditTargetAccount
    alt Credit Succeeds
        CREDIT-->>DAPR: CreditResult
        DAPR->>LEDGER: Activity: RecordLedgerEntry
        LEDGER-->>DAPR: LedgerResult
        DAPR-->>API: Workflow COMPLETED
    else Credit Fails
        CREDIT-->>DAPR: Error
        DAPR->>DEBIT: Activity: CompensateDebitSourceAccount
        DEBIT-->>DAPR: CompensationResult
        DAPR-->>API: Workflow FAILED (compensated)
    end

This flow is inspectable at every step via the Dapr dashboard or the workflow status API. Every intermediate state is persisted. If the Dapr sidecar restarts mid-transfer, the workflow replays from the last completed checkpoint without re-executing completed activities.

For teams combining Dapr Workflow with the full Dapr event-driven ecosystem (Pub/Sub, State, Bindings), see Mastering Event-Driven Architecture with Dapr for the broader integration patterns.


Frequently Asked Questions

What is the difference between Dapr Workflow and Dapr Pub/Sub for Sagas?

Dapr Pub/Sub implements choreography-based Sagas: services react to events independently, with no central coordinator. Dapr Workflow implements orchestration-based Sagas: a single orchestrator function explicitly calls each step and manages compensation. Workflow provides better observability (centralized history log), simpler compensation logic, and durable state — but introduces a coordinator as a new dependency.

How does Dapr Workflow handle failures and retries?

Dapr Workflow supports per-activity retry policies with configurable max attempts, initial interval, backoff coefficient, and max interval. Activities that fail are retried according to their policy before the orchestrator receives the error. Activity functions must be idempotent to handle retries safely — using an external transaction ID as an idempotency key in the database is the standard pattern.

Is Dapr Workflow production-ready?

Dapr Workflow (based on the Durable Task Framework) reached stable status in Dapr v1.12 (mid-2024). The Go SDK has stable workflow APIs from v1.11. Production considerations: choose a reliable backend (Redis Cluster or PostgreSQL via the dapr-workflow-backend component) for workflow state storage, and monitor the Dapr sidecar resource consumption under high workflow throughput.

For the observability layer on top of these workflows — how to propagate W3C trace context through Kafka headers, configure tail-based sampling, and redact PII at the OTel Collector — see Go Microservices Distributed Tracing Architecture.


🤝 Let's Connect

Are you facing similar challenges with system architecture, scaling, or migration? I'd love to hear about it. Connect with me on LinkedIn, check out my GitHub, or drop me an email.