ADR-021: Lightweight saga orchestration at MVP; Temporal deferred until justified
- Status: Accepted
- Date: 2026-06-18
- Deciders: Principal Architect, Engineering Lead
- Relates to: ADR-006, ADR-007, ADR-008, ADR-017, ADR-019
Context
Money flows cross service boundaries with separate databases (ADR-019): a payroll run posts to the ledger, then initiates a bank disbursement via the gateway, then reconciles. There is no single ACID transaction spanning those datastores, so partial failure must be handled by compensation (e.g. a failed disbursement triggers a ledger reversal). Temporal is the obvious durable-workflow engine for this, but it is a heavyweight operational dependency (its own cluster + DB) that a pre-launch team should not adopt before the workflows justify it.
Decision
At MVP/Post-MVP, orchestrate cross-service money flows with a lightweight, DB-backed pattern: a saga state row + the transactional outbox (ADR-008/017) + idempotent steps (ADR-007) + explicit compensating actions. Reconciliation against the partner bank (ADR-014) is the backstop of last resort.
Temporal (or an equivalent durable-workflow engine) is deferred and adopted only when workflow complexity or scale makes the hand-rolled saga state hard to reason about: many steps, long-running timers, complex retry/compensation trees.
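As a rough illustration of the saga state row plus transactional outbox, the sketch below keeps one row per saga and writes the next command to an outbox table in the same transaction as the state change. It is not the team's implementation: the table names (`saga`, `outbox`), the state, event, and command names, and the functions `start_saga` and `handle_event` are all hypothetical, and Python's built-in `sqlite3` stands in for Postgres so the example runs on its own.

```python
import sqlite3

# Hypothetical transition table: (current state, event) -> (next state, command to emit).
# A failed disbursement moves the saga onto the compensation path (ledger reversal).
TRANSITIONS = {
    ("STARTED", "ledger_posted"): ("LEDGER_POSTED", "initiate_disbursement"),
    ("LEDGER_POSTED", "disbursement_succeeded"): ("DISBURSED", "reconcile"),
    ("LEDGER_POSTED", "disbursement_failed"): ("COMPENSATING", "reverse_ledger_posting"),
    ("DISBURSED", "reconciled"): ("COMPLETED", None),
    ("COMPENSATING", "ledger_reversed"): ("COMPENSATED", None),
}


def init_schema(conn):
    # Illustrative schema only; a real one would carry payloads, timestamps, attempts.
    conn.executescript(
        """
        CREATE TABLE saga (saga_id TEXT PRIMARY KEY, state TEXT NOT NULL);
        CREATE TABLE outbox (
            id INTEGER PRIMARY KEY AUTOINCREMENT,
            saga_id TEXT NOT NULL,
            command TEXT NOT NULL
        );
        """
    )


def start_saga(conn, saga_id):
    # The saga row and its first command commit together or not at all.
    with conn:
        conn.execute("INSERT INTO saga (saga_id, state) VALUES (?, 'STARTED')", (saga_id,))
        conn.execute(
            "INSERT INTO outbox (saga_id, command) VALUES (?, 'post_to_ledger')", (saga_id,)
        )


def handle_event(conn, saga_id, event):
    """Apply one event; return the saga state after handling it."""
    with conn:  # state change and outbox write share one transaction
        row = conn.execute(
            "SELECT state FROM saga WHERE saga_id = ?", (saga_id,)
        ).fetchone()
        if row is None:
            raise KeyError(saga_id)
        state = row[0]
        transition = TRANSITIONS.get((state, event))
        if transition is None:
            # Duplicate or out-of-order event: change nothing, so replays are harmless.
            return state
        new_state, command = transition
        conn.execute(
            "UPDATE saga SET state = ? WHERE saga_id = ? AND state = ?",
            (new_state, saga_id, state),
        )
        if command is not None:
            conn.execute(
                "INSERT INTO outbox (saga_id, command) VALUES (?, ?)", (saga_id, command)
            )
        return new_state
```

A separate relay would read `outbox` and deliver the commands; that part is omitted here. Keeping the transitions in a single table is one way to keep compensation logic in one auditable place, in contrast to the choreography alternative considered below.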
Alternatives considered
- Adopt Temporal from day one — rejected: an extra stateful cluster to run/secure/recover before any validated need; over-engineering the MVP.
- No orchestrator — rely on event choreography alone — rejected for money: compensation logic scattered across consumers is hard to audit and reason about for correctness.
Consequences
- Positive: no new infra; built from primitives we already have (outbox, idempotency, ledger reversals); fully auditable in Postgres.
- Negative / accepted: the team owns the saga state machine + compensation code (a class of bugs Temporal would handle); long-running/timer-heavy workflows are awkward — that awkwardness is the signal to adopt Temporal.
- Follow-ups: introduce a payments-orchestrator that owns the payroll→ledger→disbursement saga + compensation; keep every step idempotent so retries/replays are safe.
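One common way to keep a step idempotent is to record an idempotency key in the same transaction as the step's effect, so that a retry or replay hits a uniqueness constraint and becomes a no-op. The sketch below assumes this approach; ADR-007 may specify something different. The table names (`processed_step`, `ledger_entry`), the key format, and the function `post_to_ledger` are hypothetical, and `sqlite3` stands in for Postgres so the example is self-contained.

```python
import sqlite3


def init_schema(conn):
    # Illustrative schema only; names are assumptions, not the real ledger tables.
    conn.executescript(
        """
        CREATE TABLE processed_step (idempotency_key TEXT PRIMARY KEY);
        CREATE TABLE ledger_entry (
            id INTEGER PRIMARY KEY AUTOINCREMENT,
            payroll_run_id TEXT NOT NULL,
            amount_cents INTEGER NOT NULL
        );
        """
    )


def post_to_ledger(conn, payroll_run_id, amount_cents):
    """Post once per payroll run; return True if applied, False if it was a replay."""
    key = f"post_to_ledger:{payroll_run_id}"  # hypothetical key format
    try:
        with conn:  # key and ledger entry commit together, or both roll back
            conn.execute(
                "INSERT INTO processed_step (idempotency_key) VALUES (?)", (key,)
            )
            conn.execute(
                "INSERT INTO ledger_entry (payroll_run_id, amount_cents) VALUES (?, ?)",
                (payroll_run_id, amount_cents),
            )
        return True
    except sqlite3.IntegrityError:
        # The key already exists: the step ran before, so the retry does nothing.
        return False
```

Calling `post_to_ledger(conn, "run-1", 125000)` twice leaves a single ledger entry; the second call returns `False` without writing anything.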
Revisit when
- Workflows grow many steps / long timers / complex compensation trees, or stuck-saga incidents recur → adopt Temporal and migrate the orchestrator behind the same boundary.