Architecture Alignment Plan

Branch: fix/architecture-alignment (based on dev-v1.0.0)
Status: living plan; update as items land.
Source of truth: the DemozPay vision (payroll-powered financial infrastructure; orchestrator, not custodian, per ADR-014) plus the distributed-systems and vision-drift audits.

This is the single map for the architecture pass. Each workstream is independently shippable. We execute top-to-bottom by the recommended sequence, but any item can be pulled forward.

Guiding principles (from the vision)

  • DemozPay never holds funds — money lives at partner banks; we orchestrate + record (ADR-014).
  • Ledger is the sole money truth; balances derived, never stored (ADR-006).
  • Correctness > novelty. Boring, proven patterns. Incremental extraction over premature splits.
  • Every state change auditable + reconstructable (audit + outbox + state in one tx, ADR-008).
  • Build only what strengthens the financial rails; defer HR/scope creep.

Status legend

TODO · IN PROGRESS · DONE · DEFERRED


Done so far (this branch)

Audited 2026-06-19: the doc was stale. Commit cc53ad0 landed W1.3–W1.6, W2.1, W4 and W5, but they were still marked TODO/NEXT below; corrected here and inline. Genuinely remaining: W7 (ADR-011 projection tables, not started), W2.2 (generalise recon beyond Equb) and W3 (notifications wired but gated off; an enablement/deploy step). NOTE: file paths in this doc predate the apps/api/src bounded-context restructure now in the working tree (e.g. equb/* → products/equb/*, resilience/ → _infra/resilience/, notification-* → _infra/notification-consumers/, kyc/ → compliance/kyc/).

  • DONE Equb custody gate — EQUB_MONEY_MOVEMENT_ENABLED (default off); DisabledEqubLedgerAdapter seals contribution+payout at the single EQUB_LEDGER_PORT chokepoint. (W1.5: the hard prod-block has since been relaxed to a master kill-switch.)
  • DONE Wallet terminology clarified as a ledger read-model (no rename — auditability/ADR-009).
  • DONE Docs tell one story — README Equb status, CLAUDE.md Equb-gate fact, archive-doc positioning banner.
  • DONE (W1 step 1) bank-sandbox escrow account + demo cycle/binding; escrow-binding creation verifies via IdentityLookupPort (employer-link pattern).
  • DONE (W1 step 2) CORPORATE contribution leg → bank-to-bank transfer into escrow + PENDING ledger mirror + confirm/fail.
  • DONE (W1 step 3) Equb settlement-confirm — bank-settlement-applier.service.ts recognises domain: 'EQUB' (applyToEqub) and flips the equb:pool mirror PENDING→POSTED on the gateway webhook/poller.
  • DONE (W1 step 4) Equb payout leg — equb-escrow-payout.service.ts moves escrow → winner via the gateway; mirror posted on settle.
  • DONE (W1 step 5) Custody gate relaxed to "escrow required" enforcement; EQUB_MONEY_MOVEMENT_ENABLED kept as a master kill-switch (hard prod-block retired).
  • DONE (W1 step 6) PRIVATE Equb contribution on the escrow path — private-equb-contribution.service.ts requires an ACTIVE partner-bank escrow binding; member funds own-bank → escrow (no simulated wallet balance).
  • DONE (W2 step 1) BankSandboxPartnerBankStatementReader implemented; Equb escrow recon worker unblocked behind EQUB_ESCROW_RECON_ENABLED (operator-triggered via admin endpoint).
  • DONE (W4) Dead-letter surface — dead-letter service + admin controller + module (+ spec); retry-exhausted events land and can be requeued.
  • DONE (W5) Leader election — leader-election.service.ts, Postgres advisory-lock single-leader wrapper for non-idempotent schedulers.
  • DONE (W6) Circuit breakers + per-call deadlines on the ledger + integration-gateway gRPC clients (circuit-breaker.ts, 4 unit tests green).
  • DONE (W8) ADR-017 — Postgres outbox + table-poller is the event-transport spine; Kafka dormant; RabbitMQ not adopted.

Workstream 1 — Equb → partner-bank (bank-sandbox) escrow · Priority: HIGH

Why: proves the no-custody model end-to-end. Before this workstream the pot was a simulated internal ledger account; the target is the pot living in a partner-bank escrow account (bank-sandbox in dev, a real FI in prod), with the ledger as a mirror.

Starting state (evidence, pre-W1):

  • Contribution posts member_clearing → equb:pool:{tenant}:{cycle} internal ledger account — apps/api/src/products/equb/equb-ledger.adapter.ts, packages/equb/backend/application/ports/ledger.port.ts.
  • Binding model exists: PartnerBankEscrowBinding (cycle → partner + partnerAccountId + ledgerPoolAccountId) — apps/api/prisma/schema.prisma.
  • bank-sandbox is a generic partner bank: POST /api/v1/transfers (hold-at-source → settle-to-dest → refund-on-fail) and GET /api/v1/accounts/{n}; see services/bank-sandbox/internal/handler/handler.go and …/store/store.go.
  • The gateway disburse + settlement-confirm machinery already exists for EWA/payroll — apps/api/src/products/ewa/integration-gateway.grpc-client.ts, apps/api/src/money/integration/settlement-poller.service.ts, bank-settlement-applier.service.ts.

Target money flow (per-cycle escrow account, reused across rounds):

Contribution: member/employer acct ──gateway→bank-sandbox transfer──▶ ESCROW acct (held at bank)
Ledger: equb:pool mirror posted PENDING → POSTED on settlement confirm
Payout: ESCROW acct ──gateway→bank-sandbox transfer──▶ winner acct (POSTED on settle)
Reconcile: recon worker reads GET /accounts/{escrow} vs equb:pool mirror; bank wins
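On the ledger side, each leg above is a small state machine: a mirror entry starts PENDING and moves exactly once, to POSTED when the bank settles or FAILED when it rejects. A minimal sketch, assuming hypothetical `MirrorEntry` and `applySettlement` names (not the real `bank-settlement-applier.service.ts` API), of why the gateway webhook and the settlement poller can both deliver the same outcome safely:

```typescript
// Hypothetical shapes for illustration; the real Prisma/ledger models differ.
type MirrorStatus = "PENDING" | "POSTED" | "FAILED";

interface MirrorEntry {
  transferId: string;   // bank-side transfer id used to correlate settlement signals
  account: string;      // e.g. equb:pool:{tenant}:{cycle}
  amountMinor: number;  // integer minor units; never floating-point money
  status: MirrorStatus;
}

interface SettlementSignal {
  transferId: string;
  outcome: "SETTLED" | "REJECTED";
}

// Applies a bank settlement signal to a mirror entry.
// Only PENDING entries transition; a duplicate of the same outcome is a no-op,
// and a contradictory outcome on a terminal entry is raised, never applied.
function applySettlement(entry: MirrorEntry, signal: SettlementSignal): MirrorEntry {
  if (signal.transferId !== entry.transferId) {
    throw new Error(`signal ${signal.transferId} does not match entry ${entry.transferId}`);
  }
  const target: MirrorStatus = signal.outcome === "SETTLED" ? "POSTED" : "FAILED";
  if (entry.status === "PENDING") {
    return { ...entry, status: target };
  }
  if (entry.status !== target) {
    throw new Error(`conflicting outcome for ${entry.transferId}: ${entry.status} vs ${signal.outcome}`);
  }
  return entry; // duplicate delivery (webhook + poller): nothing to do
}
```

Raising on a conflicting outcome, instead of flipping POSTED back to FAILED, keeps a bank/ledger disagreement visible to reconciliation rather than silently rewriting the mirror.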

Approach (incremental steps):

  1. [DONE] Seed foundation: FAKE-EQUB-ESCROW-001 in store.go:SeedDefaults; demo EqubCycle (DRAFT) + PartnerBankEscrowBinding in seed.ts. Binding creation (equb-escrow.controller.ts) now verifies the account at the partner via IdentityLookupPort before ACTIVE, same as employer bank-linking.
  2. [DONE] Contribution leg (CORPORATE): payroll-equb-fanout.service.ts requires an ACTIVE escrow binding, posts the equb:pool mirror PENDING, moves money employer-source → escrow via the EWA DisbursementPort, then confirms/fails the mirror on the bank's answer. EqubLedgerPort gained confirmContribution/failContribution; EqubContributionInput.status added.
  3. [DONE] Settlement confirm (async): bank-settlement-applier.service.ts applyToEqub flips the mirror PENDING → POSTED on the gateway webhook/poller; bank-webhook.controller.ts carries domain: 'EQUB'.
  4. [DONE] Payout leg: equb-escrow-payout.service.ts transfers escrow → winner; mirror posted on settle.
  5. [DONE] Gate → option B — contribution legs enforce "escrow required"; the blanket prod-block was retired, EQUB_MONEY_MOVEMENT_ENABLED kept as master kill-switch.
  6. [DONE] PRIVATE leg: private-equb-contribution.service.ts requires an ACTIVE escrow binding; member funds own-bank → escrow (no simulated wallet balance).
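Steps 2, 5 and 6 share one shape: check the kill-switch, require an ACTIVE escrow binding, post the mirror PENDING, move the money, and let the bank's answer decide. A minimal sketch, assuming hypothetical port interfaces (`EscrowBindings`, `MirrorLedger` and `Transfers` stand in for the binding lookup, `EqubLedgerPort` and `DisbursementPort`, whose real signatures this plan does not show):

```typescript
// Hypothetical port shapes for illustration only.
interface EscrowBinding { cycleId: string; partnerAccountId: string; status: string; }
interface EscrowBindings { findByCycle(cycleId: string): Promise<EscrowBinding | null>; }
interface MirrorLedger {
  postPending(key: string, amountMinor: number): Promise<void>;
  confirm(key: string): Promise<void>;
  fail(key: string): Promise<void>;
}
type TransferAnswer = "SETTLED" | "REJECTED" | "PENDING"; // PENDING: bank answers later via webhook/poller
interface Transfers {
  transfer(key: string, fromAccount: string, toAccount: string, amountMinor: number): Promise<TransferAnswer>;
}
interface ContributionDeps {
  moneyMovementEnabled: boolean; // stands in for the EQUB_MONEY_MOVEMENT_ENABLED kill-switch
  bindings: EscrowBindings;
  ledger: MirrorLedger;
  transfers: Transfers;
}
interface ContributionInput { idempotencyKey: string; cycleId: string; sourceAccount: string; amountMinor: number; }

async function contribute(deps: ContributionDeps, input: ContributionInput): Promise<"POSTED" | "FAILED" | "PENDING"> {
  if (!deps.moneyMovementEnabled) throw new Error("Equb money movement disabled by kill-switch");
  // "Escrow required": without an ACTIVE binding there is no money path (no internal-pool fallback).
  const binding = await deps.bindings.findByCycle(input.cycleId);
  if (binding === null || binding.status !== "ACTIVE") {
    throw new Error(`no ACTIVE escrow binding for cycle ${input.cycleId}`);
  }
  // Record the PENDING mirror first so every bank transfer has a ledger row to settle against.
  await deps.ledger.postPending(input.idempotencyKey, input.amountMinor);
  const answer = await deps.transfers
    .transfer(input.idempotencyKey, input.sourceAccount, binding.partnerAccountId, input.amountMinor)
    .catch((): "UNKNOWN" => "UNKNOWN"); // timeout/transport error: the bank may still settle
  if (answer === "SETTLED") { await deps.ledger.confirm(input.idempotencyKey); return "POSTED"; }
  if (answer === "REJECTED") { await deps.ledger.fail(input.idempotencyKey); return "FAILED"; }
  return "PENDING"; // PENDING or UNKNOWN: leave the mirror PENDING for the settlement poller
}
```

The key design choice is that an unknown outcome never marks the mirror FAILED: the transfer may have executed at the bank, so only a bank answer (or the poller's lookup by idempotency key) may close it.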

Risk: double-spend / partial-settlement at the escrow boundary — mitigate with the same idempotency keys + PENDING-until-confirmed pattern EWA uses. Verify under a non-superuser role (RLS).
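The idempotency key is what stops a retried contribution from becoming a second transfer, so it has to be derived from the business identity of the leg rather than generated per attempt. A minimal sketch, assuming a hypothetical key format and an in-memory store standing in for a table with a UNIQUE constraint:

```typescript
type Leg = "CONTRIBUTION" | "PAYOUT";

// Hypothetical key format: one key per (tenant, cycle, round, member, leg), stable across retries.
function equbIdempotencyKey(tenantId: string, cycleId: string, round: number, memberId: string, leg: Leg): string {
  if (!Number.isInteger(round) || round < 1) throw new Error(`invalid round ${round}`);
  for (const part of [tenantId, cycleId, memberId]) {
    // ':' is the separator, so allowing it inside an id could make two different legs collide.
    if (part.length === 0 || part.indexOf(":") >= 0) throw new Error(`invalid id segment "${part}"`);
  }
  return ["equb", leg.toLowerCase(), tenantId, cycleId, `r${round}`, memberId].join(":");
}

// In-memory stand-in for a UNIQUE-keyed table: the first caller runs the side effect,
// concurrent and later callers with the same key receive the same result.
// A rejected attempt is remembered too; a real store would decide which failures may be retried.
class IdempotencyStore<T> {
  private readonly results = new Map<string, Promise<T>>();

  run(key: string, effect: () => Promise<T>): Promise<T> {
    const existing = this.results.get(key);
    if (existing !== undefined) return existing;
    const pending = effect();
    this.results.set(key, pending);
    return pending;
  }
}
```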

Definition of done: a demo Equb cycle runs a full round (contributions in → payout out) entirely as bank-sandbox transfers; equb:pool mirror matches the escrow balance; recon green; no internal-pool custody path reachable.


Workstream 2 — Reconciliation tooling (ledger vs partner-bank) · Priority: HIGH

Why: the single biggest partner-DD / bank-trust unlock. "Show me your ledger reconciles to our statement."

Current state: [W2.1 DONE] BankSandboxPartnerBankStatementReader is implemented (was a stub); the Equb escrow recon worker is wired behind EQUB_ESCROW_RECON_ENABLED (operator-triggered via admin endpoint). [W2.2 REMAINING] still no general payout-vs-statement reconciliation for EWA/payroll/lending — only Equb escrow.

Approach:

  1. [DONE] Implement PartnerBankStatementReader against bank-sandbox (GET /accounts/{n} + transfer history); this unblocked the Equb recon worker (behind EQUB_ESCROW_RECON_ENABLED).
  2. [TODO] Generalise: a reconciliation report per (partner account, date) comparing ledger mirror ↔ bank statement, bank-wins-on-drift, drift emits an ops event.
  3. Operator surface: an admin endpoint/report listing drift, already partially present (equb-escrow.controller.ts).
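Once both sides are in integer minor units, step 2's comparison is plain arithmetic: drift = bank balance - ledger mirror balance, and any non-zero drift is reported with the bank figure treated as correct. A minimal sketch, assuming hypothetical input shapes (the real `PartnerBankStatementReader` output is not specified here) and a hypothetical `emitDrift` callback for the ops event:

```typescript
interface ReconInput {
  partnerAccountId: string;
  date: string;               // business date, e.g. "2026-06-19"
  bankBalanceMinor: number;   // from the partner statement: the source of truth
  ledgerMirrorMinor: number;  // derived from posted mirror entries, never stored
}

interface ReconLine extends ReconInput {
  driftMinor: number;         // bank - ledger; positive means the bank holds more than we recorded
  status: "MATCHED" | "DRIFT";
}

// Hypothetical sink for the "drift emits an ops event" requirement.
type DriftEmitter = (line: ReconLine) => void;

function reconcile(inputs: ReconInput[], emitDrift: DriftEmitter): ReconLine[] {
  return inputs.map((input) => {
    if (!Number.isSafeInteger(input.bankBalanceMinor) || !Number.isSafeInteger(input.ledgerMirrorMinor)) {
      throw new Error(`non-integer amounts for ${input.partnerAccountId} on ${input.date}`);
    }
    const driftMinor = input.bankBalanceMinor - input.ledgerMirrorMinor;
    const line: ReconLine = { ...input, driftMinor, status: driftMinor === 0 ? "MATCHED" : "DRIFT" };
    // Bank wins: drift is reported against the ledger side, which is what gets investigated.
    if (line.status === "DRIFT") emitDrift(line);
    return line;
  });
}
```

An injected discrepancy (as in the DoD) is then just a ledger figure that differs from the statement figure, and should produce exactly one drift event.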

DoD: a daily recon run for the demo escrow account produces a zero-drift report; an injected discrepancy surfaces as a drift alert.


Workstream 3 — Notifications: wired → live · Priority: MEDIUM

Why: events fire but nothing reaches users. (Correction from earlier audit: not a bare stub — senders + consumer exist, it's gated off.)

Current state: [TODO, enablement only] notification-poller.service.ts, notification-dispatcher.service.ts, handlers, and SMS/email senders (logging/http/smtp/ethio-telecom) all exist; the consumer is gated by NOTIFICATIONS_CONSUMER_ENABLED=false and SMS_PROVIDER defaults to logger.

Approach: pick the pilot provider (ethio-telecom SMS / SMTP email), wire credentials via existing env, sign off on copy, enable the consumer on one instance. Add per-event delivery logging. No new architecture — config + copy + enablement.
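Because this is config rather than code, the main failure mode is a half-enabled deploy: the consumer switched on while SMS_PROVIDER is still the logger, so nothing reaches users. A minimal sketch of a startup check plus per-event delivery logging, assuming the provider identifiers other than "logger" and the `DeliveryLog` shape are hypothetical:

```typescript
// Assumed provider identifiers; only "logger" (the default) is confirmed by this plan.
const SMS_PROVIDERS = ["logger", "http", "ethio-telecom"] as const;
type SmsProvider = (typeof SMS_PROVIDERS)[number];

interface NotificationConfig { consumerEnabled: boolean; smsProvider: SmsProvider; }

// requireRealProvider would be true in the pilot env: refuse to start "enabled but only logging".
function loadNotificationConfig(env: Record<string, string | undefined>, requireRealProvider: boolean): NotificationConfig {
  const consumerEnabled = env["NOTIFICATIONS_CONSUMER_ENABLED"] === "true";
  const raw = env["SMS_PROVIDER"] ?? "logger";
  if ((SMS_PROVIDERS as readonly string[]).indexOf(raw) < 0) throw new Error(`unknown SMS_PROVIDER ${raw}`);
  const smsProvider = raw as SmsProvider;
  if (requireRealProvider && consumerEnabled && smsProvider === "logger") {
    throw new Error("NOTIFICATIONS_CONSUMER_ENABLED=true but SMS_PROVIDER=logger: messages would only be logged");
  }
  return { consumerEnabled, smsProvider };
}

// Hypothetical per-event delivery record.
interface DeliveryLog {
  eventId: string; channel: "sms" | "email"; provider: string;
  outcome: "SENT" | "FAILED"; at: string; error?: string;
}

// Wraps one send so every attempt leaves a delivery record, success or failure.
async function deliverWithLog(eventId: string, channel: "sms" | "email", provider: string,
                              send: () => Promise<void>, log: (entry: DeliveryLog) => void): Promise<void> {
  try {
    await send();
    log({ eventId, channel, provider, outcome: "SENT", at: new Date().toISOString() });
  } catch (err) {
    log({ eventId, channel, provider, outcome: "FAILED", at: new Date().toISOString(), error: String(err) });
    throw err; // let the poller's retry/dead-letter path handle it
  }
}
```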

DoD: an EWA settlement event sends a real SMS/email in a pilot env; delivery logged.


Workstream 4 — DLQ + alerting on stuck outbox/poller events · Priority: MEDIUM

Why: failed events (FAILED rows, retry-exhausted) were invisible before this workstream: no DLQ, no alert.

Current state: [DONE] dead-letter surface shipped — dead-letter service + admin controller + module (+ spec). Retry-exhausted events land in it and can be requeued.

Approach: a dead-letter table (or status) for retry-exhausted events; an admin list + requeue action; a metric/alert on dead-letter count and on outbox lag (oldest unpublished age). Reuses the outbox/poller patterns; no broker needed.
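The dead-letter rules amount to a retry budget, an operator requeue, and two numbers to alert on. A minimal sketch, assuming a hypothetical row shape (the real outbox_event columns may differ) and an assumed budget of five attempts:

```typescript
// Hypothetical outbox row; real column names in outbox_event may differ.
interface OutboxRow {
  id: string;
  attempts: number;
  status: "PENDING" | "PUBLISHED" | "DEAD";
  createdAt: Date;
  lastError?: string;
}

const MAX_ATTEMPTS = 5; // assumption: retry budget before dead-lettering

// Called when handling a row fails: retry until the budget is spent, then dead-letter.
function recordFailure(row: OutboxRow, error: string): OutboxRow {
  const attempts = row.attempts + 1;
  return { ...row, attempts, lastError: error, status: attempts >= MAX_ATTEMPTS ? "DEAD" : "PENDING" };
}

// Operator requeue from the admin surface: fresh retry budget, last error kept for audit.
function requeue(row: OutboxRow): OutboxRow {
  if (row.status !== "DEAD") throw new Error(`row ${row.id} is not dead-lettered`);
  return { ...row, status: "PENDING", attempts: 0 };
}

// Alert inputs: dead-letter count, and outbox lag as the age of the oldest still-retrying row.
function outboxMetrics(rows: OutboxRow[], now: Date): { deadLetterCount: number; oldestPendingAgeMs: number } {
  let deadLetterCount = 0;
  let oldest: number | null = null;
  for (const r of rows) {
    if (r.status === "DEAD") deadLetterCount++;
    if (r.status === "PENDING" && (oldest === null || r.createdAt.getTime() < oldest)) oldest = r.createdAt.getTime();
  }
  return { deadLetterCount, oldestPendingAgeMs: oldest === null ? 0 : now.getTime() - oldest };
}
```

Dead rows are counted separately rather than in the lag figure, so one parked poison event does not mask, or permanently trigger, the lag alert.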

DoD: a forced-failing event lands in the dead-letter surface, raises a metric, and can be requeued.


Workstream 5 — Leader election for pollers/schedulers · Priority: MEDIUM

Why: four pollers/workers (settlement-poller, notification-poller, payroll-deductions-poller, equb-escrow-reconciliation-worker) run on every pod. Money pollers are idempotent via FOR UPDATE SKIP LOCKED, but cron-style workers (auto-lock, court auto-submit, recon) can double-fire.

Current state: [DONE] leader-election.service.ts — Postgres advisory-lock (pg_try_advisory_xact_lock) single-leader wrapper with per-job LEADER_LOCK_KEYS, applied to the non-idempotent schedulers.

Approach: Postgres advisory-lock-based single-leader wrapper (no new infra) for the non-idempotent schedulers; idempotent SKIP-LOCKED pollers can stay multi-instance. Document which workers need the leader gate.
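The leader gate itself is small. A minimal sketch of a transaction-scoped advisory-lock wrapper, assuming a minimal `SqlClient` interface (node-postgres' `PoolClient` has a compatible `query`) and hypothetical lock key values (the real ones live in `LEADER_LOCK_KEYS`):

```typescript
// Minimal query interface; node-postgres' PoolClient satisfies this shape.
interface SqlClient {
  query(sql: string, params?: unknown[]): Promise<{ rows: Array<Record<string, unknown>> }>;
}

// Hypothetical keys for illustration; not the real LEADER_LOCK_KEYS values.
const LOCK_KEYS = { equbAutoLock: 41001, courtAutoSubmit: 41002, escrowRecon: 41003 } as const;

// Runs `job` only if this instance wins pg_try_advisory_xact_lock for `lockKey`.
// The lock is released automatically at COMMIT/ROLLBACK, so the job must finish inside this transaction.
async function runAsLeader(client: SqlClient, lockKey: number, job: () => Promise<void>): Promise<boolean> {
  await client.query("BEGIN");
  try {
    const res = await client.query("SELECT pg_try_advisory_xact_lock($1) AS acquired", [lockKey]);
    if (res.rows[0]?.["acquired"] !== true) {
      await client.query("ROLLBACK");
      return false; // another instance holds the lock for this tick
    }
    await job();
    await client.query("COMMIT");
    return true;
  } catch (err) {
    await client.query("ROLLBACK");
    throw err;
  }
}
```

A transaction-scoped lock only prevents overlapping runs. If two pods' ticks can drift apart by more than the job's duration, the second pod can still acquire the lock after the first commits; pairing the lock with a per-tick "last run" marker would close that gap.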

DoD: with 2 instances, a non-idempotent scheduled job runs once per tick.


Workstream 6 — Circuit breakers on ledger/gateway gRPC · Priority: MEDIUM

Why: a hung ledger or gateway previously blocked request threads with no fail-fast; partial-failure resilience is a vision principle.

Current state: [DONE] both gRPC clients are wrapped in a circuit breaker (circuit-breaker.ts, in _infra/resilience/) with per-call deadlines; 4 unit tests green.

Approach: wrap both clients in a breaker (timeout already present; add open/half-open on consecutive failures) so disburse fails fast and lands in PENDING for the poller to resolve — not a hung thread. Keep it boring (one small breaker util).
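The breaker described above has three states and one deadline. A minimal sketch with an injectable clock; the class and method names are hypothetical and not the API of the existing circuit-breaker.ts:

```typescript
type BreakerState = "CLOSED" | "OPEN" | "HALF_OPEN";

class CircuitOpenError extends Error {}

// Rejects if `p` has not settled within `ms` milliseconds (the per-call deadline).
function withDeadline<T>(p: Promise<T>, ms: number): Promise<T> {
  return new Promise<T>((resolve, reject) => {
    const timer = setTimeout(() => reject(new Error(`deadline of ${ms}ms exceeded`)), ms);
    p.then(
      (value) => { clearTimeout(timer); resolve(value); },
      (error) => { clearTimeout(timer); reject(error); },
    );
  });
}

class CircuitBreaker {
  private state: BreakerState = "CLOSED";
  private consecutiveFailures = 0;
  private openedAt = 0;
  private readonly failureThreshold: number;
  private readonly resetAfterMs: number;
  private readonly deadlineMs: number;
  private readonly now: () => number;

  constructor(failureThreshold: number, resetAfterMs: number, deadlineMs: number, now: () => number = () => Date.now()) {
    this.failureThreshold = failureThreshold; // consecutive failures before opening
    this.resetAfterMs = resetAfterMs;         // time spent OPEN before one trial call
    this.deadlineMs = deadlineMs;             // per-call deadline
    this.now = now;                           // injectable clock for tests
  }

  get currentState(): BreakerState { return this.state; }

  async call<T>(fn: () => Promise<T>): Promise<T> {
    if (this.state === "HALF_OPEN") throw new CircuitOpenError("circuit half-open: trial call in flight");
    if (this.state === "OPEN") {
      if (this.now() - this.openedAt < this.resetAfterMs) throw new CircuitOpenError("circuit open");
      this.state = "HALF_OPEN"; // let exactly one trial call through
    }
    try {
      const result = await withDeadline(fn(), this.deadlineMs);
      this.state = "CLOSED";
      this.consecutiveFailures = 0;
      return result;
    } catch (err) {
      this.consecutiveFailures++;
      // A failed trial reopens immediately; otherwise open once the threshold is reached.
      if (this.state === "HALF_OPEN" || this.consecutiveFailures >= this.failureThreshold) {
        this.state = "OPEN";
        this.openedAt = this.now();
      }
      throw err;
    }
  }
}
```

For disburse, the caller would map both `CircuitOpenError` and a deadline error to "leave the transfer PENDING", which is what lets the settlement poller reconcile once the gateway recovers.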

DoD: with the gateway down, disburse fails fast to PENDING and the settlement poller reconciles when it recovers.


Workstream 7 — ADR-011 cleanup (cross-domain DI → events) · Priority: LOW

Why: ADR-011 says cross-domain goes via events; ~18 DI edges exist, mostly synchronous reads.

Current state: [TODO] not started. The synchronous KYC/sanctions disburse gates are correct to stay synchronous (regulatory fail-closed); the drift is in cross-domain calculation reads (deductions, equb-behaviour-signal, income).

Approach: introduce event-fed projection tables for the calculation reads so domains read their own projection instead of importing another domain. No broker required (outbox-fed). Tighten ADR-011 wording to bless synchronous regulatory gates explicitly.
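A projection here is just the consuming domain's own read model, updated from outbox events and made idempotent by remembering which event ids it has applied. A minimal sketch for the deductions read, assuming hypothetical event kinds and field names (the real event contracts are not defined in this plan):

```typescript
// Hypothetical event shape for illustration.
interface DeductionEvent {
  eventId: string;       // outbox_event id; makes the consumer idempotent
  employeeId: string;
  periodId: string;
  amountMinor: number;
  kind: "DEDUCTION_SCHEDULED" | "DEDUCTION_CANCELLED";
}

// Read model owned by the consuming domain: total scheduled deductions per (employee, period).
// In Postgres, the total update and the applied-id insert would share one transaction.
class DeductionsProjection {
  private readonly totals = new Map<string, number>();
  private readonly applied = new Set<string>();

  apply(e: DeductionEvent): void {
    if (this.applied.has(e.eventId)) return; // redelivery from the poller is a no-op
    const key = `${e.employeeId}:${e.periodId}`;
    const delta = e.kind === "DEDUCTION_SCHEDULED" ? e.amountMinor : -e.amountMinor;
    this.totals.set(key, (this.totals.get(key) ?? 0) + delta);
    this.applied.add(e.eventId);
  }

  totalFor(employeeId: string, periodId: string): number {
    return this.totals.get(`${employeeId}:${periodId}`) ?? 0;
  }
}
```

Summing deltas is order-independent, which keeps the sketch tolerant of out-of-order polling; a projection that overwrites state instead would need a per-key version check.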

DoD: at least one calculation read (e.g. deductions) served from a projection, not a cross-domain adapter; ADR-011 amended.


Workstream 8 — Broker decision: formalize Option C · Priority: LOW (mostly a decision + ADR)

Why: ambiguity around Kafka/RabbitMQ. In practice consumers poll the outbox table, so the Postgres outbox IS the messaging spine; Kafka is a dormant producer; RabbitMQ is absent.

Current state: [DONE] decided in ADR-017. kafka-event-publisher.ts is built only when KAFKA_BROKERS is set; the relay is gated by OUTBOX_PUBLISHER_ENABLED; consumers poll outbox_event.

Approach: write/append an ADR stating Option C — Postgres outbox + table-poller as the spine; Kafka stays dormant behind its flag as the future streaming on-ramp (activate only when a streaming consumer like Wallet/Risk exists); RabbitMQ not adopted. No code change beyond docs/flags.
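Under Option C the "broker" is one SQL pattern: each poller instance claims a batch of unpublished rows with FOR UPDATE SKIP LOCKED, so concurrent instances skip rows another instance holds instead of blocking on them. A minimal sketch of one poll tick, assuming a single consumer, hypothetical column names (id, event_type, payload, created_at, published_at) and a minimal `SqlClient` interface compatible with node-postgres:

```typescript
// Minimal query interface; node-postgres' PoolClient satisfies this shape.
interface SqlClient {
  query(sql: string, params?: unknown[]): Promise<{ rows: Array<Record<string, unknown>> }>;
}

// Assumed columns; the real outbox_event schema may differ.
const CLAIM_BATCH_SQL = `
  SELECT id, event_type, payload
    FROM outbox_event
   WHERE published_at IS NULL
   ORDER BY created_at
   LIMIT $1
   FOR UPDATE SKIP LOCKED`;

const MARK_PUBLISHED_SQL = `UPDATE outbox_event SET published_at = now() WHERE id = $1`;

interface OutboxEvent { id: string; eventType: string; payload: unknown; }

// One tick: claim a batch, hand each event to the handler, mark it published, commit.
// Delivery is at-least-once (a crash after handle() but before COMMIT redelivers),
// so handlers must be idempotent.
async function pollOnce(client: SqlClient, batchSize: number, handle: (e: OutboxEvent) => Promise<void>): Promise<number> {
  await client.query("BEGIN");
  try {
    const { rows } = await client.query(CLAIM_BATCH_SQL, [batchSize]);
    for (const r of rows) {
      const event: OutboxEvent = { id: String(r["id"]), eventType: String(r["event_type"]), payload: r["payload"] };
      await handle(event);
      await client.query(MARK_PUBLISHED_SQL, [event.id]);
    }
    await client.query("COMMIT");
    return rows.length;
  } catch (err) {
    await client.query("ROLLBACK"); // the whole batch becomes claimable again on the next tick
    throw err;
  }
}
```

With several consumers reading the same table, a single published_at column is not enough; each consumer would need its own cursor or delivery table. One failing event also rolls back its batch, which is where the W4 attempt counting and dead-lettering come in.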

DoD: ADR merged; README/CLAUDE reflect the single story.


Done: W1 (all steps) ✅ · W2.1 ✅ · W4 ✅ · W5 ✅ · W6 ✅ · W8 ✅.

Remaining (post-audit 2026-06-19), in order:

  1. W3 notifications — enable in a pilot env (provider creds + copy sign-off + flip NOTIFICATIONS_CONSUMER_ENABLED). Config/deploy, not code.
  2. W2.2 generalise reconciliation — extend payout-vs-statement recon beyond Equb to EWA/payroll/lending.
  3. W7 ADR-011 projections — event-fed projection tables for cross-domain calculation reads (deductions/income); amend ADR-011 to bless synchronous regulatory gates. Lowest urgency.

Verify the money paths (W1 escrow, W2 recon) under a non-superuser role before trusting them — the local superuser bypasses RLS (see TARGET_ARCHITECTURE_ALIGNMENT_PLAN.md step A4).