Architecture Alignment Plan
Branch: fix/architecture-alignment (based on dev-v1.0.0)
Status: living plan — update as items land.
Source of truth: the DemozPay vision (payroll-powered financial infrastructure; orchestrator, not custodian — ADR-014) + the distributed-systems / vision-drift audits.
This is the single map for the architecture pass. Each workstream is independently shippable. We execute top-to-bottom by the recommended sequence, but any item can be pulled forward.
Guiding principles (from the vision)
- DemozPay never holds funds — money lives at partner banks; we orchestrate + record (ADR-014).
- Ledger is the sole money truth; balances derived, never stored (ADR-006).
- Correctness > novelty. Boring, proven patterns. Incremental extraction over premature splits.
- Every state change auditable + reconstructable (audit + outbox + state in one tx, ADR-008).
- Build only what strengthens the financial rails; defer HR/scope creep.
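To make the "balances derived, never stored" principle concrete, here is a minimal sketch of a balance computed from ledger entries at read time. The entry shape, field names, and the choice to count only POSTED entries are assumptions for illustration; the real ledger schema is not shown in this plan.

```typescript
// Hypothetical ledger entry shape (not the real schema).
type EntryStatus = "PENDING" | "POSTED" | "FAILED";

interface LedgerEntry {
  account: string;      // e.g. "equb:pool:{tenant}:{cycle}"
  amountMinor: number;  // signed integer in minor units: credit > 0, debit < 0
  status: EntryStatus;
}

// The balance is recomputed from entries on every read; nothing is cached or stored.
// Assumption: only POSTED entries count towards the balance.
function derivedBalance(entries: readonly LedgerEntry[], account: string): number {
  let total = 0;
  for (const entry of entries) {
    if (entry.account === account && entry.status === "POSTED") {
      total += entry.amountMinor;
    }
  }
  return total;
}
```

Because the balance is a pure function of the entries, replaying the same entries always reconstructs the same number, which is what makes the state auditable.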
Status legend
TODO · IN PROGRESS · DONE · DEFERRED
Done so far (this branch)
Audited 2026-06-19: the doc was stale. Commit cc53ad0 landed W1.3–W1.6, W2.1, W4, W5, but they were still marked TODO/NEXT below; corrected here and inline. Genuinely remaining: W7 (ADR-011 projection tables, not started), W2.2 (generalise recon beyond Equb), W3 (notifications wired but gated off; an enablement/deploy step). NOTE: file paths in this doc predate the apps/api/src bounded-context restructure now in the working tree (e.g. equb/* → products/equb/*, resilience/ → _infra/resilience/, notification-* → _infra/notification-consumers/, kyc/ → compliance/kyc/).
- DONE Equb custody gate: EQUB_MONEY_MOVEMENT_ENABLED (default off); DisabledEqubLedgerAdapter seals contribution+payout at the single EQUB_LEDGER_PORT chokepoint. (W1.5: the hard prod-block has since been relaxed to a master kill-switch.)
- DONE Wallet terminology clarified as a ledger read-model (no rename: auditability/ADR-009).
- DONE Docs tell one story: README Equb status, CLAUDE.md Equb-gate fact, archive-doc positioning banner.
- DONE (W1 step 1) bank-sandbox escrow account + demo cycle/binding; escrow-binding creation verifies via IdentityLookupPort (employer-link pattern).
- DONE (W1 step 2) CORPORATE contribution leg → bank-to-bank transfer into escrow + PENDING ledger mirror + confirm/fail.
- DONE (W1 step 3) Equb settlement-confirm: bank-settlement-applier.service.ts recognises domain: 'EQUB' (applyToEqub) and flips the equb:pool mirror PENDING→POSTED on the gateway webhook/poller.
- DONE (W1 step 4) Equb payout leg: equb-escrow-payout.service.ts moves escrow → winner via the gateway; mirror posted on settle.
- DONE (W1 step 5) Custody gate relaxed to "escrow required" enforcement; EQUB_MONEY_MOVEMENT_ENABLED kept as a master kill-switch (hard prod-block retired).
- DONE (W1 step 6) PRIVATE Equb contribution on the escrow path: private-equb-contribution.service.ts requires an ACTIVE partner-bank escrow binding; member funds own-bank → escrow (no simulated wallet balance).
- DONE (W2 step 1) BankSandboxPartnerBankStatementReader implemented; Equb escrow recon worker unblocked behind EQUB_ESCROW_RECON_ENABLED (operator-triggered via admin endpoint).
- DONE (W4) Dead-letter surface: dead-letter service + admin controller + module (+ spec); retry-exhausted events land and can be requeued.
- DONE (W5) Leader election: leader-election.service.ts, Postgres advisory-lock single-leader wrapper for non-idempotent schedulers.
- DONE (W6) Circuit breakers + per-call deadlines on the ledger + integration-gateway gRPC clients (circuit-breaker.ts, 4 unit tests green).
- DONE (W8) ADR-017: Postgres outbox + table-poller is the event-transport spine; Kafka dormant; RabbitMQ not adopted.
Workstream 1 — Equb → partner-bank (bank-sandbox) escrow · Priority: HIGH
Why: proves the no-custody model end-to-end. Today the pot is a simulated internal ledger account; target is the pot living in a partner-bank escrow account (bank-sandbox in dev, real FI in prod), with the ledger as a mirror.
Current state (evidence):
- Contribution posts member_clearing → equb:pool:{tenant}:{cycle} internal ledger account: apps/api/src/products/equb/equb-ledger.adapter.ts, packages/equb/backend/application/ports/ledger.port.ts.
- Binding model exists: PartnerBankEscrowBinding (cycle → partner + partnerAccountId + ledgerPoolAccountId) in apps/api/prisma/schema.prisma.
- bank-sandbox is a generic partner bank: POST /api/v1/transfers (hold-at-source → settle-to-dest → refund-on-fail), GET /api/v1/accounts/{n}; see services/bank-sandbox/internal/handler/handler.go, …/store/store.go.
- The gateway disburse + settlement-confirm machinery already exists for EWA/payroll: apps/api/src/products/ewa/integration-gateway.grpc-client.ts, apps/api/src/money/integration/settlement-poller.service.ts, bank-settlement-applier.service.ts.
Target money flow (per-cycle escrow account, reused across rounds):
Contribution: member/employer acct ──gateway→bank-sandbox transfer──▶ ESCROW acct (held at bank)
Ledger: equb:pool mirror posted PENDING → POSTED on settlement confirm
Payout: ESCROW acct ──gateway→bank-sandbox transfer──▶ winner acct (POSTED on settle)
Reconcile: recon worker reads GET /accounts/{escrow} vs equb:pool mirror; bank wins
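The PENDING → POSTED step in the flow above can be sketched as a small pure transition. This is an illustrative model only: the type names, the FAILED outcome label, and the "ignore replays" rule are assumptions, not the actual applyToEqub implementation.

```typescript
// Hypothetical mirror-entry model for the equb:pool ledger mirror.
type MirrorStatus = "PENDING" | "POSTED" | "FAILED";
type BankOutcome = "SETTLED" | "FAILED";

interface PoolMirrorEntry {
  transferId: string;
  amountMinor: number;
  status: MirrorStatus;
}

// Apply the bank's answer (webhook or poller) to the mirror entry.
// Only a PENDING entry may move; a replayed or late result is a no-op,
// so delivering the same settlement twice cannot double-post.
function applySettlement(entry: PoolMirrorEntry, outcome: BankOutcome): PoolMirrorEntry {
  if (entry.status !== "PENDING") {
    return entry;
  }
  return { ...entry, status: outcome === "SETTLED" ? "POSTED" : "FAILED" };
}
```

Keeping the transition pure makes it straightforward to call from both the webhook path and the poller path without the two disagreeing.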
Approach (incremental steps):
- [DONE] Seed foundation: FAKE-EQUB-ESCROW-001 in store.go:SeedDefaults; demo EqubCycle (DRAFT) + PartnerBankEscrowBinding in seed.ts. Binding creation (equb-escrow.controller.ts) now verifies the account at the partner via IdentityLookupPort before ACTIVE, the same as employer bank-linking.
- [DONE] Contribution leg (CORPORATE): payroll-equb-fanout.service.ts requires an ACTIVE escrow binding, posts the equb:pool mirror PENDING, moves money employer-source → escrow via the EWA DisbursementPort, then confirms/fails the mirror on the bank's answer. EqubLedgerPort gained confirmContribution/failContribution; EqubContributionInput.status added.
- [DONE] Settlement confirm (async): bank-settlement-applier.service.ts applyToEqub flips the mirror PENDING → POSTED on the gateway webhook/poller; bank-webhook.controller.ts carries domain: 'EQUB'.
- [DONE] Payout leg: equb-escrow-payout.service.ts transfers escrow → winner; mirror posted on settle.
- [DONE] Gate → option B: contribution legs enforce "escrow required"; the blanket prod-block was retired, EQUB_MONEY_MOVEMENT_ENABLED kept as master kill-switch.
- [DONE] PRIVATE leg: private-equb-contribution.service.ts requires an ACTIVE escrow binding; member funds own-bank → escrow (no simulated wallet balance).
Risk: double-spend / partial-settlement at the escrow boundary — mitigate with the same idempotency keys + PENDING-until-confirmed pattern EWA uses. Verify under a non-superuser role (RLS).
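The idempotency-key half of that mitigation can be sketched as follows. Everything here is hypothetical (the class, the key format, the in-memory map); in a real deployment the de-duplication would presumably be enforced by the database rather than process memory.

```typescript
// Hypothetical request/record shapes for an escrow-bound transfer.
interface TransferRequest {
  idempotencyKey: string; // assumed format, e.g. "equb:{cycle}:{round}:{member}"
  fromAccount: string;
  toAccount: string;
  amountMinor: number;
}

interface TransferRecord {
  transferId: string;
  status: "PENDING" | "POSTED" | "FAILED";
  request: TransferRequest;
}

// In-memory stand-in for a table keyed uniquely on idempotencyKey.
class TransferLog {
  private readonly byKey = new Map<string, TransferRecord>();

  // A retry with the same key returns the original record instead of
  // creating a second transfer, so a retried contribution cannot double-spend.
  submit(req: TransferRequest): { record: TransferRecord; replayed: boolean } {
    const existing = this.byKey.get(req.idempotencyKey);
    if (existing) {
      return { record: existing, replayed: true };
    }
    const record: TransferRecord = {
      transferId: `tr-${this.byKey.size + 1}`,
      status: "PENDING", // stays PENDING until the bank confirms
      request: req,
    };
    this.byKey.set(req.idempotencyKey, record);
    return { record, replayed: false };
  }

  count(): number {
    return this.byKey.size;
  }
}
```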
Definition of done: a demo Equb cycle runs a full round (contributions in → payout out) entirely as bank-sandbox transfers; equb:pool mirror matches the escrow balance; recon green; no internal-pool custody path reachable.
Workstream 2 — Reconciliation tooling (ledger vs partner-bank) · Priority: HIGH
Why: the single biggest partner-DD / bank-trust unlock. "Show me your ledger reconciles to our statement."
Current state: [W2.1 DONE] BankSandboxPartnerBankStatementReader is implemented (was a stub); the Equb escrow recon worker is wired behind EQUB_ESCROW_RECON_ENABLED (operator-triggered via admin endpoint). [W2.2 REMAINING] still no general payout-vs-statement reconciliation for EWA/payroll/lending — only Equb escrow.
Approach:
- Implement PartnerBankStatementReader against bank-sandbox (GET /accounts/{n} + transfer history), which unblocks the Equb recon worker (turn on EQUB_ESCROW_RECON_ENABLED).
- Generalise: a reconciliation report per (partner account, date) comparing ledger mirror ↔ bank statement, bank-wins-on-drift, drift emits an ops event.
- Operator surface: an admin endpoint/report listing drift, already partially present (equb-escrow.controller.ts).
DoD: a daily recon run for the demo escrow account produces a zero-drift report; an injected discrepancy surfaces as a drift alert.
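A minimal sketch of the per-(partner account, date) comparison described in the approach, with bank-wins-on-drift and an ops event on any mismatch. The field names and the event type string are assumptions for illustration, not the shape used by the existing recon worker.

```typescript
// Hypothetical inputs: one ledger-mirror total and one bank figure per account per day.
interface ReconInput {
  partnerAccountId: string;
  date: string; // ISO date, e.g. "2026-06-19"
  ledgerMirrorMinor: number;
  bankBalanceMinor: number;
}

interface DriftEvent {
  type: "recon.drift"; // hypothetical event name
  partnerAccountId: string;
  date: string;
  driftMinor: number;
}

interface ReconReport {
  partnerAccountId: string;
  date: string;
  driftMinor: number;         // ledger minus bank; 0 means reconciled
  authoritativeMinor: number; // bank wins on drift
  opsEvent: DriftEvent | null;
}

function reconcile(input: ReconInput): ReconReport {
  const driftMinor = input.ledgerMirrorMinor - input.bankBalanceMinor;
  const opsEvent: DriftEvent | null =
    driftMinor === 0
      ? null
      : { type: "recon.drift", partnerAccountId: input.partnerAccountId, date: input.date, driftMinor };
  return {
    partnerAccountId: input.partnerAccountId,
    date: input.date,
    driftMinor,
    authoritativeMinor: input.bankBalanceMinor,
    opsEvent,
  };
}
```

The two DoD cases map directly onto this: a clean day yields a null opsEvent, and an injected discrepancy yields a non-zero driftMinor with an event to alert on.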
Workstream 3 — Notifications: wired → live · Priority: MEDIUM
Why: events fire but nothing reaches users. (Correction from earlier audit: not a bare stub — senders + consumer exist, it's gated off.)
Current state: notification-poller.service.ts, notification-dispatcher.service.ts, handlers, and SMS/email senders (logging/http/smtp/ethio-telecom) all exist; gated by NOTIFICATIONS_CONSUMER_ENABLED=false; SMS_PROVIDER=logger default.
Approach: pick the pilot provider (ethio-telecom SMS / SMTP email), wire credentials via existing env, sign off on copy, enable the consumer on one instance. Add per-event delivery logging. No new architecture — config + copy + enablement.
DoD: an EWA settlement event sends a real SMS/email in a pilot env; delivery logged.
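As a rough sketch of the gating and per-event delivery logging, assuming the two env vars named above are read as plain strings. The function, the sender registry, the "true" comparison, and the DeliveryLog shape are all hypothetical; the real dispatcher and provider keys may differ.

```typescript
// Env vars named in this plan; everything else below is illustrative.
interface NotificationEnv {
  NOTIFICATIONS_CONSUMER_ENABLED?: string;
  SMS_PROVIDER?: string;
}

// A sender returns true when the provider accepted the message.
type SmsSender = (to: string, body: string) => boolean;

interface DeliveryLog {
  eventId: string;
  provider: string;
  delivered: boolean;
}

function dispatchSms(
  env: NotificationEnv,
  senders: Record<string, SmsSender>,
  event: { eventId: string; to: string; body: string },
): DeliveryLog | null {
  // Gated off unless explicitly enabled (assumes the literal string "true").
  if (env.NOTIFICATIONS_CONSUMER_ENABLED !== "true") {
    return null;
  }
  const provider = env.SMS_PROVIDER ?? "logger"; // logger is the documented default
  const send = senders[provider];
  if (!send) {
    throw new Error(`unknown SMS provider: ${provider}`);
  }
  // One log record per event, whether or not delivery succeeded.
  return { eventId: event.eventId, provider, delivered: send(event.to, event.body) };
}
```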
Workstream 4 — DLQ + alerting on stuck outbox/poller events · Priority: MEDIUM
Why: failed events (FAILED rows, retry-exhausted) are invisible today — no DLQ, no alert.
Current state: [DONE] dead-letter surface shipped — dead-letter service + admin controller + module (+ spec). Retry-exhausted events land in it and can be requeued.
Approach: a dead-letter table (or status) for retry-exhausted events; an admin list + requeue action; a metric/alert on dead-letter count and on outbox lag (oldest unpublished age). Reuses the outbox/poller patterns; no broker needed.
DoD: a forced-failing event lands in the dead-letter surface, raises a metric, and can be requeued.
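The lifecycle in the approach (retry, exhaust, dead-letter, requeue) plus the two metrics can be sketched as below. The status names, the retry budget of 3, and the row shape are assumptions; the shipped dead-letter service may model this differently.

```typescript
// Hypothetical outbox row model.
type EventStatus = "PENDING" | "PUBLISHED" | "FAILED" | "DEAD_LETTER";

interface OutboxRow {
  id: string;
  status: EventStatus;
  attempts: number;
  createdAtMs: number;
}

const MAX_ATTEMPTS = 3; // hypothetical retry budget

// Record one failed delivery; once the budget is spent the row is dead-lettered.
function recordFailure(row: OutboxRow): OutboxRow {
  const attempts = row.attempts + 1;
  const status: EventStatus = attempts >= MAX_ATTEMPTS ? "DEAD_LETTER" : "FAILED";
  return { ...row, attempts, status };
}

// Admin requeue: only dead-lettered rows are reset for another round of retries.
function requeue(row: OutboxRow): OutboxRow {
  if (row.status !== "DEAD_LETTER") {
    return row;
  }
  return { ...row, status: "PENDING", attempts: 0 };
}

// Metrics to alert on: dead-letter count and outbox lag (oldest unpublished age).
// PENDING and FAILED (still retryable) rows both count as unpublished here.
function outboxMetrics(rows: readonly OutboxRow[], nowMs: number): { deadLetterCount: number; oldestUnpublishedAgeMs: number } {
  let deadLetterCount = 0;
  let oldestUnpublishedAgeMs = 0;
  for (const row of rows) {
    if (row.status === "DEAD_LETTER") {
      deadLetterCount += 1;
    }
    if (row.status === "PENDING" || row.status === "FAILED") {
      oldestUnpublishedAgeMs = Math.max(oldestUnpublishedAgeMs, nowMs - row.createdAtMs);
    }
  }
  return { deadLetterCount, oldestUnpublishedAgeMs };
}
```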
Workstream 5 — Leader election for pollers/schedulers · Priority: MEDIUM
Why: four pollers/workers (settlement-poller, notification-poller, payroll-deductions-poller, equb-escrow-reconciliation-worker) run on every pod. Money pollers are idempotent via FOR UPDATE SKIP LOCKED, but cron-style workers (auto-lock, court auto-submit, recon) can double-fire.
Current state: [DONE] leader-election.service.ts — Postgres advisory-lock (pg_try_advisory_xact_lock) single-leader wrapper with per-job LEADER_LOCK_KEYS, applied to the non-idempotent schedulers.
Approach: Postgres advisory-lock-based single-leader wrapper (no new infra) for the non-idempotent schedulers; idempotent SKIP-LOCKED pollers can stay multi-instance. Document which workers need the leader gate.
DoD: with 2 instances, a non-idempotent scheduled job runs once per tick.
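A hedged sketch of the advisory-lock wrapper, written against a minimal hypothetical SqlClient interface rather than any specific driver. It uses pg_try_advisory_xact_lock, which returns immediately with true or false and releases the lock automatically when the transaction ends. The function name and the choice to run the job inside the lock-holding transaction are assumptions, not a description of leader-election.service.ts.

```typescript
// Minimal hypothetical client interface; a real driver would be adapted to it.
interface SqlClient {
  query(sql: string, params?: readonly unknown[]): Promise<{ rows: Array<Record<string, unknown>> }>;
}

// Runs `job` only on the instance that wins the advisory lock for this tick.
// Returns true if this instance was the leader and ran the job.
async function runIfLeader(
  client: SqlClient,
  lockKey: number, // per-job key, in the spirit of LEADER_LOCK_KEYS
  job: () => Promise<void>,
): Promise<boolean> {
  await client.query("BEGIN");
  try {
    const res = await client.query("SELECT pg_try_advisory_xact_lock($1) AS locked", [lockKey]);
    const locked = res.rows[0]?.locked === true;
    if (locked) {
      await job(); // the lock is held for as long as this transaction stays open
    }
    await client.query("COMMIT"); // a transaction-level advisory lock is released here
    return locked;
  } catch (err) {
    await client.query("ROLLBACK");
    throw err;
  }
}
```

One trade-off of this shape is that a long-running job keeps a transaction open for its whole duration; a session-level lock with an explicit unlock is an alternative if that becomes a problem.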
Workstream 6 — Circuit breakers on ledger/gateway gRPC · Priority: MEDIUM
Why: a hung ledger or gateway currently blocks request threads with no fail-fast; partial-failure resilience is a vision principle.
Current state: [DONE] both gRPC clients are wrapped in a circuit breaker (circuit-breaker.ts, in _infra/resilience/) with per-call deadlines; 4 unit tests green.
Approach: wrap both clients in a breaker (timeout already present; add open/half-open on consecutive failures) so disburse fails fast and lands in PENDING for the poller to resolve — not a hung thread. Keep it boring (one small breaker util).
DoD: with the gateway down, disburse fails fast to PENDING and the settlement poller reconciles when it recovers.
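The open/half-open behaviour on consecutive failures can be sketched as a small state machine with an injectable clock. This is an illustrative breaker, not the contents of circuit-breaker.ts; the class shape, thresholds, and error message are assumptions.

```typescript
type BreakerState = "CLOSED" | "OPEN" | "HALF_OPEN";

class CircuitBreaker {
  private state: BreakerState = "CLOSED";
  private consecutiveFailures = 0;
  private openedAtMs = 0;
  private readonly failureThreshold: number;
  private readonly resetAfterMs: number;
  private readonly now: () => number;

  constructor(failureThreshold: number, resetAfterMs: number, now: () => number = Date.now) {
    this.failureThreshold = failureThreshold;
    this.resetAfterMs = resetAfterMs;
    this.now = now;
  }

  // OPEN rejects immediately; after the cool-down a trial call is allowed (HALF_OPEN).
  canAttempt(): boolean {
    if (this.state === "OPEN" && this.now() - this.openedAtMs >= this.resetAfterMs) {
      this.state = "HALF_OPEN";
    }
    return this.state !== "OPEN";
  }

  onSuccess(): void {
    this.consecutiveFailures = 0;
    this.state = "CLOSED";
  }

  // A failed trial call, or too many consecutive failures, (re)opens the circuit.
  onFailure(): void {
    this.consecutiveFailures += 1;
    if (this.state === "HALF_OPEN" || this.consecutiveFailures >= this.failureThreshold) {
      this.state = "OPEN";
      this.openedAtMs = this.now();
    }
  }

  getState(): BreakerState {
    return this.state;
  }

  // Wrap an async call (e.g. a gRPC request that already carries its own deadline).
  async call<T>(fn: () => Promise<T>): Promise<T> {
    if (!this.canAttempt()) {
      throw new Error("circuit open: failing fast");
    }
    try {
      const result = await fn();
      this.onSuccess();
      return result;
    } catch (err) {
      this.onFailure();
      throw err;
    }
  }
}
```

In the disburse path, the fail-fast error would be the signal to leave the payment in PENDING for the settlement poller to resolve later.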
Workstream 7 — ADR-011 cleanup (cross-domain DI → events) · Priority: LOW
Why: ADR-011 says cross-domain goes via events; ~18 DI edges exist, mostly synchronous reads.
Current state: the synchronous KYC/sanctions disburse gates are correct to stay synchronous (regulatory fail-closed). The drift is in cross-domain calculation reads (deductions, equb-behaviour-signal, income).
Approach: introduce event-fed projection tables for the calculation reads so domains read their own projection instead of importing another domain. No broker required (outbox-fed). Tighten ADR-011 wording to bless synchronous regulatory gates explicitly.
DoD: at least one calculation read (e.g. deductions) served from a projection, not a cross-domain adapter; ADR-011 amended.
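A minimal sketch of an event-fed projection for one calculation read, using deductions as the example. The event shape, the class, and the in-memory maps are hypothetical stand-ins for an outbox-fed projection table; de-duplication by event id is an assumption about how replays would be made safe.

```typescript
// Hypothetical event emitted by the owning domain via the outbox.
interface DeductionEvent {
  eventId: string;
  employeeId: string;
  deltaMinor: number; // signed change to the employee's total deductions
}

// The consuming domain keeps its own projection and reads only from it,
// instead of importing the other domain's service.
class DeductionProjection {
  private readonly totals = new Map<string, number>();
  private readonly applied = new Set<string>();

  // Returns false when the event was already applied (safe under redelivery).
  apply(event: DeductionEvent): boolean {
    if (this.applied.has(event.eventId)) {
      return false;
    }
    this.applied.add(event.eventId);
    const current = this.totals.get(event.employeeId) ?? 0;
    this.totals.set(event.employeeId, current + event.deltaMinor);
    return true;
  }

  totalFor(employeeId: string): number {
    return this.totals.get(employeeId) ?? 0;
  }
}
```

The projection is eventually consistent, which is why the plan keeps the regulatory KYC/sanctions gates synchronous and limits this pattern to calculation reads.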
Workstream 8 — Broker decision: formalize Option C · Priority: LOW (mostly a decision + ADR)
Why: ambiguity around Kafka/RabbitMQ. Truth: consumers poll the outbox table, so the Postgres outbox IS the messaging spine; Kafka is a dormant producer; RabbitMQ is absent.
Current state: kafka-event-publisher.ts built only when KAFKA_BROKERS set; relay gated by OUTBOX_PUBLISHER_ENABLED; consumers poll outbox_event.
Approach: write/append an ADR stating Option C — Postgres outbox + table-poller as the spine; Kafka stays dormant behind its flag as the future streaming on-ramp (activate only when a streaming consumer like Wallet/Risk exists); RabbitMQ not adopted. No code change beyond docs/flags.
DoD: ADR merged; README/CLAUDE reflect the single story.
Recommended sequence
Done: W1 (all steps) ✅ · W2.1 ✅ · W4 ✅ · W5 ✅ · W6 ✅ · W8 ✅.
Remaining (post-audit 2026-06-19), in order:
- W3 notifications: enable in a pilot env (provider creds + copy sign-off + flip NOTIFICATIONS_CONSUMER_ENABLED). Config/deploy, not code.
- W2.2 generalise reconciliation: extend payout-vs-statement recon beyond Equb to EWA/payroll/lending.
- W7 ADR-011 projections: event-fed projection tables for cross-domain calculation reads (deductions/income); amend ADR-011 to bless synchronous regulatory gates. Lowest urgency.
Verify the money paths (W1 escrow, W2 recon) under a non-superuser role before trusting them — the local superuser bypasses RLS (see TARGET_ARCHITECTURE_ALIGNMENT_PLAN.md step A4).