DemozPay — Bank Orchestration Architecture
Snapshot: 2026-05-29
Companion to: REAL_SYSTEM_STATE.md, MONEY_FLOWS.md, RECONCILIATION_ARCHITECTURE.md.
Purpose: the canonical statement of how money moves through DemozPay. Read before designing any new flow.
The one-paragraph thesis
DemozPay never holds customer money. Funds at rest live in partner bank accounts (employer payroll account, FI lending pool, BNPL underwriter account). DemozPay's job is to instruct partner banks to transfer money between accounts on a customer's behalf, record the obligation lifecycle in the ledger, reconcile that record against the partner bank's own statement, and alert when reality and record disagree. The platform is an orchestrator + ledger + reconciliation system. It is not a wallet, not a custodian, and not a bank.
This single fact governs every architectural decision below. When in doubt, ask: "does this change make us look like a custodian?" If yes, it's wrong.
§1. Custody boundary — what we are and what we are not
| | DemozPay | Partner bank (Dashen, CBE, Telebirr, etc.) | FI funder |
|---|---|---|---|
| Holds customer money | NO | YES | YES (lending pool) |
| Authoritative balance | NO | YES | YES (per FI per pool account) |
| Initiates transfers | YES (via partner API) | NO (executes) | NO (funds the pool) |
| Records obligations + journal | YES (the ledger) | NO | NO |
| Reconciles against statements | YES | (publishes statements) | (may reconcile their side) |
| Bears settlement-failure risk | NO (passes through) | YES (partner-rejected) | (depends on contract) |
| Bears default risk (loan, BNPL) | NO | NO | YES (FI underwrites) |
Implication: DemozPay's ledger entries record obligations (receivable from employee, payable to FI partner, payroll clearing), never custody positions (cash, wallet balance). Anything that looks like a cash account is a bug. ADR-006 enforces this; ADR-014 (planned) makes it a written commitment.
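The "no custody positions" rule can be pictured as an allowlist check at account-creation time. The following is a minimal sketch only: the AccountKind type, its three values, and ValidateAccountKind are hypothetical names chosen for illustration, and nothing here claims to be how ADR-006 is actually enforced.

```go
package main

import "fmt"

// AccountKind is a hypothetical classification of ledger accounts.
type AccountKind string

const (
	Receivable AccountKind = "RECEIVABLE" // e.g. receivable from employee
	Payable    AccountKind = "PAYABLE"    // e.g. payable to FI partner
	Clearing   AccountKind = "CLEARING"   // e.g. payroll clearing
)

// obligationKinds is the allowlist. Anything not listed here (cash,
// wallet balance, ...) is treated as a custody position.
var obligationKinds = map[AccountKind]bool{
	Receivable: true,
	Payable:    true,
	Clearing:   true,
}

// ValidateAccountKind returns an error for any kind that is not a
// recognised obligation account.
func ValidateAccountKind(k AccountKind) error {
	if !obligationKinds[k] {
		return fmt.Errorf("account kind %q is not an obligation account; custody positions are not allowed", k)
	}
	return nil
}
```

An allowlist is used rather than a denylist so that a new, unreviewed account kind fails closed instead of slipping through.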
§2. The four canonical money paths
Every dollar (santim) that flows through the platform follows one of these four paths. Detailed flows in MONEY_FLOWS.md; this section is the contract.
Path A: Employer → Employee (EWA)
- Source: employer's pre-funded payroll account at partner bank.
- Destination: employee's bank/wallet account.
- DemozPay role: instruct the transfer, pre-commit DR receivable-from-employee / CR payable-to-business-bank, settle on bank confirmation, recover via payroll deduction at next pay-cycle.
- Risk owner: Employer. If employee leaves before payroll deduction, employer eats the loss (per employer service agreement).
- Status: LIVE end-to-end against bank-sandbox. Repayment loop is PARTIAL (admin endpoint only).
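As an illustration of the Path A pre-commit, the sketch below builds a two-line DR/CR entry in santim and checks that it balances. The Transaction and Line shapes and the NewPreCommit helper are assumptions made for this example; the real ledger service's types are not shown in this document.

```go
package main

import "errors"

// Side marks a ledger line as a debit or a credit.
type Side int

const (
	Debit Side = iota
	Credit
)

// Line is one leg of a double-entry transaction, in santim.
type Line struct {
	Account      string
	Side         Side
	AmountSantim int64
}

// Transaction is a hypothetical shape for a ledger pre-commit.
type Transaction struct {
	IdempotencyKey string
	Status         string // a pre-commit starts at "PENDING"
	Lines          []Line
}

// NewPreCommit builds a PENDING entry that debits one account and
// credits another for the same amount.
func NewPreCommit(key, debitAccount, creditAccount string, amountSantim int64) (Transaction, error) {
	if amountSantim <= 0 {
		return Transaction{}, errors.New("amount must be a positive number of santim")
	}
	return Transaction{
		IdempotencyKey: key,
		Status:         "PENDING",
		Lines: []Line{
			{Account: debitAccount, Side: Debit, AmountSantim: amountSantim},
			{Account: creditAccount, Side: Credit, AmountSantim: amountSantim},
		},
	}, nil
}

// Balanced reports whether total debits equal total credits.
func (t Transaction) Balanced() bool {
	var dr, cr int64
	for _, l := range t.Lines {
		if l.Side == Debit {
			dr += l.AmountSantim
		} else {
			cr += l.AmountSantim
		}
	}
	return dr == cr
}
```

For an EWA advance of ETB 1,500, a caller would pass "receivable-from-employee", "payable-to-business-bank", and 150000 santim. Amounts are integers so that no floating-point rounding can enter the ledger.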
Path B: FI partner → Employee (Lending)
- Source: FI partner's pool account at FI's bank.
- Destination: employee's bank/wallet account.
- DemozPay role: instruct the transfer, pre-commit DR receivable-from-borrower / CR payable-to-fi-partner, settle on confirmation, recover via payroll over N installments, remit each installment to FI.
- Risk owner: FI partner. They underwrite the loan; their capital is at risk. We are servicer + collector + reporter.
- Status: disburse + record-repayment LIVE; remit-to-FI loop is NOT implemented. After RecordRepaymentUseCase debits PayrollClearing and credits ReceivableFromBorrower, the corresponding transfer from PayrollClearing → FI's pool account does not happen. PayrollClearing accumulates as a phantom asset.
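The phantom-asset effect can be seen with a few lines of arithmetic. The sketch below nets postings per account; the Posting type and Balances helper are hypothetical. The final remit posting (DR PayableToFiPartner / CR PayrollClearing) is an assumption about what the missing leg would look like: the text above only says that the transfer out of PayrollClearing does not happen, not which account it would debit.

```go
package main

// Posting is one double-entry posting in santim (hypothetical shape).
type Posting struct {
	Debit        string
	Credit       string
	AmountSantim int64
}

// Balances nets all postings per account: debits count as positive,
// credits as negative.
func Balances(postings []Posting) map[string]int64 {
	b := make(map[string]int64)
	for _, p := range postings {
		b[p.Debit] += p.AmountSantim
		b[p.Credit] -= p.AmountSantim
	}
	return b
}
```

With a disbursement of 100000 santim followed by a full repayment, ReceivableFromBorrower nets to zero while PayrollClearing is left holding 100000. Only the assumed remit posting brings both PayrollClearing and PayableToFiPartner back to zero.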
Path C: BNPL underwriter → Merchant (BNPL settlement)
- Source: BNPL underwriter's settlement account at partner bank.
- Destination: merchant's bank account.
- DemozPay role: at purchase time, pre-commit DR receivable-from-employee / CR payable-to-merchant; on T+1 settlement, instruct the transfer; recover via payroll like lending.
- Risk owner: BNPL underwriter (a category of FI partner).
- Status: PLANNED. No code. The legacy BNPLPurchase table exists but is architecturally invalid (see REAL_SYSTEM_STATE §1).
Path D: Employee → Employer (Payroll-cycle reconciliation)
- Source: employer's payroll account.
- Destination: logical only; no transfer happens. Employer's payroll-run engine deducts the EWA + loan installment amounts BEFORE depositing the employee's net pay, so DemozPay's receivable from employee is satisfied by the non-payment of those amounts in the next payroll deposit.
- DemozPay role: consume the payroll.deductions_taken.v1 event from the payroll engine, call RecordRepaymentUseCase (lending) and RecordEwaRepaymentUseCase (both exist), zero the receivable, transfer PayrollClearing → FI/Employer/Underwriter as appropriate.
- Risk owner: N/A (internal reconciliation step).
- Status: PARTIAL. The payroll engine is Live (@demoz-pay/payroll) and emits payroll.deductions_taken.v1; the repayment use cases exist. The gap is the cross-domain consumer (PayrollConsumersModule), which is currently disabled pending a DI fix, so deductions are emitted but not yet auto-applied to EWA/loan repayment. See alignment-plan step A2.
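The missing consumer is essentially a dispatcher from deduction lines to the two repayment use cases. The sketch below shows that routing in Go for consistency with the other examples in this document, even though the monolith itself is NestJS. The event fields, the DeductionKind values, and the RepaymentRecorders struct are all hypothetical; the real payroll.deductions_taken.v1 payload is not described here.

```go
package main

import "fmt"

// DeductionKind is a hypothetical discriminator on each deduction line.
type DeductionKind string

const (
	KindEWA  DeductionKind = "EWA"
	KindLoan DeductionKind = "LOAN"
)

// Deduction is one amount withheld from an employee's pay (assumed shape).
type Deduction struct {
	Kind         DeductionKind
	ObligationID string
	AmountSantim int64
}

// DeductionsTaken is an assumed shape for payroll.deductions_taken.v1.
type DeductionsTaken struct {
	TenantID     string
	PayrollRunID string
	Deductions   []Deduction
}

// RecordFunc stands in for a repayment use case.
type RecordFunc func(tenantID, obligationID string, amountSantim int64, idempotencyKey string) error

// RepaymentRecorders bundles the two use cases the consumer calls.
type RepaymentRecorders struct {
	EWA  RecordFunc // stands in for RecordEwaRepaymentUseCase
	Loan RecordFunc // stands in for RecordRepaymentUseCase
}

// HandleDeductionsTaken routes each deduction to the matching use case.
// The idempotency key is derived from the payroll run and obligation so
// that a redelivered event produces the same key.
func HandleDeductionsTaken(ev DeductionsTaken, rec RepaymentRecorders) error {
	for _, d := range ev.Deductions {
		key := fmt.Sprintf("%s:%s", ev.PayrollRunID, d.ObligationID)
		var err error
		switch d.Kind {
		case KindEWA:
			err = rec.EWA(ev.TenantID, d.ObligationID, d.AmountSantim, key)
		case KindLoan:
			err = rec.Loan(ev.TenantID, d.ObligationID, d.AmountSantim, key)
		default:
			err = fmt.Errorf("unknown deduction kind %q", d.Kind)
		}
		if err != nil {
			return fmt.Errorf("deduction for %s: %w", d.ObligationID, err)
		}
	}
	return nil
}
```

Stopping at the first error leaves later deductions unapplied; because the keys are deterministic, the whole event can be safely redelivered once the cause is fixed.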
§3. Settlement timing — async is the default, sync is the bug
A bank's API returns one of two answers when you initiate a transfer:
- ACCEPTED: "We received your instruction. We will tell you later if it settled or failed." Returns within 1–5 seconds. This is not settlement. Money has not yet moved at the bank's correspondent leg.
- REJECTED: "We refuse this instruction" — invalid account, insufficient funds in source, daily-limit exceeded, compliance hold, etc. Synchronous.
Final settlement arrives asynchronously, minutes to hours later, via:
- A webhook (preferred; Dashen and Telebirr both support it).
- A status query when we poll (GetDisbursementStatus).
- A statement line in the next daily bank statement (the slowest channel; used for reconciliation, not for live updates).
Architectural consequence: the ledger MUST model the in-flight state. ADR-012 added a status column on ledger_transaction (PENDING / POSTED / REVERSED / FAILED) for exactly this reason. A pre-commit at PENDING is forbidden to influence balances until a corresponding ConfirmSettlement flips it to POSTED.
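A minimal sketch of the in-flight rule follows, assuming a simplified in-memory transaction. The function names echo ConfirmSettlement and MarkSettlementFailed from this document, but their signatures and the Txn type are invented for illustration. Transitions into REVERSED are left out because the text does not spell them out.

```go
package main

import "fmt"

// Status mirrors the four values named for the ledger_transaction status column.
type Status string

const (
	Pending  Status = "PENDING"
	Posted   Status = "POSTED"
	Reversed Status = "REVERSED"
	Failed   Status = "FAILED"
)

// Txn is a simplified, hypothetical ledger transaction.
type Txn struct {
	ID           string
	Status       Status
	AmountSantim int64
}

// ConfirmSettlement flips a PENDING transaction to POSTED.
func ConfirmSettlement(t *Txn) error {
	if t.Status != Pending {
		return fmt.Errorf("cannot confirm %s: status is %s, want PENDING", t.ID, t.Status)
	}
	t.Status = Posted
	return nil
}

// MarkSettlementFailed flips a PENDING transaction to FAILED.
func MarkSettlementFailed(t *Txn) error {
	if t.Status != Pending {
		return fmt.Errorf("cannot fail %s: status is %s, want PENDING", t.ID, t.Status)
	}
	t.Status = Failed
	return nil
}

// PostedTotal sums only POSTED transactions, so PENDING pre-commits
// never influence a balance.
func PostedTotal(txns []Txn) int64 {
	var total int64
	for _, t := range txns {
		if t.Status == Posted {
			total += t.AmountSantim
		}
	}
	return total
}
```

Both transitions refuse anything that is not PENDING, which is what makes a duplicate webhook or a late poll result harmless at this layer.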
A platform that assumes synchronous settlement will:
- Tell the employee their money has arrived before it has.
- Show inflated balance to the employer.
- Settle a "successful" disbursement that the bank silently rejected 30 minutes later, then never reconcile.
DemozPay does not assume synchronous settlement. GAP-08 + GAP-09 + GAP-11 close the loop. Verify quarterly.
§4. The partner adapter contract
Every partner bank integration is a Go package under services/integration-gateway/internal/adapters/<partner>/ that implements PartnerAdapter:
```go
type PartnerAdapter interface {
	// Synchronous "we received your instruction" handshake.
	// Returns InitiateResponse with partner_reference and SYNCHRONOUS status
	// (ACCEPTED or REJECTED). Must NOT block waiting for settlement.
	Initiate(ctx context.Context, req InitiateRequest) (InitiateResponse, error)

	// Status query: read-side; safe to call as often as our poller wants.
	// Returns the partner's current understanding of the transfer's lifecycle.
	GetStatus(ctx context.Context, partnerRef string) (StatusResponse, error)

	// Webhook payload parser. Returns the normalised event shape
	// (COMPLETED / FAILED / REVERSED). Must verify the signature here:
	// ParseWebhook is the security boundary.
	ParseWebhook(headers http.Header, body []byte) (WebhookEvent, error)

	// Partner identifier; used in routing + logs.
	Name() string
}
```
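Because ParseWebhook is the security boundary, signature verification should use a constant-time comparison. The sketch below shows HMAC-SHA256 verification with Go's standard library. It assumes the signature arrives hex-encoded and has already been pulled out of a header by the caller; the actual header name and encoding used by any partner are not stated in this document.

```go
package main

import (
	"crypto/hmac"
	"crypto/sha256"
	"encoding/hex"
	"errors"
)

// ErrBadSignature is returned for any missing, malformed, or wrong signature.
var ErrBadSignature = errors.New("webhook signature mismatch")

// Sign computes the hex-encoded HMAC-SHA256 of body under secret.
// Hex encoding is an assumption made for this sketch.
func Sign(secret, body []byte) string {
	mac := hmac.New(sha256.New, secret)
	mac.Write(body)
	return hex.EncodeToString(mac.Sum(nil))
}

// VerifySignature checks signatureHex against the raw request body.
// It must run on the exact bytes received, before any JSON parsing.
func VerifySignature(secret, body []byte, signatureHex string) error {
	got, err := hex.DecodeString(signatureHex)
	if err != nil || len(got) == 0 {
		return ErrBadSignature
	}
	mac := hmac.New(sha256.New, secret)
	mac.Write(body)
	// hmac.Equal compares in constant time to avoid timing leaks.
	if !hmac.Equal(got, mac.Sum(nil)) {
		return ErrBadSignature
	}
	return nil
}
```

Returning one opaque error for every failure mode avoids telling a caller whether the signature was malformed or merely wrong.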
Three implementations exist today:
- internal/adapters/dashen/: production-shape, HMAC-SHA256 signing, real HTTP client. Talks to services/bank-sandbox/ in dev/test; will talk to Dashen's sandbox or production on a config swap.
- internal/adapters/mock/: always returns ACCEPTED + settles via force-settle admin. Used in tests.
- (planned) cbe/, awash/, telebirr/, mbirr/.
A 4th partner adapter is at minimum two weeks of work: API spec ingest, signing primitive, response normalisation, webhook parser, contract tests, sandbox onboarding. Plan accordingly.
What the adapter contract does NOT cover (and must)
- LookupAccount(account_id): proto exists, server returns Unimplemented. This is the production-critical gap. Every disburse should call LookupAccount first to verify destination is the named account. Today we trust the request.
- Balance check at source: for FI funding, we should verify the pool has the funds before submitting. No primitive.
- Adapter health surfacing: GetAdapterStatus always returns HEALTHY. We don't track recent error rates or latency per adapter.
- Circuit breaker: if Dashen's API is down, we should stop submitting and surface the outage. Today we'll cheerfully submit failures until the rate limiter kicks us.
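For the circuit-breaker gap in the list above, a consecutive-failure breaker is one simple shape it could take. This is a sketch under stated assumptions, not a design decision: the Breaker type, its thresholds, and the cooldown behaviour are all hypothetical, and a production version would likely also feed adapter health reporting.

```go
package main

import (
	"errors"
	"sync"
	"time"
)

// ErrCircuitOpen is returned without calling the partner while the breaker is open.
var ErrCircuitOpen = errors.New("circuit open: partner adapter unavailable")

// Breaker opens after maxFailures consecutive errors and stays open
// for cooldown. After the cooldown it lets a trial call through.
type Breaker struct {
	mu          sync.Mutex
	maxFailures int
	cooldown    time.Duration
	failures    int
	openedAt    time.Time
}

// NewBreaker returns a closed breaker.
func NewBreaker(maxFailures int, cooldown time.Duration) *Breaker {
	return &Breaker{maxFailures: maxFailures, cooldown: cooldown}
}

// Call runs fn unless the breaker is open.
func (b *Breaker) Call(fn func() error) error {
	b.mu.Lock()
	open := b.failures >= b.maxFailures && time.Since(b.openedAt) < b.cooldown
	b.mu.Unlock()
	if open {
		return ErrCircuitOpen
	}

	err := fn()

	b.mu.Lock()
	defer b.mu.Unlock()
	if err != nil {
		b.failures++
		if b.failures >= b.maxFailures {
			b.openedAt = time.Now() // (re)start the cooldown window
		}
		return err
	}
	b.failures = 0 // any success closes the breaker
	return nil
}
```

A gateway would wrap each adapter's Initiate call in Call, one Breaker per partner, so that an outage at one bank does not block submissions to the others.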
§5. Failure modes — what each looks like and what we do
| Failure | Symptom | Current behaviour | Required behaviour |
|---|---|---|---|
| Partner-side reject at submit (insufficient funds, invalid account) | InitiateResponse.status = REJECTED | DisburseEwaUseCase calls MarkSettlementFailed + Reverse. EWA goes BANK_REJECTED. ✓ | Same. |
| Partner-side fail-after-accept (async failure: account closed, AML hold) | Webhook with event=FAILED OR status poll returns FAILED | BankSettlementApplier calls MarkSettlementFailed + Reverse. ✓ | Same. |
| Webhook never arrives | Bank settled, we don't know | Poller (30s tick) catches it via GetStatus. ✓ | Add stale-pending alert at 24h threshold (PLANNED). |
| Both webhook + poll silent for 24h | Bank's internal queue is stuck | We do nothing. No alert. | GAP — escalation needed. |
| Webhook arrives with bad signature | HMAC verify fails | Log + 401. ✓ Counter increments. | GAP — if rate > 5%/5min, page on-call. |
| Webhook arrives but disbursement has no matching aggregate | findByProviderRef returns null | Logged as unmatched. Counter increments. No alert. | GAP — needs operator workflow. |
| Settlement webhook arrives twice for same partner_ref | Idempotency replay | BankSettlementApplier.bumpApply(replay) — no-op. ✓ | Same. |
| Partner API timeout mid-submit | Initiate errors. Ledger still PENDING; no partner_reference recorded | DisburseEwaUseCase catches; EWA stays APPROVED (NOT submitted). Caller can retry. ✓ | Add retry-with-backoff on the gateway side (PARTIAL — currently caller retries). |
| Network partition between API and gateway | gRPC connection refused | Use case fails; EWA stays APPROVED. ✓ | Add circuit breaker so we don't repeatedly hammer a down gateway. |
| Partial settlement (bank accepts ETB 1500 but only ETB 1000 settles; fees deducted at correspondent) | Webhook reports COMPLETED for 1000 | We POST 1500 to the ledger; reconciliation drift catches the ETB 500 (50,000 santim) shortfall. | GAP: needs "partial settlement" event handler. Today we'd POST the wrong amount. |
| Ledger DB outage mid-ConfirmSettlement | RPC errors | BankSettlementApplier logs error; will retry on next poller tick OR next webhook delivery. ✓ | Add stale-ack metric. |
| Gateway DB outage | Gateway crashes at boot or mid-RPC | runbooks/gateway-down.md covers this. ✓ | Same. |
The matrix above is the operational reality check. Anywhere it says "GAP", that's a real incident path.
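The partial-settlement row is the one where the ledger would record a wrong amount, so it is worth sketching what detection could look like. The code below compares the instructed amount with the amount the partner reports as settled. ClassifySettlement and SettlementOutcome are hypothetical names, and the sketch assumes the webhook carries a settled amount at all, which this document does not confirm for every partner.

```go
package main

import "fmt"

// SettlementOutcome is a hypothetical classification of a COMPLETED webhook.
type SettlementOutcome struct {
	Kind            string // "FULL" or "PARTIAL"
	ShortfallSantim int64  // instructed minus settled; zero when FULL
}

// ClassifySettlement compares what we instructed with what the partner
// says settled. Both amounts are in santim.
func ClassifySettlement(instructedSantim, settledSantim int64) (SettlementOutcome, error) {
	switch {
	case instructedSantim <= 0 || settledSantim <= 0:
		return SettlementOutcome{}, fmt.Errorf("amounts must be positive: instructed=%d settled=%d", instructedSantim, settledSantim)
	case settledSantim > instructedSantim:
		// More settled than instructed is not a fee deduction; escalate.
		return SettlementOutcome{}, fmt.Errorf("settled %d exceeds instructed %d", settledSantim, instructedSantim)
	case settledSantim == instructedSantim:
		return SettlementOutcome{Kind: "FULL"}, nil
	default:
		return SettlementOutcome{Kind: "PARTIAL", ShortfallSantim: instructedSantim - settledSantim}, nil
	}
}
```

For the example in the matrix, ETB 1,500 instructed is 150000 santim and ETB 1,000 settled is 100000 santim, giving a PARTIAL outcome with a 50000 santim shortfall. How that shortfall is then posted is a policy question this sketch does not answer.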
§6. Idempotency and replay safety — the four layers
Idempotency is enforced at four independent layers:
- API entry: Idempotency-Key header on every money-moving POST. PrismaIdempotencyStore stores (tenantId, scope, key, fingerprint, result). Conflicts → 409. (ADR-007)
- Ledger: (tenant_id, idempotency_key) UNIQUE on ledger_transaction. Same key + same fingerprint → cached row; different fingerprint → FailedPrecondition.
- Gateway: (tenant_id, idempotency_key) UNIQUE on disbursement. Same shape as ledger.
- Reconciliation store: (tenant_id, partner, partner_reference) UNIQUE on bank_statement_line. A re-ingested statement file is safe.
These layers are belts AND suspenders. The API key may not be the same string as the ledger key, which may not be the same as the gateway key. Each layer's idempotency keeps that layer's table consistent on retry.
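The key-plus-fingerprint rule shared by these layers can be sketched with an in-memory store. This is illustrative only: the Store type and its Execute method are hypothetical, and the real layers rely on database UNIQUE constraints rather than a mutex.

```go
package main

import (
	"errors"
	"sync"
)

// ErrFingerprintConflict signals a key reused for a different request.
var ErrFingerprintConflict = errors.New("idempotency key reused with a different request fingerprint")

type key struct{ tenantID, scope, idemKey string }

type record struct{ fingerprint, result string }

// Store is an in-memory stand-in for an idempotency table.
type Store struct {
	mu      sync.Mutex
	records map[key]record
}

// NewStore returns an empty store.
func NewStore() *Store {
	return &Store{records: make(map[key]record)}
}

// Execute runs fn at most once per (tenant, scope, key). A replay with
// the same fingerprint returns the cached result; a replay with a
// different fingerprint is rejected. Failed runs are not cached, so the
// caller may retry with the same key.
func (s *Store) Execute(tenantID, scope, idemKey, fingerprint string, fn func() (string, error)) (string, bool, error) {
	s.mu.Lock() // held across fn for simplicity; this serialises all callers
	defer s.mu.Unlock()

	k := key{tenantID, scope, idemKey}
	if rec, ok := s.records[k]; ok {
		if rec.fingerprint != fingerprint {
			return "", false, ErrFingerprintConflict
		}
		return rec.result, true, nil
	}
	result, err := fn()
	if err != nil {
		return "", false, err
	}
	s.records[k] = record{fingerprint: fingerprint, result: result}
	return result, false, nil
}
```

The boolean return reports whether the result was replayed. At the API layer, ErrFingerprintConflict would correspond to the 409 described above.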
TTL gap: none of these tables have a delete policy. Layer 1 (IdempotencyRecord) will reach hundreds of millions of rows by month 12. Plan a TTL job before go-live.
§7. Reversals — the only valid form of correction
ADR-009 forbids DELETE and UPDATE on financial rows. Every correction is a forward-motion entry:
- A pre-commit PENDING transaction that fails at submit → Reverse() adds a compensating transaction. Both rows visible forever.
- A POSTED transaction that turned out to be wrong (post-settlement chargeback, dispute resolution) → Reverse(). The original stays POSTED for audit.
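A compensating entry is the original with debits and credits swapped, appended as a new row. The sketch below illustrates that on an in-memory append-only ledger. The Ledger, Txn, and Line types and the "reverse at most once" guard are assumptions for this example, not a description of the real ledger service.

```go
package main

import "fmt"

// Line is one leg of a transaction, in santim. Exactly one of the two
// amounts is expected to be non-zero.
type Line struct {
	Account      string
	DebitSantim  int64
	CreditSantim int64
}

// Txn is a hypothetical ledger transaction. ReversesID is set only on
// compensating entries.
type Txn struct {
	ID         string
	ReversesID string
	Lines      []Line
}

// Ledger is append-only: it offers no update or delete.
type Ledger struct {
	txns []Txn
}

func (l *Ledger) find(id string) (Txn, bool) {
	for _, t := range l.txns {
		if t.ID == id {
			return t, true
		}
	}
	return Txn{}, false
}

// Append adds a transaction, rejecting duplicate IDs.
func (l *Ledger) Append(t Txn) error {
	if _, ok := l.find(t.ID); ok {
		return fmt.Errorf("transaction %q already exists", t.ID)
	}
	l.txns = append(l.txns, t)
	return nil
}

// Len reports how many rows the ledger holds.
func (l *Ledger) Len() int { return len(l.txns) }

// Reverse appends a compensating transaction with every debit and
// credit swapped. The original row is left untouched.
func (l *Ledger) Reverse(originalID, reversalID string) (Txn, error) {
	orig, ok := l.find(originalID)
	if !ok {
		return Txn{}, fmt.Errorf("transaction %q not found", originalID)
	}
	for _, t := range l.txns {
		if t.ReversesID == originalID {
			return Txn{}, fmt.Errorf("transaction %q already reversed by %q", originalID, t.ID)
		}
	}
	rev := Txn{ID: reversalID, ReversesID: originalID}
	for _, ln := range orig.Lines {
		rev.Lines = append(rev.Lines, Line{
			Account:      ln.Account,
			DebitSantim:  ln.CreditSantim,
			CreditSantim: ln.DebitSantim,
		})
	}
	if err := l.Append(rev); err != nil {
		return Txn{}, err
	}
	return rev, nil
}
```

After a reversal the ledger holds two rows whose lines net to zero per account, which keeps both the mistake and its correction visible to audit.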
Architectural consequence: the ledger gets larger forever. No automatic prune. Plan for archival once tables exceed ~100M rows (years out — not a near-term concern).
Anti-pattern alert: never call Reverse() to "fix a bug". A bug-driven reversal pollutes the audit trail. Fix the bug, then post the corrective entry through a legitimate use case.
§8. The partner-bank relationship surface — what we owe each partner
Each integration carries operational obligations that are not in the code:
| Obligation | Owner | Cadence | Status |
|---|---|---|---|
| Daily statement pull | Partner provides; we fetch | Daily, T+1 | PLANNED — no fetch job. |
| Reconciliation against partner statements | DemozPay | Daily | PARTIAL — primitive Live, cadence Planned. |
| Drift report to partner-bank ops | DemozPay → partner | When non-zero | PLANNED — no email/ticket workflow. |
| Dispute resolution (chargeback, customer claim) | Joint | As-incurred | PLANNED — no workflow. |
| Monthly volume + value report to partner | DemozPay → partner | Monthly | PLANNED |
| Sanctions screening on every transfer | DemozPay (pre-submit) | Per transfer | PLANNED — no screen. |
| AML transaction monitoring | DemozPay | Continuous | PLANNED |
| Suspicious activity reporting to NBE | DemozPay | As-flagged | PLANNED |
| Key rotation (HMAC signing keys) | DemozPay + partner | Quarterly | PLANNED — no rotation runbook. |
| Incident notification to partner | DemozPay → partner | Per incident | PLANNED |
These obligations are pre-conditions to opening a real bank rail. The reconciliation runbook covers drift detection; everything else is unwritten.
§9. Why this is a modular monolith and not a service mesh
ADR-001 + ADR-011 commit DemozPay to a modular monolith — one apps/api/ NestJS process composing many packages/<domain>/ bounded contexts — with three independently-deployed Go services (ledger, integration-gateway, notifications). The split is deliberate:
- The ledger is its own service because (a) its blast radius is the entire money-truth system and we want independent failure isolation, (b) its DB schema is locked-down append-only and should not share a Postgres with mutable application data, (c) its language (Go) gives us a deployable that's tiny, fast-booting, and easy to harden.
- The integration-gateway is its own service because partner-bank credentials must not be reachable from any non-money-moving code path. Compromising the API doesn't compromise the gateway.
- Notifications is its own service (or will be; Stub today) because outbound SMS/email/push has different scaling and credential boundaries.
- Everything else lives in the monolith because the operational cost of running N microservices is wholly unjustified by current traffic, team size, or domain count.
Carve-out trigger: a packages/<domain>/ becomes a service when (a) its event volume exceeds the monolith's outbox throughput, OR (b) its team becomes large enough to warrant independent deployment cadence, OR (c) it has different scaling/credential boundaries from the rest of the API. None of these are true today. Do not carve out for architectural fashion.
§10. What this document does NOT cover
- Specific bank-API request shapes — that's per-partner adapter source and the partner's docs.
- The reconciliation lifecycle in operational detail: see RECONCILIATION_ARCHITECTURE.md.
- The chart-of-accounts taxonomy: see ADR-012.
- The deployment + ops surface: see PRODUCTION_READINESS.md.
§11. Single rule that survives every refactor
The bank statement is the truth of money. The ledger is the truth of obligations. When they disagree, the bank wins until we prove otherwise.
Anyone proposing a change that violates this rule is proposing custody. Don't.