
DemozPay — Real System State (Principal Architect Audit)

Snapshot: 2026-05-29
Author: Principal Architect / Acting CTO
Replaces: the optimistic closure narrative in SYSTEM_GAP.md §0.

SYSTEM_GAP.md was a code-completion tracker. Closing those 12 gaps was necessary, not sufficient. This document is the honest picture of what runs, what only compiles, what fakes itself convincingly, and what cannot move real money.

Read this BEFORE telling anyone DemozPay is ready for a bank rail.

Status tags

Six tags, brutal:

  • LIVE — runtime-verified end-to-end against a real shape (real DB, real HTTP, real gRPC). Safe to depend on.
  • PARTIAL — one layer real, downstream layer missing. Will appear to work in tests; will fall over in production.
  • STUB — booting, looks alive on docker ps, does nothing useful.
  • PLANNED — proto/design/schema only.
  • BLOCKED — depends on something missing; cannot be Live until that lands.
  • DANGEROUS — implemented in a way that creates incident risk if shipped as-is. These are the things that wake up the on-call at 3 AM.

Headline picture

DemozPay is a competent skeleton with a working bank rail and three open doors.

  • The bank-orchestrator transition (S1–S4, GAP-01..12) is structurally complete. The ledger has pending/posted lifecycle, the gateway speaks Dashen's HMAC shape, the webhook + poller close the loop, reconciliation produces signed drift. This is real.
  • But authorization, RBAC, secrets, payroll, and four of the six web apps do not exist or are not enforced. The platform can move money correctly between mocked actors. It cannot yet verify that the actor invoking a disbursement is authorized to do so, nor that the money lands where the employer told us it should.
  • And five operational systems that fintech CTOs assume are present — rate limiting, intrusion alerting, secret rotation, replica routing, regulator reporting — are absent.

A 7-day soak with zero drift on the bank rail is achievable. Opening that rail to real users today is not.


§1. Domain-by-domain reality

EWA — packages/ewa/backend/ + apps/api/src/products/ewa/

Capability | Status | Notes
Domain model + 9-state lifecycle | LIVE | ewa-status.ts:32-61 covers PENDING → APPROVED → SUBMITTED_TO_BANK → ACCEPTED_BY_BANK → DISBURSED → REPAID (terminal), with BANK_REJECTED + FAILED branches.
RequestEwaUseCase | LIVE | packages/ewa/backend/application/request-ewa.usecase.ts: eligibility + idempotency + outbox. Unit-tested.
DisburseEwaUseCase | LIVE | Pre-commits a PENDING entry, then calls the gateway; an ACCEPTED response leaves it PENDING until the webhook/poller settles it.
EligibilityPolicy (accrued earnings - prior advances - fee) | PARTIAL | The math is real (eligibility.ts). The AccruedEarningsPort adapter is a placeholder that reads Employee.baseSalary and pro-rates elapsed days. Real eligibility needs a payroll calendar and pending deductions, neither of which exists.
Repayment | LIVE (Phase B1, 2026-05-29) | RecordEwaRepaymentUseCase ships. Admin endpoint POST /api/ewa/requests/:id/record-repayment. Ledger DR PayrollClearing / CR ReceivableFromEmployee (P+F). Outbox event ewa.repaid.v1. 5 unit tests; idempotent. The payroll-event consumer path remains PLANNED until the payroll domain ships.
Cancel / Reject use cases | PLANNED | Domain methods exist (approve, reject); no application-level use case routes a reject.
Fee calculation | LIVE | feeFor() in eligibility.ts. Hardcoded via policy config (advance + fee bps).
Prisma repository | LIVE | apps/api/src/products/ewa/prisma-ewa.repository.ts: Prisma-backed; findByProviderRef bypasses RLS (BYPASSRLS) by design.

Verdict: EWA can submit and settle a bank transfer and, since Phase B1, record a repayment when an operator calls the admin endpoint. It cannot collect repayment automatically. Calling EWA Live is misleading until the payroll-deduction loop closes.
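The eligibility rule in the table (accrued earnings - prior advances - fee) can be sketched as follows. This is a minimal illustration, not the code in eligibility.ts: integer minor units (santim) held in JS numbers, a fixed-length pay period, and upward fee rounding are all assumptions, and every name is hypothetical. The accrual function mirrors the placeholder adapter described above, so it inherits the same weakness: it knows nothing about actual payroll.

```typescript
// Hypothetical sketch of the documented rule:
//   eligible = accrued earnings - prior advances - fee
// Amounts are integer minor units (santim) in JS numbers. All names are
// illustrative; this is not the code in eligibility.ts.

interface EligibilityInput {
  baseSalaryMinor: number; // monthly base salary, as the placeholder adapter reads it
  periodDays: number; // days in the pay period (assumed: no payroll calendar exists)
  elapsedDays: number; // days elapsed in the current period
  priorAdvancesMinor: number; // advances already taken this period
  feeBps: number; // fee in basis points, from policy config
}

// Fee in basis points, rounded up (assumed rounding policy).
function computeFeeMinor(amountMinor: number, feeBps: number): number {
  return Math.ceil((amountMinor * feeBps) / 10_000);
}

// Placeholder accrual: pro-rate base salary by elapsed days, rounding down.
function accruedEarningsPlaceholder(input: EligibilityInput): number {
  const days = Math.min(input.elapsedDays, input.periodDays);
  return Math.floor((input.baseSalaryMinor * days) / input.periodDays);
}

function eligibleAdvanceMinor(input: EligibilityInput): number {
  const net = accruedEarningsPlaceholder(input) - input.priorAdvancesMinor;
  if (net <= 0) return 0; // nothing accrued beyond what was already advanced
  return Math.max(0, net - computeFeeMinor(net, input.feeBps));
}

// 30,000 ETB salary, half the period elapsed, 5,000 ETB already advanced, 2% fee.
const eligible = eligibleAdvanceMinor({
  baseSalaryMinor: 3_000_000,
  periodDays: 30,
  elapsedDays: 15,
  priorAdvancesMinor: 500_000,
  feeBps: 200,
});
console.log(eligible); // 980000, i.e. 9,800 ETB
```

Replacing accruedEarningsPlaceholder with a payroll-backed figure is exactly the change the PARTIAL status is waiting on; the subtraction and fee steps would not need to move.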


Lending — packages/lending/backend/ + apps/api/src/products/lending/

Capability | Status | Notes
Domain model + lifecycle | LIVE | loan-status.ts:11-32; CLOSED, REJECTED, BANK_REJECTED, DEFAULTED terminals.
QuoteLoanUseCase | LIVE | Schedule math + interest-rate policy.
RequestLoanUseCase | LIVE | Underwriting (income multiple) + idempotency + outbox. Unit-tested.
DisburseLoanUseCase | LIVE | Pre-commit PENDING, FI payable account, BANK_REJECTED reverses ledger.
RecordRepaymentUseCase (S3.3 / GAP-10) | PARTIAL | The three-leg journal (DR PayrollClearing / CR ReceivableFromBorrower / CR InterestRevenue) is real. The caller is an admin endpoint, NOT a payroll consumer. A human operator must POST per installment; no payroll-cycle integration exists.
RemitInstallmentToFiUseCase (B2 / GL-06) | LIVE (ledger side, Phase B2, 2026-05-29) | Admin endpoint POST /api/loans/:id/installments/:idx/remit-to-fi posts DR payable-to-fi-partner[fi_id] / CR payroll-clearing for the installment's principal. Outbox event loan.installment_remitted_to_fi.v1. Phantom-asset growth is structurally closed on the books. The bank-side outbound transfer is PLANNED; continuing Phase B requires a schema change to model employer payroll accounts.
Installment auto-close | LIVE | Last installment → loan CLOSED + loan.closed.v1.
DEFAULTED handler | DANGEROUS | The enum value exists. No use case transitions a loan to DEFAULTED. No collections workflow. No write-off journal. Defaulted loans cannot be closed in the system today.
Interest accrual | PARTIAL | Interest is split equally across installments at schedule time. No accrual-by-time. Acceptable for flat-rate; misleading if anyone calls it "amortized".
Forbearance / restructure | PLANNED | Not modelled.

Verdict: Lending can disburse and accept repayment if a human triggers each installment. Production lending needs a payroll-cycle consumer (Planned), a defaults handler (Planned), and a collections workflow (Planned).
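The lending postings in the table can be sketched as balanced journals over a flat-rate schedule. This is a minimal TypeScript illustration, not the real use cases: amounts are assumed to be integer minor units, interest is split equally across installments (as the Interest accrual row describes), rounding remainders are assumed to land on the last installment, and all function names are hypothetical.

```typescript
// Hypothetical sketch: flat-rate schedule plus the journals described for
// RecordRepaymentUseCase and RemitInstallmentToFiUseCase.

interface Installment {
  index: number;
  principalMinor: number;
  interestMinor: number;
}

interface Leg {
  account: string;
  debitMinor: number;
  creditMinor: number;
}

// k-th of n near-equal integer parts; the last part absorbs the remainder,
// so the parts always sum back to the total.
function shareOf(totalMinor: number, n: number, k: number): number {
  const base = Math.floor(totalMinor / n);
  return k === n - 1 ? totalMinor - base * (n - 1) : base;
}

function flatSchedule(principalMinor: number, totalInterestMinor: number, n: number): Installment[] {
  return Array.from({ length: n }, (_, k) => ({
    index: k,
    principalMinor: shareOf(principalMinor, n, k),
    interestMinor: shareOf(totalInterestMinor, n, k),
  }));
}

// Repayment: DR PayrollClearing / CR ReceivableFromBorrower / CR InterestRevenue.
function repaymentJournal(inst: Installment): Leg[] {
  return [
    { account: "PayrollClearing", debitMinor: inst.principalMinor + inst.interestMinor, creditMinor: 0 },
    { account: "ReceivableFromBorrower", debitMinor: 0, creditMinor: inst.principalMinor },
    { account: "InterestRevenue", debitMinor: 0, creditMinor: inst.interestMinor },
  ];
}

// Remittance of the installment's principal:
// DR payable-to-fi-partner[fi_id] / CR payroll-clearing.
function remitJournal(inst: Installment, fiId: string): Leg[] {
  return [
    { account: `payable-to-fi-partner[${fiId}]`, debitMinor: inst.principalMinor, creditMinor: 0 },
    { account: "payroll-clearing", debitMinor: 0, creditMinor: inst.principalMinor },
  ];
}

// The ledger rejects unbalanced commits; the same invariant, checked locally.
function isBalanced(legs: Leg[]): boolean {
  const debits = legs.reduce((sum, leg) => sum + leg.debitMinor, 0);
  const credits = legs.reduce((sum, leg) => sum + leg.creditMinor, 0);
  return debits === credits;
}

const schedule = flatSchedule(1_000_000, 100_000, 3);
// principal 333333 / 333333 / 333334, interest 33333 / 33333 / 33334
console.log(schedule.every((inst) => isBalanced(repaymentJournal(inst)))); // true
```

Account names are copied from the table rows, which spell the clearing account two ways; a real ledger would use one account code for both postings.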


BNPL

Capability | Status
packages/bnpl/ | PLANNED; the directory does not exist.
Legacy Prisma BNPLPurchase / BNPLPartner / Merchant models | DANGEROUS. Decimal(15,2) money fields violate ADR-005; no tenantId; no RLS. Anyone who writes to these tables bypasses every architectural invariant.
Merchant settlement flow | PLANNED
Repayment flow | PLANNED

Verdict: Do not promise BNPL externally until the legacy models are deleted (not just deprecated) and a real packages/bnpl/ lands. The legacy models are an attractive nuisance: they will read as "BNPL exists" to any new engineer.


Payroll

Capability | Status
packages/payroll/ | PLANNED; the directory does not exist.
Legacy Payroll / PayrollEntry Prisma models | DANGEROUS. Decimal(15,2), no tenant_id, no RLS. Same attractive-nuisance shape as BNPL.
Payroll-run engine | PLANNED
Deduction calculator (drives EWA + lending repayment) | PLANNED
Pay-period / pay-cycle calendar | PLANNED

Verdict: Payroll is the trust anchor of this entire platform's value proposition. Without payroll:

  • EWA eligibility is fiction (we don't know what the employee earned).
  • Lending repayment cannot happen automatically.
  • BNPL repayment cannot happen automatically.

Saying "Payroll Q3" sounds reasonable. In product reality, payroll absence blocks 80% of the platform's value. This is the single largest gap on the page.


Savings / Equb

Capability | Status
packages/savings/ | PLANNED; does not exist.
packages/equb/ | PLANNED; does not exist.
Legacy Equb / EqubPayout / SavingGoal Prisma models | DANGEROUS. Decimal(15,2), no tenant_id, no RLS.

Verdict: Same as BNPL. Roadmap items dressed as Live legacy models.


Integration Gateway — services/integration-gateway/

Capability | Status | Notes
gRPC server boots + serves 4 RPCs | LIVE | cmd/integration-gateway/main.go.
InitiateDisbursement | LIVE | Idempotent via (tenant_id, idempotency_key). State machine INITIATED → SUBMITTED → ACCEPTED → SETTLED.
GetDisbursementStatus | LIVE | Polls the partner adapter.
LookupAccount | LIVE (existence check, Phase C, 2026-05-29) / PARTIAL (name-match PLANNED) | Real RPC handler routes via the AccountLookup interface to per-partner adapters; Dashen and mock both implement it, and bank-sandbox backs E2E. The EWA and Lending disburse use cases fail closed on lookup failure before any ledger, outbox, or partner side effect, writing an audit row and returning 409 with a typed reason. 7/7 E2E pass. Name-match is deferred: the adapter returns resolved_holder_name, but the use case does not compare it against an expected name (there is no expected-name field in the DTO/aggregate). See PHASE_C_LOOKUP_ACCOUNT.md.
GetAdapterStatus | STUB | Always returns HEALTHY (lookup_and_health.go:48). No real adapter health tracking.
Webhook handler | LIVE | webhook/handler.go: HMAC-verified, max 1 MiB body.
Dashen adapter | LIVE | Real HTTP+HMAC against bank-sandbox. Production-shape.
Mock adapter | LIVE | Always ACCEPTED; for tests.
Other partner adapters (CBE, Awash, Telebirr, M-Birr) | PLANNED
State machine | LIVE | DB function disbursement_transition_status.
Outbound retries | PARTIAL | Settlement poller retries on a 30s tick; no exponential backoff, no jittered scheduling, no dead-letter queue.
Circuit breaker per partner | PLANNED
Rate limiting per partner | PLANNED
mTLS to partner | PLANNED | Only HMAC today.
BYPASSRLS role for cross-tenant poller | PLANNED | webhook/handler.go:62 carries the TODO.

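Verdict: Real partner-shape implementation against a real test bank. LookupAccount now rejects non-existent accounts (Phase C), but name-match is still missing, and that is the most operationally dangerous remaining gap: a disbursement to a wrong but existing account that the bank accepts can only be caught after the fact through bank-statement reconciliation, and recovering the money needs partner cooperation.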
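The Outbound retries row lists what is missing: exponential backoff, jitter, and a dead-letter path. A minimal sketch of one common approach, capped exponential backoff with "full jitter", follows. The base delay, cap, and attempt limit are illustrative values, not tuned for Dashen or any other partner, and the names are hypothetical.

```typescript
// Hypothetical sketch of capped exponential backoff with "full jitter" and a
// dead-letter cut-off. Values are illustrative, not tuned for any partner.

interface RetryPolicy {
  baseMs: number; // scale of the first retry delay
  capMs: number; // upper bound on any single delay
  maxAttempts: number; // failures allowed before dead-lettering
}

type RetryDecision = { kind: "retry"; delayMs: number } | { kind: "dead-letter" };

// Full jitter: delay drawn uniformly from [0, min(cap, base * 2^failures)).
// `random` is injectable so schedules are reproducible in tests.
function nextRetry(failures: number, policy: RetryPolicy, random: () => number = Math.random): RetryDecision {
  if (failures >= policy.maxAttempts) return { kind: "dead-letter" };
  const ceiling = Math.min(policy.capMs, policy.baseMs * 2 ** failures);
  return { kind: "retry", delayMs: Math.floor(random() * ceiling) };
}

const policy: RetryPolicy = { baseMs: 1_000, capMs: 300_000, maxAttempts: 8 };
for (let failures = 0; failures <= policy.maxAttempts; failures++) {
  const decision = nextRetry(failures, policy);
  console.log(failures, decision.kind === "retry" ? `${decision.delayMs} ms` : "dead-letter");
}
```

Full jitter spreads retries from many stuck disbursements across the window, so a partner recovering from an outage is not hit by a synchronised wave on every 30s tick.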


Ledger — services/ledger/

Capability | Status | Notes
Schema + invariants (balanced commit, append-only, partial-unique on reverses, RLS, FORCE RLS) | LIVE | Four migrations apply cleanly.
PostTransaction w/ idempotency | LIVE
Reverse w/ double-reversal lockout | LIVE
GetBalance, GetEntries, ReconcileAccount | LIVE
ConfirmSettlement + MarkSettlementFailed (GAP-01) | LIVE | Idempotent; FailedPrecondition on illegal transitions.
ReconcileWithBank (GAP-11b / S4.3) | LIVE | Signed drift, account-type sign rule, currency sanity check.
Tenant isolation via SET LOCAL app.tenant_id | LIVE | LOCAL auto-resets at COMMIT/ROLLBACK (connection-leak safe).
Prometheus metrics on RPCs | LIVE (Phase D, 2026-05-29) | A gRPC interceptor instruments every RPC with demozpay_ledger_rpc_requests_total{rpc,outcome} and the demozpay_ledger_rpc_latency_seconds{rpc} histogram, plus per-ledger gauges (entries posted, transaction-status transitions, reconcile drift) at :50054/metrics. Cardinality-disciplined.
Snapshot/projection table | PLANNED | Not yet needed at current scale.
Multi-currency settlement | LIVE at schema level | currency field on each entry; only ETB tested.

Verdict: The ledger is the most production-ready component. Prometheus metrics landed in Phase D, but nothing alerts on them yet; wire alert rules before opening any rail.
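The ReconcileWithBank row names three ingredients: signed drift, an account-type sign rule, and a currency sanity check. The sketch below shows one way they could combine, assuming the conventional rule that asset accounts are debit-normal and liability accounts credit-normal; the rule the Go ledger actually applies is not documented here, and all types and names are hypothetical.

```typescript
// Hypothetical sketch of signed drift with an account-type sign rule and a
// currency sanity check. The debit-normal/credit-normal convention is an
// assumption about the real ReconcileWithBank implementation.

type AccountType = "ASSET" | "LIABILITY";

interface LedgerTotals {
  currency: string;
  debitsMinor: number;
  creditsMinor: number;
  type: AccountType;
}

interface BankBalance {
  currency: string;
  balanceMinor: number; // as reported on the bank statement
}

// Natural-sign balance: positive means "normal" for that account type.
function ledgerBalance(t: LedgerTotals): number {
  return t.type === "ASSET" ? t.debitsMinor - t.creditsMinor : t.creditsMinor - t.debitsMinor;
}

// drift > 0: the bank shows more than the ledger; drift < 0: less; 0 is clean.
function signedDrift(ledger: LedgerTotals, bank: BankBalance): number {
  if (ledger.currency !== bank.currency) {
    throw new Error(`currency mismatch: ledger ${ledger.currency}, bank ${bank.currency}`);
  }
  return bank.balanceMinor - ledgerBalance(ledger);
}

const drift = signedDrift(
  { currency: "ETB", debitsMinor: 5_000_000, creditsMinor: 1_200_000, type: "ASSET" },
  { currency: "ETB", balanceMinor: 3_750_000 },
);
console.log(drift); // -50000: the bank holds 500 ETB less than the ledger expects
```

Keeping the sign explicit means a dashboard can distinguish "money missing at the bank" from "money the ledger has not booked yet" instead of alerting on an absolute difference.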


Settlement / Bank-orchestration glue — apps/api/src/money/integration/

Capability | Status
Webhook controller + HMAC verifier | LIVE
Settlement poller | LIVE (S3.1)
Cross-domain applier (BankSettlementApplier) | LIVE
findByProviderRef cross-tenant lookup | LIVE; uses the BYPASSRLS pattern.
Stale-pending alert (>24h) | PLANNED; the counter exists, but the threshold and escalation are not wired.
Webhook DLQ | PLANNED; a failed webhook is logged, then answered with 500; no replay buffer.

Verdict: The happy path closes. The unhappy paths (24h staleness, repeated webhook signature failures, malformed bodies) log but do not page anyone.
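The happy path above hinges on webhook verification. A minimal sketch of HMAC verification for an inbound webhook follows, using Node's built-in crypto module: enforce the 1 MiB body cap, recompute HMAC-SHA256 over the raw bytes, and compare in constant time. The hex signature encoding and the function names are assumptions; Dashen's actual header name and signing scheme are not described in this document.

```typescript
import { createHmac, timingSafeEqual } from "node:crypto";

// Hypothetical sketch: size cap, HMAC-SHA256 over the raw body, constant-time
// comparison. Hex encoding of the signature is an assumption.

const MAX_BODY_BYTES = 1024 * 1024; // 1 MiB, matching the gateway handler's cap

type VerifyResult = "ok" | "too-large" | "bad-signature";

function verifyWebhook(rawBody: Buffer, signatureHex: string, secret: string): VerifyResult {
  if (rawBody.length > MAX_BODY_BYTES) return "too-large";
  const expected = createHmac("sha256", secret).update(rawBody).digest();
  const provided = Buffer.from(signatureHex, "hex");
  // timingSafeEqual throws on length mismatch, so check length first.
  if (provided.length !== expected.length) return "bad-signature";
  return timingSafeEqual(provided, expected) ? "ok" : "bad-signature";
}

// Usage: sign a body the way a partner would, then verify it.
const secret = "test-secret";
const body = Buffer.from(JSON.stringify({ reference: "DSB-123", status: "SETTLED" }));
const sig = createHmac("sha256", secret).update(body).digest("hex");
console.log(verifyWebhook(body, sig, secret)); // ok
console.log(verifyWebhook(body, sig, "wrong-secret")); // bad-signature
```

Verifying the raw bytes before any JSON parsing matters: re-serialising a parsed body can change whitespace or key order and break the signature. Counting "bad-signature" results is also the natural input for the signature-failure paging the verdict says is missing.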


Reconciliation — services/integration-gateway/internal/reconciliation/ + ledger ReconcileWithBank

Capability | Status
Dashen CSV ingester | LIVE
Matcher (amount + reference + value-date ± 24h) | LIVE
Runner pipeline orchestrator | LIVE
Store w/ (tenant_id, partner, partner_reference) idempotency | LIVE
Ledger-side ReconcileWithBank RPC | LIVE
Daily-cadence cron / scheduled-job harness | PLANNED; there is no scheduled invoker. The runner can be called by a human or a test script; there is no production cron entry.
Drift dashboard | PLANNED; no Grafana board file in infra/.
Flagged-line operator workflow | PLANNED; no admin UI and no API endpoint to query flagged lines.
Statement-pull automation | PLANNED; bank_statement_line ingests files, but no code fetches files from a partner SFTP/API.

Verdict: The reconciliation primitive is real. The reconciliation system — daily cadence, operator workflow, dashboard, paging — is not. Drift-clean cannot be claimed until daily cadence runs unattended.
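The matcher rule in the table (amount + reference + value date within ± 24h) can be sketched as a pure function that either matches a statement line or flags it with a reason, which is the raw material the missing flagged-line workflow would need. Field names and reason codes are hypothetical; the real matcher lives in the Go reconciliation package.

```typescript
// Hypothetical sketch of the documented match rule. A statement line matches
// when reference and amount are equal and value dates are within 24 hours.

interface StatementLine {
  partnerReference: string;
  amountMinor: number;
  valueDate: Date;
}

interface ExpectedTransfer {
  providerRef: string;
  amountMinor: number;
  expectedDate: Date;
}

const WINDOW_MS = 24 * 60 * 60 * 1000;

type MatchOutcome =
  | { kind: "matched"; transfer: ExpectedTransfer }
  | { kind: "flagged"; reason: "no-reference" | "amount-mismatch" | "date-out-of-window" };

function matchLine(line: StatementLine, byRef: Map<string, ExpectedTransfer>): MatchOutcome {
  const t = byRef.get(line.partnerReference);
  if (t === undefined) return { kind: "flagged", reason: "no-reference" };
  if (t.amountMinor !== line.amountMinor) return { kind: "flagged", reason: "amount-mismatch" };
  const gapMs = Math.abs(line.valueDate.getTime() - t.expectedDate.getTime());
  if (gapMs > WINDOW_MS) return { kind: "flagged", reason: "date-out-of-window" };
  return { kind: "matched", transfer: t };
}

const expected = new Map<string, ExpectedTransfer>([
  ["DSB-1", { providerRef: "DSB-1", amountMinor: 250_000, expectedDate: new Date("2026-05-28T10:00:00Z") }],
]);
// 23 hours apart: inside the window.
console.log(matchLine({ partnerReference: "DSB-1", amountMinor: 250_000, valueDate: new Date("2026-05-29T09:00:00Z") }, expected).kind); // matched
// 49 hours apart: flagged for an operator.
console.log(matchLine({ partnerReference: "DSB-1", amountMinor: 250_000, valueDate: new Date("2026-05-30T11:00:00Z") }, expected).kind); // flagged
```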


Authentication — apps/api/src/identity/auth/ (better-auth)

Capability | Status
Email + password sign-up + sign-in | LIVE
Email verification | PARTIAL; gated on NODE_ENV=production, skipped in dev.
Organization plugin → activeOrganizationId → businessId | LIVE
Phone OTP plugin (phoneNumber) | PARTIAL; wired, but the SMS sender is LoggingSmsSender (logs to stdout). No real SMS gateway in the production code path.
2FA / TOTP / WebAuthn | PLANNED; the twoFactor table exists, but the plugin is not wired.
Session storage | LIVE; Prisma Session table.

Verdict: Email-only auth in production today. Bank-grade access requires MFA. Until phone OTP has a real SMS provider AND TOTP/WebAuthn is wired, no operator should hold admin credentials in production.


Authorization (AuthZ + RBAC) — apps/api/src/identity/auth/auth.guard.ts

Capability | Status
Global AuthGuard enforces that req.user.id exists | LIVE
@Public() opt-out | LIVE
RBAC (role → permission → endpoint) | LIVE (Phase A2, 2026-05-29)
Per-entity ownership checks (employee A disburses employee A's EWA, not employee B's) | PARTIAL (Phase A1+A2)
Business + Employee CRUD endpoints | LIVE (Phase A2, 2026-05-29)

Verdict (updated 2026-05-29 post-Phase A): the authorization model is now LIVE for the 4 controllers in scope (Business, Employee, EWA, Lending). Tenant isolation at the DB layer (RLS) covers financial tables; application-layer RBAC + OrgRoleGuard + service-layer tenant-scoped queries cover identity-tier (Business, Employee). EWA/Lending self-service for borrowers (vs. admin-acting-for-employee) requires the Employee.userId schema link — Phase B item.


Observability — apps/api/src/_infra/observability/

Capability | Status
Prometheus metrics on API | LIVE
Prometheus metrics on Go services | LIVE (Phase C + D). Gateway: lookup_* metrics from Phase C at :50053/metrics. Ledger (Phase D): every RPC instrumented, plus custom ledger gauges, at :50054/metrics.
OpenTelemetry tracing wiring | LIVE, but a no-op unless OTEL_EXPORTER_OTLP_ENDPOINT is set. No collector deployed.
Structured JSON logging | LIVE (Pino on API, slog on Go services)
PII redaction at logger | LIVE (Phase A3, 2026-05-29)
Request-ID propagation | PARTIAL; only via the OTel trace_id, which is a no-op without a collector.
Prometheus alert rules | PLANNED; no .yml rules anywhere in infra/.
Dashboards (Grafana) | PLANNED; none in repo.
SLOs / error budgets | PLANNED; no SLO file.

Verdict: Metrics surface is healthy. Nothing alerts on them. Nobody is watching.


Outbox + Events — apps/api/src/_infra/outbox/

Capability | Status
Outbox publisher with FOR UPDATE SKIP LOCKED | LIVE
Idempotent Kafka producer | LIVE
BYPASSRLS role gate via OUTBOX_DATABASE_URL | LIVE, but silently drains zero rows if the env var is not set in production.
Topic-per-bounded-context | LIVE
Event catalog doc | PLANNED; docs/architecture/event-catalog.md is referenced in ADR-011 follow-ups but does not exist.
Consumers | PLANNED; services/notifications is /health only. No reconciliation consumer. No BI consumer. Events are produced into a void.

Verdict: The outbox correctly persists + drains. Nothing consumes the events, so every domain emit is a tree falling in an empty forest. Notifications, reconciliation triggers, BI pipelines, audit-mirror — none exist.
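The OUTBOX_DATABASE_URL row describes a silent failure: without the BYPASSRLS connection the publisher drains zero rows and nothing complains. A minimal boot-time guard that turns that silent stall into a crash is sketched below. It assumes NODE_ENV distinguishes production and that disabling the publisher (with a warning) is acceptable outside production; both are assumptions, and the function name is hypothetical.

```typescript
// Hypothetical boot-time guard: in production a missing OUTBOX_DATABASE_URL is
// fatal; elsewhere the publisher is disabled loudly instead of draining nothing.

type Env = Record<string, string | undefined>;

function resolveOutboxDatabaseUrl(env: Env, warn: (msg: string) => void = console.warn): string | null {
  const url = env["OUTBOX_DATABASE_URL"];
  if (url !== undefined && url.trim() !== "") return url;
  if (env["NODE_ENV"] === "production") {
    throw new Error(
      "OUTBOX_DATABASE_URL must be set in production: without the BYPASSRLS role " +
        "the outbox publisher drains zero rows",
    );
  }
  warn("OUTBOX_DATABASE_URL is not set: outbox publisher disabled (non-production)");
  return null; // caller skips starting the publisher
}

// Usage: call once at startup, e.g. resolveOutboxDatabaseUrl(process.env).
console.log(resolveOutboxDatabaseUrl({ OUTBOX_DATABASE_URL: "postgres://outbox@db/demozpay" }));
try {
  resolveOutboxDatabaseUrl({ NODE_ENV: "production" });
} catch (err) {
  console.log((err as Error).message);
}
```

A crash loop at deploy time is noisy, which is the point: it surfaces the misconfiguration before any event backlog builds up.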


Idempotency — apps/api/src/_infra/shared-infra/prisma-idempotency.store.ts

Capability | Status
Two-phase claim (CLAIM → RECORD) | LIVE
Composite PK (tenantId, scope, key) | LIVE
Fingerprint mismatch raises IdempotencyViolationError | LIVE
In-flight conflict → 409 | LIVE
Replay window TTL | DANGEROUS; no cleanup job expires old records.

Verdict: Add a daily TTL cleanup job before go-live.
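The flow in the table, plus the missing TTL sweep, can be modelled in memory to make the states explicit. This is a hypothetical sketch, not the Prisma-backed store: the key format, the 24-hour window, and all names are assumptions. In the real store the sweep would be a scheduled DELETE against the idempotency table.

```typescript
// Hypothetical in-memory model of CLAIM -> RECORD, the fingerprint check, the
// in-flight conflict, and a TTL sweep. Names and the TTL are assumptions.

type ClaimResult =
  | { kind: "claimed" }
  | { kind: "replay"; response: string }
  | { kind: "in-flight" } // caller maps this to HTTP 409
  | { kind: "fingerprint-mismatch" }; // caller raises IdempotencyViolationError

interface IdemRow {
  fingerprint: string;
  state: "IN_FLIGHT" | "COMPLETED";
  response: string | null;
  createdAtMs: number;
}

class IdempotencyStoreSketch {
  private readonly rows = new Map<string, IdemRow>(); // key: `${tenantId}:${scope}:${key}`
  private readonly ttlMs: number;

  constructor(ttlMs: number) {
    this.ttlMs = ttlMs;
  }

  claim(key: string, fingerprint: string, nowMs: number): ClaimResult {
    const row = this.rows.get(key);
    if (row === undefined) {
      this.rows.set(key, { fingerprint, state: "IN_FLIGHT", response: null, createdAtMs: nowMs });
      return { kind: "claimed" };
    }
    if (row.fingerprint !== fingerprint) return { kind: "fingerprint-mismatch" };
    if (row.state === "IN_FLIGHT") return { kind: "in-flight" };
    return { kind: "replay", response: row.response ?? "" };
  }

  record(key: string, response: string): void {
    const row = this.rows.get(key);
    if (row === undefined || row.state !== "IN_FLIGHT") throw new Error(`no in-flight claim for ${key}`);
    row.state = "COMPLETED";
    row.response = response;
  }

  // The missing cleanup job: drop rows older than the replay window,
  // including abandoned in-flight claims. Returns how many were removed.
  sweepExpired(nowMs: number): number {
    const expired: string[] = [];
    this.rows.forEach((row, key) => {
      if (nowMs - row.createdAtMs > this.ttlMs) expired.push(key);
    });
    expired.forEach((key) => this.rows.delete(key));
    return expired.length;
  }
}

const DAY_MS = 24 * 60 * 60 * 1000;
const store = new IdempotencyStoreSketch(DAY_MS);
const key = "tenant-1:ewa.request:abc";
console.log(store.claim(key, "fp1", 0).kind); // claimed
console.log(store.claim(key, "fp1", 1).kind); // in-flight
store.record(key, '{"id":"ewa_1"}');
console.log(store.claim(key, "fp1", 2).kind); // replay
console.log(store.claim(key, "fp2", 3).kind); // fingerprint-mismatch
console.log(store.sweepExpired(DAY_MS + 1)); // 1
```

Sweeping abandoned IN_FLIGHT rows matters as much as sweeping completed ones: a claim whose process crashed would otherwise answer 409 for that key forever.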


Audit — apps/api/src/_infra/shared-infra/prisma-audit.emitter.ts

Capability | Status
(action, entityType, entityId, before, after) recorded in the same tx as the state change | LIVE (ADR-008 enforced)
actorId free-form string | LIVE. Since Phase A1 it is sourced from req.user.id via AsyncLocalStorage; TenantContextMiddleware rejects the x-actor-id header with 400.
Immutable / append-only | LIVE by convention only. No trigger blocks UPDATE/DELETE on the table; immutability rests on no application code mutating it.
Cross-tenant viewer for support | PLANNED

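Verdict: The audit log is real. The x-actor-id poisoning vector was closed in Phase A1 (see §2); append-only remains a convention rather than a database guarantee until UPDATE/DELETE are blocked at the table.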
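The Phase A1 fix sources the audit actor from the authenticated session rather than a client header. A minimal sketch of that pattern with Node's AsyncLocalStorage follows. The request shape, the 401/403 statuses for a missing session or organization, and all function names are simplified assumptions, not the NestJS or better-auth types the API actually uses.

```typescript
import { AsyncLocalStorage } from "node:async_hooks";

// Hypothetical sketch: actor and tenant come from the authenticated session,
// never from a client header, and reach the audit emitter via AsyncLocalStorage.

interface RequestContext {
  actorId: string;
  tenantId: string;
}

interface MinimalRequest {
  headers: Record<string, string | undefined>;
  user?: { id: string }; // set by the auth guard from the session
  activeOrganizationId?: string; // the tenant (business) selected in the session
}

const requestContext = new AsyncLocalStorage<RequestContext>();

// Returns an HTTP status when the request is rejected; otherwise runs `next`
// inside the request context and returns undefined.
function tenantContextSketch(req: MinimalRequest, next: () => void): number | undefined {
  if (req.headers["x-actor-id"] !== undefined) return 400; // client-supplied actor: reject outright
  if (req.user === undefined) return 401; // assumed status for a missing session
  if (req.activeOrganizationId === undefined) return 403; // assumed status for no active tenant
  requestContext.run({ actorId: req.user.id, tenantId: req.activeOrganizationId }, next);
  return undefined;
}

// Downstream code (for example the audit emitter) reads the actor from the context.
function currentActorId(): string {
  const ctx = requestContext.getStore();
  if (ctx === undefined) throw new Error("no request context: refusing to write an anonymous audit row");
  return ctx.actorId;
}

let seenActor = "";
const outcome = tenantContextSketch(
  { headers: {}, user: { id: "user_42" }, activeOrganizationId: "biz_7" },
  () => {
    seenActor = currentActorId();
  },
);
console.log(outcome, seenActor); // undefined user_42
```

Throwing when no context exists is deliberate: an audit row without a verified actor is worse than a failed request.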


Tenant isolation — apps/api/src/identity/tenant/tenant-context.middleware.ts + ADR-013

Capability | Status
RLS on 5 financial tables (ewa_request, loan, outbox_event, idempotency_record, audit_entry) | LIVE
FORCE ROW LEVEL SECURITY | LIVE
Verify-guard DO block at migration time | LIVE
loan_repayment table RLS | LIVE (S3 migration)
bank_statement_line RLS | LIVE (S4 migration)
disbursement + bank_event RLS (gateway DB) | LIVE
RLS on identity tables (Session, Account, Verification, User, Organization) | N/A by design (cross-tenant identity).
RLS on Business + Employee | N/A
Tenant context applied to Business + Employee controllers | LIVE (Phase A2, 2026-05-29). Both controllers run under TenantContextMiddleware; OrgRoleGuard rejects cross-tenant access; service queries are tenant-scoped.

Verdict: Tenant isolation on financial rows is real and load-bearing. Identity and workforce rows cannot be isolated by RLS alone because Business has no tenantId, so application-layer scoping is the only mechanism; since Phase A2 it is wired for the Business and Employee controllers.
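The ledger binds the tenant per transaction with SET LOCAL app.tenant_id. Because SET cannot take a bind parameter, a parameterised equivalent is PostgreSQL's set_config('app.tenant_id', $1, true), where the third argument makes the setting transaction-local, so it resets at COMMIT/ROLLBACK just like SET LOCAL. A TypeScript sketch follows; the Executor interface is a hypothetical stand-in for a real driver connection, and that the RLS policies read current_setting('app.tenant_id') is an assumption about how they are written.

```typescript
// Hypothetical sketch: bind the tenant for exactly one transaction.
// set_config(name, value, true) behaves like SET LOCAL but accepts a bind
// parameter, and the setting is discarded at COMMIT/ROLLBACK.

interface Executor {
  query(sql: string, params: unknown[]): Promise<unknown>;
}

async function withTenant<T>(
  db: Executor, // must be a single connection, not a pool, so all statements share the transaction
  tenantId: string,
  work: (db: Executor) => Promise<T>,
): Promise<T> {
  if (tenantId === "") throw new Error("refusing to open a transaction without a tenant");
  await db.query("BEGIN", []);
  try {
    await db.query("SELECT set_config('app.tenant_id', $1, true)", [tenantId]);
    const result = await work(db); // RLS policies presumably read current_setting('app.tenant_id')
    await db.query("COMMIT", []);
    return result;
  } catch (err) {
    await db.query("ROLLBACK", []);
    throw err;
  }
}

// Usage with a recording fake in place of a real connection.
const sqlLog: string[] = [];
const fake: Executor = {
  async query(sql, params) {
    sqlLog.push(`${sql} ${JSON.stringify(params)}`);
    return undefined;
  },
};
withTenant(fake, "biz_7", async (db) => db.query("SELECT 1", [])).then(() => console.log(sqlLog));
```

Because the setting is transaction-local, a connection returned to a pool carries no tenant with it, which is the "connection-leak safe" property the ledger table claims.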


Frontends — apps/{admin,employer,employee,fi,merchant,docs}-web/

App | Pages | Data source | Auth | Verdict
admin-web | 9 routes | MOCK (10 mockData refs) | None wired | STUB
employer-web | 8 routes | MOCK (43 mockData refs) | None wired | STUB
employee-web | 9 routes | MOCK (5 refs) | localStorage only; no real backend session | DANGEROUS as a demo
fi-web | 8 routes | API (no mocks detected) | Not validated end-to-end | PARTIAL
merchant-web | 8 routes | API (no mocks detected) | Not validated end-to-end | PARTIAL
docs-web | 1 page | Static | n/a | STUB

Verdict: Four of six web apps cannot show real data to real users. The two that try (fi-web, merchant-web) have not been verified against the live API. Frontend integration is a 3-month effort across the suite.


Infrastructure / Ops

Capability | Status
Docker-compose dev harness | LIVE
Docker-compose test harness (docker-compose.test.yml) | LIVE; used by the S2/S3/S4 verify scripts.
Kubernetes manifests | PLANNED (none)
Terraform / IaC | PLANNED (none)
Helm charts | PLANNED (none)
Secrets management (Vault / AWS SM / SOPS / sealed-secrets) | PLANNED; all secrets are env vars.
CI: lint + test + build + e2e via Nx | LIVE
CI: prisma-validate / migration-safety check | PLANNED
CI: SAST / secret scanning | PLANNED
Blue-green / canary deploy strategy | PLANNED
Database backup + PITR | PLANNED
Database replica routing | PLANNED
pgbouncer / connection pooler | PLANNED; direct pgxpool to a single primary.

Verdict: Local dev is solid. There is no production deployment story. Going from docker-compose.yml to "regulator-acceptable production" is a 6-week dedicated workstream.


Compliance + Regulatory

Capability | Status
ADR-014 ("DemozPay is orchestrator, not custodian") | PLANNED; recommended in SYSTEM_GAP_ACTION_PLAN.md but not written.
NBE (National Bank of Ethiopia) engagement | PLANNED
AML/CFT program | PLANNED
Sanctions screening (OFAC + Ethiopian sanctions list) | PLANNED
KYC + identity verification | PLANNED; packages/kyc/ does not exist.
Transaction monitoring (suspicious-pattern detection) | PLANNED
Regulatory reporting (monthly NBE returns, SAR/STR filings) | PLANNED
Data residency (Ethiopia-domiciled storage for PII) | PLANNED
Right-to-be-forgotten workflow | PLANNED
ISO 27001 / SOC 2 / PCI | PLANNED (per SECURITY_CONTROLS.md header table)

Verdict: Outside the engineering remit, but every one of these items will be asked for by the first regulator conversation or the first partner-bank due-diligence. A real bank rail to real users without at least KYC + AML + sanctions is a regulator-incident waiting to be triggered.


§2. The five things that will hurt first in production

Ranked by likelihood × blast radius, not by code size.

Items 1 to 3 below were CLOSED on 2026-05-29 (items 1 and 2 by Phase A, item 3 by Phase B1), and item 4 was partially closed by Phase C the same day. They remain in the list (with strikethrough-equivalent annotation) so future-you reading the doc cold remembers what the urgent vectors WERE, and verifies they didn't regress.

  1. Authorization bypass via x-actor-id. CLOSED (Phase A1). TenantContextMiddleware rejects the header with 400; actorId sourced from req.user.id via AsyncLocalStorage. Verified by apps/api/src/identity/tenant/tenant-context.middleware.spec.ts and full S3/S4 regression.

  2. Cross-tenant business/employee enumeration. CLOSED (Phase A2). Both controllers under TenantContextMiddleware; OrgRoleGuard rejects cross-tenant; service queries are tenant-scoped; body businessId mismatch → 403.

  3. EWA cannot be repaid. CLOSED (Phase B1). RecordEwaRepaymentUseCase ships with admin endpoint. Ledger journal correct; outbox event emitted.

  4. LookupAccount is a stub. PARTIAL-CLOSED (Phase C, 2026-05-29). Existence-check ships LIVE: misrouted disbursement to a NON-EXISTENT account is now structurally impossible — the use case rejects at 409 before any money instruction. Name-match (different existing account) remains the carry-over Phase C continuation item. The blast radius is reduced from "unrecoverable" to "operator-actionable via bank-statement reconciliation".

  5. The legacy Wallet* + BNPL* + Payroll* + Equb* Prisma models still exist. They are @deprecated and ESLint-blocked from imports, but the tables are still in the schema. They invite new engineers to write code that violates ADR-005/006/013. Delete the tables in a forward migration; don't leave the temptation.
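Item 4's fail-closed ordering (lookup before any ledger, outbox, or partner side effect) can be sketched as a use case over hypothetical ports. The port names, the typed reason codes, the event names, and the straight-line sequencing are illustrative assumptions; in the real use cases the ledger pre-commit and the outbox write may share a transaction. The name-match gap is left visible as a comment.

```typescript
// Hypothetical sketch of the fail-closed ordering: account lookup first,
// and no ledger, outbox, or partner side effect unless the account exists.

type LookupResult =
  | { exists: true; resolvedHolderName: string }
  | { exists: false; reason: "ACCOUNT_NOT_FOUND" | "ACCOUNT_CLOSED" | "PARTNER_UNAVAILABLE" };

interface DisbursePorts {
  lookupAccount(bankCode: string, accountNumber: string): Promise<LookupResult>;
  recordAudit(event: string, detail: string): Promise<void>;
  precommitLedger(amountMinor: number): Promise<void>;
  enqueueOutbox(eventType: string): Promise<void>;
  submitToPartner(amountMinor: number): Promise<void>;
}

class DisbursementRejected extends Error {
  readonly httpStatus = 409;
  readonly reason: string;
  constructor(reason: string) {
    super(`disbursement rejected: ${reason}`);
    this.reason = reason;
  }
}

async function disburseSketch(
  ports: DisbursePorts,
  bankCode: string,
  accountNumber: string,
  amountMinor: number,
): Promise<void> {
  const lookup = await ports.lookupAccount(bankCode, accountNumber);
  if (lookup.exists === false) {
    await ports.recordAudit("disbursement.lookup_failed", lookup.reason);
    throw new DisbursementRejected(lookup.reason); // no money instruction has been issued
  }
  // Documented gap: lookup.resolvedHolderName is not compared with an expected
  // name, so a wrong but existing account still passes this point.
  await ports.precommitLedger(amountMinor);
  await ports.enqueueOutbox("disbursement.submitted");
  await ports.submitToPartner(amountMinor);
}
```

Closing the name-match gap would slot in right where the comment sits, before the first side effect, once an expected-name field exists on the DTO/aggregate.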


§3. What is over-engineered

Be honest about this too.

  • packages/contracts/openapi/ has a README listing future files. No source-of-truth OpenAPI spec exists for the REST endpoints we do ship. Either ship one or delete the placeholder.
  • OTel SDK wired at boot but no collector deployed. Pays cost (boot time, dependency tree) for zero current benefit. Either deploy a collector (Jaeger / Tempo / Grafana Cloud) or comment out until needed.
  • Five separate web apps before the API is integrated with one. Premature breadth. Pick one (employer-web is the highest-value: it's the customer who pays) and integrate it end-to-end before moving to the next.

§4. What is under-engineered

  • No payroll domain. Re-emphasising: payroll is the value proposition. Build it next.
  • No KYC domain. Cannot ship in Ethiopia without identity-verification + sanctions-screening as Live, not Planned.
  • No collections domain. DEFAULTED exists in the loan-status enum, but loan.markDefaulted() has no caller, so nothing ever moves a loan there. The first defaulting loan will reveal a process gap, not a code gap.
  • No admin tooling. Operator playbooks reference DB queries copy-pasted into psql. Every fintech needs an internal admin UI: replay a webhook, force-resync a disbursement, view a ledger account, replay an outbox event. None exist.

§5. The three most uncomfortable conversations

These need to happen before the next sprint:

  1. With Product: "We cannot promise BNPL, Savings, or Equb to partners in the next 90 days. They have no code." Adjust the roadmap, kill the bullet points, refund the customer expectations.
  2. With Security/Compliance: "We have one Live security control on the SECURITY_CONTROLS.md table — tenant isolation. The rest are Planned." Decide what is mandatory before pilot, not after.
  3. With the Board / Investors: "The 12 GAPs we marked closed represent code completeness, not platform readiness. We are 60–90 days from a real pilot, not 7 days." Re-set expectations now; the alternative is missing the date and explaining it later.

§6. What this document does NOT claim

  • It does not claim the work to date was wasted. The S1–S4 work is necessary infrastructure and was done correctly.
  • It does not propose a re-architecture. The boring-fintech-orchestrator model is right. The execution is incomplete.
  • It does not list every TODO. It lists what would surprise a fintech CTO on day-one of taking the platform live.

§7. Cross-references