DemozPay — System Gap Action Plan
Companion to: SYSTEM_GAP.md
Snapshot: 2026-05-28
Purpose: a sequenced, sprint-level plan to close every gap in SYSTEM_GAP.md and transition the platform from wallet bookkeeping to bank-orchestrator reality.
No commits while the standing instruction is in force. Code changes sit in the working tree.
How this document relates to SYSTEM_GAP.md
SYSTEM_GAP.md is the inventory: every gap, every checkbox, every architecturally invalid line. SYSTEM_GAP_ACTION_PLAN.md (this file) is the execution plan: sprint windows, who does what, what "done" looks like, and what blocks what.
When a gap is closed, you tick it in both files: the entry in SYSTEM_GAP.md AND the corresponding row here.
Executive summary
The transition is a 4-sprint program (~8 calendar weeks with 1 senior backend engineer; ~4 weeks with 3 engineers working in parallel).
| Sprint | Calendar | Theme | Outcome |
|---|---|---|---|
| S1 | weeks 1–2 | Ledger + ports + domain — the foundation | Ledger supports PENDING/POSTED/FAILED. Use cases pre-commit. Domain knows bank states. Nothing actually moves money yet — gateway is still a stub. |
| S2 | weeks 3–4 | Real integration-gateway Go service + Dashen adapter (longest single job) | First real bank API call lands. End-to-end EWA disbursement to a Dashen test account works in staging. |
| S3 | weeks 5–6 | Settlement confirmation loop + webhook receiver + repayment | The settlement-state machine closes. Lending has repayment. We can claim "money moves end-to-end." |
| S4 | weeks 7–8 | Reconciliation + statement ingestion + legacy cleanup + first real soak | Ledger drift is measurable against the bank. Legacy wallet models out of the way. Production-readiness gate clears. |
Gate to go-live: end of S4. No real bank rail can be opened to real users before S1–S3 have all closed AND S4's drift detection has run clean for at least 7 consecutive days.
The single biggest risk to this plan is the integration-gateway and its Dashen adapter (S2). It is the largest piece of work, and it is the one item where external dependencies (Dashen API spec, sandbox access, signing keys) could delay everything. Mitigate by starting Dashen sandbox onboarding in parallel with S1 (week 1, day 1).
Sprint 1 — Ledger foundation + domain shift
Weeks 1–2. A single engineer can carry this. Parallelism is limited because most items depend on each other; the Go ledger work and the TS domain work are the only clean split (see Parallel tracks).
Goal
After S1, the codebase models the bank-orchestrator reality at the type level. Use cases pre-commit ledger entries as PENDING, call the gateway, handle accepted/rejected outcomes, and emit lifecycle-accurate events. The gateway is still a stub, so nothing actually moves money — but every layer above it is correct.
Tasks in execution order
| # | Gap | Task | Days | Acceptance criteria |
|---|---|---|---|---|
| S1.1 | GAP-01a | Write migration 0003_pending_posted.up.sql. Adds status column, transition function, replaces append-only trigger with column-immutability. | 0.5 | Migration applies clean against hermetic PG. Verify guard catches illegal transitions (test: try POSTED → PENDING, must raise). |
| S1.2 | GAP-01e | Update packages/contracts/grpc/ledger.proto. Add LedgerTransactionStatus enum, add ConfirmSettlement + MarkSettlementFailed RPCs, add status field on PostTransactionRequest. | 0.5 | buf lint passes. buf generate produces stubs (run on Go-equipped host). |
| S1.3 | GAP-01b | services/ledger/internal/server/post_transaction.go accepts status parameter (default POSTED for back-compat, PENDING allowed). | 0.5 | Integration test: post with PENDING → row visible, status='PENDING', balance trigger still fires at COMMIT. |
| S1.4 | GAP-01c | New services/ledger/internal/server/confirm_settlement.go — RPC ConfirmSettlement(tenant_id, tx_id, partner_reference). Calls the SQL transition function PENDING → POSTED. Idempotent (calling on an already-POSTED tx returns success, calling on REVERSED/FAILED returns FailedPrecondition). | 0.5 | Integration tests: confirm twice → both succeed (idempotent). Confirm a FAILED tx → FailedPrecondition. |
| S1.5 | GAP-01d | New services/ledger/internal/server/mark_failed.go — RPC MarkSettlementFailed(tenant_id, tx_id, reason). Calls SQL transition PENDING → FAILED. Idempotent in the same shape. | 0.5 | Integration tests: same idempotency contract. |
| S1.6 | GAP-02 | Extend DisbursementPort (both EWA + lending versions) with getStatus(reference) method and richer DisbursementResult (bankStatus, acceptedAt, failure fields). | 0.5 | EWA + lending unit tests still pass with updated in-memory adapters. Type-check passes. |
| S1.7 | GAP-03a / GAP-03b | Add bank-state enum values to EwaStatus and LoanStatus. Update transition tables. | 1 | Unit tests cover every new transition path. |
| S1.8 | GAP-04a / GAP-04b | Rewrite apps/api/src/products/ewa/ledger-accounts.adapter.ts and apps/api/src/products/lending/ledger-accounts.adapter.ts to use bank-orchestrator account taxonomy. Drop cashAccountId. Add payableToBusinessBankAccountId, payableToFiPartnerAccountIdByFi, payrollClearingAccountId. | 1 | Type-check passes against new use-case shape. Unit tests pass. |
| S1.9 | GAP-05 | Rewrite DisburseEwaUseCase and DisburseLoanUseCase: pre-commit ledger as PENDING → call gateway → on ACCEPTED leave PENDING + emit *_accepted.v1 + persist SUBMITTED_TO_BANK → on REJECTED MarkSettlementFailed + Reverse + persist BANK_REJECTED + emit *_failed.v1. Settlement confirmation (PENDING→POSTED) is NOT in the use case — it's the poller (S3). | 1.5 | Unit tests cover: happy ACCEPTED path, REJECTED-at-submit path, gateway-throws path. In-memory adapters simulate each. |
| S1.10 | GAP-06a / GAP-06b | Rename outbox events in packages/ewa/backend/domain/events.ts + packages/lending/backend/domain/events.ts. | 0.5 | Old event names removed entirely. Tests assert the new names. |
S1 total: ~7 engineer-days. Structurally closes 6 of the 12 gap headlines (GAP-01..GAP-06), but downstream verification waits for S2.
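The status contract behind S1.1, S1.4 and S1.5 is small enough to pin down in code. The sketch below is TypeScript for illustration only: the real guard lives in the SQL transition function and the Go RPC handlers, and the names transition, confirmSettlement and markSettlementFailed are hypothetical. It assumes the only legal edges are the two the table names (PENDING → POSTED and PENDING → FAILED) and treats a request for the current state as an idempotent success.

```typescript
// Illustrative model of the ledger_transaction status lifecycle (hypothetical names).
type LedgerTxStatus = "PENDING" | "POSTED" | "FAILED" | "REVERSED";

// Only the two edges named in S1.4/S1.5 are legal. Reversal goes through a
// separate Reverse call and is deliberately not modelled here.
const LEGAL_NEXT: Record<LedgerTxStatus, LedgerTxStatus[]> = {
  PENDING: ["POSTED", "FAILED"],
  POSTED: [],
  FAILED: [],
  REVERSED: [],
};

interface TransitionResult {
  ok: boolean; // false corresponds to a gRPC FailedPrecondition
  status: LedgerTxStatus; // status after the call (unchanged on failure)
  error?: string;
}

function transition(current: LedgerTxStatus, target: LedgerTxStatus): TransitionResult {
  // Idempotency: asking for the state we are already in is a successful no-op.
  if (current === target) {
    return { ok: true, status: current };
  }
  if (LEGAL_NEXT[current].indexOf(target) === -1) {
    return { ok: false, status: current, error: `illegal transition ${current} -> ${target}` };
  }
  return { ok: true, status: target };
}

function confirmSettlement(current: LedgerTxStatus): TransitionResult {
  return transition(current, "POSTED");
}

function markSettlementFailed(current: LedgerTxStatus): TransitionResult {
  return transition(current, "FAILED");
}
```

The migration test "try POSTED → PENDING, must raise" corresponds to transition("POSTED", "PENDING") returning ok: false here.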
S1 acceptance gate
- All S1 unit tests pass (nx test ewa, nx test lending, services/ledger/test/verify.sh).
- nx build api clean. nx lint api clean.
- Migration 0003_pending_posted applies, and the verify guard is runtime-proven against hermetic PG.
- At least one integration test on the Go ledger that exercises: post PENDING → confirm → POSTED visible; post PENDING → mark_failed → FAILED visible; balance trigger still rejects unbalanced PENDING at INSERT.
- Honest negative test: a synchronous gateway throw mid-use-case → ledger still has PENDING + EWA marked SUBMITTED_TO_BANK (NOT DISBURSED). Reconciliation will catch it later. Documented as the expected hole until S3.
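A minimal sketch of the S1.9 control flow, including the gateway-throws hole described in the last bullet. It is synchronous for brevity (the real ports return Promises), and every port, method and event name here (LedgerPort, postPending, ewa.disbursement_accepted.v1 and so on) is hypothetical; the plan only fixes the *_accepted.v1 / *_failed.v1 naming pattern.

```typescript
// Sketch only: synchronous stand-ins for the real async ports (hypothetical names).
type EwaStatus = "SUBMITTED_TO_BANK" | "BANK_REJECTED";

interface DisbursementResult {
  bankStatus: "ACCEPTED" | "REJECTED";
  reference?: string;
  failureReason?: string;
}

interface LedgerPort {
  postPending(ewaId: string, amountMinor: number): string; // returns ledger tx id
  markSettlementFailed(txId: string): void;
  reverse(txId: string): void;
}

interface Deps {
  ledger: LedgerPort;
  gateway: { disburse(ewaId: string, amountMinor: number): DisbursementResult };
  ewaRepo: { setStatus(ewaId: string, status: EwaStatus): void };
  outbox: { emit(eventName: string, payload: Record<string, string>): void };
}

function disburseEwa(deps: Deps, ewaId: string, amountMinor: number): EwaStatus {
  // 1. Pre-commit the ledger entry as PENDING before the bank is contacted.
  const txId = deps.ledger.postPending(ewaId, amountMinor);

  let result: DisbursementResult;
  try {
    result = deps.gateway.disburse(ewaId, amountMinor);
  } catch {
    // Outcome unknown: keep the ledger PENDING and park the request as
    // SUBMITTED_TO_BANK (never DISBURSED). The documented hole until S3.
    deps.ewaRepo.setStatus(ewaId, "SUBMITTED_TO_BANK");
    return "SUBMITTED_TO_BANK";
  }

  if (result.bankStatus === "ACCEPTED") {
    // Accepted is not settled: the ledger stays PENDING until settlement is confirmed.
    deps.ewaRepo.setStatus(ewaId, "SUBMITTED_TO_BANK");
    deps.outbox.emit("ewa.disbursement_accepted.v1", {
      ewaId,
      txId,
      reference: result.reference ?? "",
    });
    return "SUBMITTED_TO_BANK";
  }

  // Rejected at submit: fail the pending entry, reverse it, record the rejection.
  deps.ledger.markSettlementFailed(txId);
  deps.ledger.reverse(txId);
  deps.ewaRepo.setStatus(ewaId, "BANK_REJECTED");
  deps.outbox.emit("ewa.disbursement_failed.v1", {
    ewaId,
    txId,
    reason: result.failureReason ?? "unknown",
  });
  return "BANK_REJECTED";
}
```

Each branch maps to one of the three unit-test paths in S1.9 (ACCEPTED, REJECTED at submit, gateway throws), so in-memory adapters only need to vary what gateway.disburse does.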
Sprint 2 — Real integration-gateway + Dashen
Weeks 3–4. The single longest engineering job in the program. 1 Go engineer + parallel work on Dashen API onboarding (started week 1).
Goal
After S2, services/integration-gateway/ is a real Go gRPC service with its own Postgres database, a state machine, and at least one production-shape partner adapter (Dashen). EWA + lending disbursement reaches Dashen's sandbox end-to-end.
Parallel pre-work (start week 1)
This is the longest external blocker. Begin immediately, in parallel with S1.
| Task | Owner | Notes |
|---|---|---|
| Onboard with Dashen Bank API team | BizDev + Engineering lead | Sandbox credentials, signing key provisioning, webhook URL whitelisting |
| Obtain Dashen API spec document | Engineering lead | Need: auth model, signing algorithm, request shapes for transfer-initiation + status-query, webhook payload shape + signature verification |
| Set up Dashen sandbox account + test bank-to-bank transfer (manually, via Dashen's UI) | Operations | Confirms credentials + answers "is the sandbox real money or test money?" |
If Dashen blocks for more than 7 days, escalate. The whole sprint depends on it.
Tasks in execution order
| # | Gap | Task | Days | Acceptance criteria |
|---|---|---|---|---|
| S2.1 | GAP-07a | Scaffold services/integration-gateway/internal/{config,pg,server,store,statemachine,adapters,webhook}. Replace cmd/integration-gateway/main.go with real gRPC server + HTTP listener for webhooks. | 1 | Service boots; /health works; gRPC server listening; grpcurl shows registered methods. |
| S2.2 | GAP-07b | Migration services/integration-gateway/migrations/0001_init.up.sql: disbursement + bank_event tables per SYSTEM_GAP.md §3.7. | 0.5 | Migration applies; RLS on both tables (the gateway is multi-tenant too), using the same tenant_isolation pattern as the API and ledger. |
| S2.3 | GAP-07c | Implement PartnerAdapter interface and adapter registry. Move proto-defined types Rail, Account, DisbursementStatus into the Go server. | 1 | Unit test: registry resolves by partner string; unknown partner → InvalidArgument. |
| S2.4 | GAP-07d | Implement the gRPC handlers InitiateDisbursement, LookupAccount, GetDisbursementStatus, GetAdapterStatus. Each routes to the right adapter and persists disbursement rows + bank_event records. | 1.5 | Integration test against a fake adapter: idempotency works (same idempotency_key → cached row); state transitions are forward-only; concurrent InitiateDisbursement for same key serializes safely. |
| S2.5 | GAP-07e | Implement state machine: INITIATED → SUBMITTED → ACCEPTED → SETTLED plus failure branches. Updates are append-only in bank_event; disbursement.status is the materialised state. | 1 | Unit tests for every legal + illegal transition. |
| S2.6 | GAP-07f | Implement Dashen adapter in internal/adapters/dashen/. HTTP client (Go std net/http) with mTLS, request signing per Dashen's spec, response normalisation. Loads credentials from env. | 2–3 | Adapter unit tests against canned Dashen response fixtures. Integration test against Dashen sandbox: real transfer initiated, partner_reference returned, status query returns the expected lifecycle. Cannot proceed without Dashen sandbox. |
| S2.7 | GAP-07g | Wire the Dashen adapter into the partner registry. Add INTEGRATION_GATEWAY_DASHEN_BASE_URL, INTEGRATION_GATEWAY_DASHEN_API_KEY, INTEGRATION_GATEWAY_DASHEN_SIGNING_KEY to env schema. | 0.5 | Boot of gateway in dev with Dashen creds succeeds; without them, Dashen adapter is unregistered (other partners still available). |
| S2.8 | — | End-to-end smoke: API → use case → gateway → Dashen sandbox → ACCEPTED received. Ledger has PENDING entry. EWA row is SUBMITTED_TO_BANK. | 1 | Manual run + recorded bank_event rows. |
S2 total: ~8.5–9.5 engineer-days. Critical path: slippage here delays everything.
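The S2.5 state machine amounts to a forward-only transition table plus an append-only event log from which the materialised status can always be rebuilt. This TypeScript sketch is illustrative only (the gateway itself is Go). The failure states REJECTED and FAILED, and the states they branch from, are assumptions: the plan only says "plus failure branches".

```typescript
// Illustrative model of the integration-gateway disbursement lifecycle.
type GatewayStatus =
  | "INITIATED"
  | "SUBMITTED"
  | "ACCEPTED"
  | "SETTLED"
  | "REJECTED" // assumed failure branch: bank refuses the submission
  | "FAILED"; // assumed failure branch: any other terminal failure

// Forward-only edges. Terminal states have no outgoing edges.
const NEXT: Record<GatewayStatus, GatewayStatus[]> = {
  INITIATED: ["SUBMITTED", "FAILED"],
  SUBMITTED: ["ACCEPTED", "REJECTED", "FAILED"],
  ACCEPTED: ["SETTLED", "FAILED"],
  SETTLED: [],
  REJECTED: [],
  FAILED: [],
};

interface BankEvent {
  seq: number;
  from: GatewayStatus;
  to: GatewayStatus;
  rawPayload: string; // stored verbatim, like a bank_event row
}

type ApplyOutcome = "applied" | "duplicate" | "illegal";

class DisbursementRecord {
  private readonly events: BankEvent[] = []; // append-only (bank_event)
  private current: GatewayStatus = "INITIATED"; // materialised (disbursement.status)

  status(): GatewayStatus {
    return this.current;
  }

  apply(to: GatewayStatus, rawPayload: string): ApplyOutcome {
    if (to === this.current) return "duplicate"; // replayed partner event: no-op
    if (NEXT[this.current].indexOf(to) === -1) return "illegal"; // backwards or skipping
    this.events.push({ seq: this.events.length + 1, from: this.current, to, rawPayload });
    this.current = to;
    return "applied";
  }

  // The materialised status must always equal a replay of the event log.
  replayStatus(): GatewayStatus {
    let s: GatewayStatus = "INITIATED";
    for (const e of this.events) s = e.to;
    return s;
  }

  eventCount(): number {
    return this.events.length;
  }
}
```

Keeping "duplicate" separate from "illegal" lets a caller acknowledge replayed partner callbacks quietly while still alerting on genuinely out-of-order ones.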
S2 acceptance gate
- services/integration-gateway boots as a real gRPC service.
- Migration applied cleanly. RLS on disbursement + bank_event.
- Dashen sandbox transfer initiated end-to-end, bank_event row visible.
- nx run integration-gateway:test passes (if a hermetic test harness mirroring the ledger's is built, which is recommended).
- Ledger has a PENDING entry for the test transfer. Nothing has been settled yet (that's S3).
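S2.3 and S2.7 together imply a registry that only contains adapters whose configuration is present, and that rejects unknown partners. A TypeScript sketch of that behaviour (the real registry is Go): the adapter objects are placeholders, the "fake" partner name is hypothetical, and only the three environment variable names come from this plan.

```typescript
interface PartnerAdapter {
  readonly partner: string;
  initiateDisbursement(idempotencyKey: string, amountMinor: number): { partnerReference: string };
}

type Env = Record<string, string | undefined>;

const DASHEN_ENV_KEYS = [
  "INTEGRATION_GATEWAY_DASHEN_BASE_URL",
  "INTEGRATION_GATEWAY_DASHEN_API_KEY",
  "INTEGRATION_GATEWAY_DASHEN_SIGNING_KEY",
];

class AdapterRegistry {
  private readonly adapters: Record<string, PartnerAdapter> = {};

  register(adapter: PartnerAdapter): void {
    this.adapters[adapter.partner] = adapter;
  }

  // An unknown partner maps to gRPC InvalidArgument in the real service.
  resolve(partner: string): PartnerAdapter {
    // hasOwnProperty avoids matching inherited keys such as "toString".
    if (!Object.prototype.hasOwnProperty.call(this.adapters, partner)) {
      throw new Error(`InvalidArgument: unknown partner "${partner}"`);
    }
    return this.adapters[partner];
  }

  partners(): string[] {
    return Object.keys(this.adapters).sort();
  }
}

// Hypothetical in-process adapter for tests and local dev (production would gate it too).
const fakeAdapter: PartnerAdapter = {
  partner: "fake",
  initiateDisbursement: (key) => ({ partnerReference: `fake-${key}` }),
};

// Placeholder: the real Dashen adapter signs requests and calls Dashen over mTLS.
const dashenPlaceholder: PartnerAdapter = {
  partner: "dashen",
  initiateDisbursement: (key) => ({ partnerReference: `dashen-${key}` }),
};

function buildRegistry(env: Env): AdapterRegistry {
  const registry = new AdapterRegistry();
  registry.register(fakeAdapter);
  // Register Dashen only when every credential is present and non-empty.
  const dashenConfigured = DASHEN_ENV_KEYS.every((k) => (env[k] ?? "") !== "");
  if (dashenConfigured) registry.register(dashenPlaceholder);
  return registry;
}
```

A partially configured environment (for example, a base URL but no signing key) leaves Dashen unregistered rather than half-working, which is the S2.7 acceptance behaviour.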
Sprint 3 — Settlement loop + repayment + webhooks
Weeks 5–6. Parallelisable: GAP-08 + GAP-09 + GAP-10 can run with 3 different engineers.
Goal
The settlement-state machine closes. EWA + lending know about SETTLED. Lending knows how to repay. The webhook path is hardened.
Tasks (parallelisable)
| # | Gap | Task | Days | Acceptance criteria |
|---|---|---|---|---|
| S3.1 | GAP-08 | New NestJS service apps/api/src/money/integration/settlement-poller.service.ts. Polls every 30s. Queries ewa_request + loan in SUBMITTED_TO_BANK / ACCEPTED_BY_BANK. For each, calls getStatus via gateway. On COMPLETED → ledger.ConfirmSettlement + EWA DISBURSED + emit *_settled.v1. On FAILED → ledger.MarkSettlementFailed + ledger.Reverse + EWA BANK_REJECTED + emit *_failed.v1. Uses BYPASSRLS role (provisioned via infra/sql/01_create_settlement_poller_role.sql — new). | 2 | Integration test: seed a PENDING row, mock gateway returns COMPLETED → poller flips to DISBURSED + ledger POSTED. Same for FAILED. Stale-pending (>24h) escalates an alert metric. |
| S3.2 | GAP-09 | New NestJS controller apps/api/src/money/integration/bank-webhook.controller.ts. Path POST /api/integration/bank-callback/:partner. HMAC signature verification per adapter. @Public() (webhooks don't carry user auth). Normalised payload → same ConfirmSettlement / MarkSettlementFailed flow as poller. Stores raw payload in bank_event. Idempotent: re-receiving the same partner_reference returns 200 without re-applying. | 1.5 | Integration test: replay a webhook → only one ledger transition observed. Bad signature → 401. Unknown partner → 404. |
| S3.3 | GAP-10 | New use case packages/lending/backend/application/record-repayment.usecase.ts. Takes (loanId, installmentIndex, amountFromPayrollClearing). Ledger entries: DR payrollClearing / CR loanReceivable (principal) + CR interestRevenue (interest portion). Marks installment PAID. If last → loan CLOSED. Emits loan.installment_repaid.v1. | 1.5 | Unit tests against in-memory ports. The use case is consumed by a payroll-deduction event consumer (which doesn't exist yet — for now the consumer is a script/admin endpoint, called manually). |
| S3.4 | — | Wire a NestJS endpoint POST /api/lending/loans/:id/installments/:idx/record-repayment that calls the use case. Auth-gated (AuthGuard). Temporary admin trigger until payroll domain ships. | 0.5 | E2E test: disburse a fake loan → trigger 3 repayments via the endpoint → ledger zeros the receivable → loan CLOSED. |
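The S3.2 receive path (partner check, signature check, replay check, then apply) can be sketched framework-free. Assumptions, not taken from any Dashen spec: the signature is a hex-encoded HMAC-SHA256 of the raw request body under a per-partner secret, and the payload carries partner_reference and status fields. The class and field names are hypothetical, and in the real controller the dedupe would be backed by bank_event rather than memory. It uses Node's crypto module (createHmac, timingSafeEqual).

```typescript
import { createHmac, timingSafeEqual } from "crypto";

// Assumed scheme: hex HMAC-SHA256 over the exact raw body bytes.
function verifySignature(secret: string, rawBody: string, signatureHex: string): boolean {
  const expected = createHmac("sha256", secret).update(rawBody).digest();
  const given = Buffer.from(signatureHex, "hex");
  // timingSafeEqual throws on length mismatch, so compare lengths first.
  return given.length === expected.length && timingSafeEqual(given, expected);
}

class BankWebhookHandler {
  // Stand-in for a uniqueness check on bank_event (partner, partner_reference).
  private readonly seen: Record<string, boolean> = {};
  applied = 0;

  constructor(private readonly secretsByPartner: Record<string, string>) {}

  // Returns the HTTP status the controller would send.
  handle(partner: string, rawBody: string, signatureHex: string): number {
    if (!Object.prototype.hasOwnProperty.call(this.secretsByPartner, partner)) return 404;
    if (!verifySignature(this.secretsByPartner[partner], rawBody, signatureHex)) return 401;

    let payload: { partner_reference?: string; status?: string };
    try {
      payload = JSON.parse(rawBody);
    } catch {
      return 400;
    }
    if (!payload || !payload.partner_reference) return 400;

    const key = `${partner}:${payload.partner_reference}`;
    if (this.seen[key]) return 200; // replay: acknowledge without re-applying
    // Real flow: record the event and apply ConfirmSettlement or
    // MarkSettlementFailed in one transaction, so a failed apply is retried.
    this.seen[key] = true;
    this.applied += 1;
    return 200;
  }
}
```

Verifying against the raw bytes matters: re-serialising already-parsed JSON can change whitespace or key order and break the HMAC.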
S3 total: ~5.5 engineer-days, parallelisable to ~2 elapsed days with 3 engineers.
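The S3.3 journal entry is easy to get subtly wrong, so it helps to see it as arithmetic. This sketch assumes each installment carries its scheduled principal and interest, that a repayment must cover an installment exactly (partial and over-payments are out of scope), and that amounts are integers in santim (1 ETB = 100 santim). The account labels stand in for the IDs the ledger-accounts adapter would supply; all names are hypothetical.

```typescript
interface Installment {
  principalMinor: number;
  interestMinor: number;
  paid: boolean;
}

interface Loan {
  id: string;
  status: "ACTIVE" | "CLOSED";
  installments: Installment[];
}

interface Posting {
  account: "payrollClearing" | "loanReceivable" | "interestRevenue";
  debitMinor: number;
  creditMinor: number;
}

function recordRepayment(loan: Loan, installmentIndex: number, amountMinor: number): Posting[] {
  const inst = loan.installments[installmentIndex];
  if (inst === undefined) throw new Error("unknown installment");
  if (inst.paid) throw new Error("installment already PAID");
  if (amountMinor !== inst.principalMinor + inst.interestMinor) {
    throw new Error("amount must equal principal + interest in this sketch");
  }

  // DR payrollClearing / CR loanReceivable (principal) + CR interestRevenue (interest).
  const postings: Posting[] = [
    { account: "payrollClearing", debitMinor: amountMinor, creditMinor: 0 },
    { account: "loanReceivable", debitMinor: 0, creditMinor: inst.principalMinor },
    { account: "interestRevenue", debitMinor: 0, creditMinor: inst.interestMinor },
  ];

  // Same invariant the ledger's balance trigger enforces: debits == credits.
  let debits = 0;
  let credits = 0;
  for (const p of postings) {
    debits += p.debitMinor;
    credits += p.creditMinor;
  }
  if (debits !== credits) throw new Error("unbalanced entry");

  inst.paid = true;
  // Close the loan once no installment remains unpaid.
  if (loan.installments.every((i) => i.paid)) loan.status = "CLOSED";
  // The real use case would also emit loan.installment_repaid.v1 here.
  return postings;
}
```

Summing the loanReceivable credits across all installments gives back the original principal, which is exactly what the S3.4 E2E test ("ledger zeros the receivable") checks.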
S3 acceptance gate
- Settlement poller observed (in staging or hermetic harness) flipping a PENDING row to POSTED.
- Webhook controller observed accepting + rejecting based on signature.
- Loan repayment integration test passes against real Go ledger.
- First true end-to-end: trigger EWA in staging → Dashen sandbox transfer → wait for settlement → API state goes SUBMITTED_TO_BANK → ACCEPTED_BY_BANK → DISBURSED. Ledger POSTED.
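One poll iteration of S3.1 reduces to a pure decision: given the row's current state, the gateway's answer and the age of the submission, which side effects run and in which order. In this sketch the gateway vocabulary (PENDING, ACCEPTED, COMPLETED, FAILED) and the ACCEPTED → ACCEPTED_BY_BANK step are assumptions, the latter inferred from the state progression in the last gate item. The action names are hypothetical.

```typescript
type RowStatus = "SUBMITTED_TO_BANK" | "ACCEPTED_BY_BANK";
type BankSideStatus = "PENDING" | "ACCEPTED" | "COMPLETED" | "FAILED"; // assumed vocabulary

type PollAction =
  | "SET_ACCEPTED_BY_BANK"
  | "LEDGER_CONFIRM_SETTLEMENT"
  | "SET_DISBURSED"
  | "EMIT_SETTLED"
  | "LEDGER_MARK_SETTLEMENT_FAILED"
  | "LEDGER_REVERSE"
  | "SET_BANK_REJECTED"
  | "EMIT_FAILED"
  | "ALERT_STALE_PENDING";

const STALE_AFTER_MS = 24 * 60 * 60 * 1000; // 24h threshold from S3.1

function planPollActions(
  row: RowStatus,
  bank: BankSideStatus,
  submittedAtMs: number,
  nowMs: number,
): PollAction[] {
  if (bank === "COMPLETED") {
    // Ledger first: ConfirmSettlement is idempotent, so a crash after this
    // step is repaired by the next poll, which sees the same row and answer.
    return ["LEDGER_CONFIRM_SETTLEMENT", "SET_DISBURSED", "EMIT_SETTLED"];
  }
  if (bank === "FAILED") {
    return ["LEDGER_MARK_SETTLEMENT_FAILED", "LEDGER_REVERSE", "SET_BANK_REJECTED", "EMIT_FAILED"];
  }
  // Still in flight: at most record acceptance, and alert if it is stale.
  const actions: PollAction[] = [];
  if (bank === "ACCEPTED" && row === "SUBMITTED_TO_BANK") actions.push("SET_ACCEPTED_BY_BANK");
  if (nowMs - submittedAtMs > STALE_AFTER_MS) actions.push("ALERT_STALE_PENDING");
  return actions;
}
```

Keeping the decision pure makes the S3.1 integration tests mostly table-driven; the service around it only fetches rows, calls getStatus and executes the returned actions.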
Sprint 4 — Reconciliation + legacy cleanup + soak
Weeks 7–8. Parallelisable: GAP-11 + GAP-12 can run alongside soak testing.
Goal
The bank-vs-ledger reconciliation primitive lands. Legacy Wallet* models are out of the way. The platform soaks for at least 7 consecutive days with zero drift before being declared go-live ready.
Tasks
| # | Gap | Task | Days | Acceptance criteria |
|---|---|---|---|---|
| S4.1 | GAP-11a | Add services/integration-gateway/internal/reconciliation/{statement-ingester,matcher,drift-reporter}.go. Define bank_statement_line table. Implement Dashen-format CSV parser as the first ingester. | 2 | Test: ingest a synthetic Dashen statement file → each line creates a bank_statement_line row. |
| S4.2 | GAP-11a | Matcher: for each unmatched bank_statement_line, find matching disbursement by (amount, partner_reference, value_date ± 1d). Mark RECONCILED. Unmatched → flagged. | 1.5 | Integration test: 100 disbursements + statement → all matched. Inject one fake line → flagged. |
| S4.3 | GAP-11b | Ledger RPC ReconcileWithBank(account_id, period, statement_total). Returns (ledger_total, statement_total, drift). Logs drift as error if non-zero. | 1 | Unit test against fake data: drift exactly 0 by construction. Inject one tampered row → drift detected. |
| S4.4 | GAP-12a | apps/api/prisma/schema.prisma: add /// @deprecated triple-slash comments to Wallet, WalletTransaction, WithdrawalRequest, LedgerAccount.balance. | 0.5 | Prisma generate clean. Schema diff shows comments only. |
| S4.5 | GAP-12b | Custom ESLint rule tooling/eslint/no-deprecated-prisma-models.ts. Blocks imports of Wallet, WalletTransaction, WithdrawalRequest from @prisma/client outside apps/api/prisma/. | 1 | Lint test: importing Wallet from anywhere in packages/ fails. |
| S4.6 | — | 7-day soak in staging. Daily reconciliation report. Drift must be 0 every day. Any non-zero drift halts go-live until root-caused and fixed. | (calendar days; ~1 engineer-day per daily review) | 7 consecutive days drift=0, observed in dashboard + log. |
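S4.2's matching rule, sketched in TypeScript (the real matcher is Go). Assumptions: dates are YYYY-MM-DD strings compared as UTC calendar days, the disbursement side's date is a hypothetical settledDate field, each disbursement can satisfy at most one statement line, and a linear scan stands in for an index on partner_reference.

```typescript
interface StatementLine {
  id: string;
  amountMinor: number;
  partnerReference: string;
  valueDate: string; // YYYY-MM-DD
}

interface DisbursementRow {
  id: string;
  amountMinor: number;
  partnerReference: string;
  settledDate: string; // YYYY-MM-DD (hypothetical field)
  reconciled: boolean;
}

const DAY_MS = 24 * 60 * 60 * 1000;

// Whole UTC days since the epoch, so month and year boundaries compare correctly.
function dayNumber(isoDate: string): number {
  const [y, m, d] = isoDate.split("-").map(Number);
  return Date.UTC(y, m - 1, d) / DAY_MS;
}

function matchStatement(
  lines: StatementLine[],
  rows: DisbursementRow[],
): { matched: Record<string, string>; flagged: string[] } {
  const matched: Record<string, string> = {}; // statement line id -> disbursement id
  const flagged: string[] = [];
  for (const line of lines) {
    let hit: DisbursementRow | undefined;
    for (const row of rows) {
      if (
        !row.reconciled &&
        row.amountMinor === line.amountMinor &&
        row.partnerReference === line.partnerReference &&
        Math.abs(dayNumber(row.settledDate) - dayNumber(line.valueDate)) <= 1
      ) {
        hit = row;
        break;
      }
    }
    if (hit) {
      hit.reconciled = true; // RECONCILED: cannot be matched again
      matched[line.id] = hit.id;
    } else {
      flagged.push(line.id); // unmatched line: flag for review
    }
  }
  return { matched, flagged };
}
```

Marking a disbursement reconciled as soon as it matches prevents two identical statement lines from both claiming it; the second one is flagged instead, which is the behaviour the "inject one fake line" test expects.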
S4 total: ~6 engineer-days code + 7 calendar days soak.
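S4.5's rule can be sketched without an ESLint dependency by typing only the parts of the rule API it touches. This assumes ESLint 8.40 or later (where context.filename is available) and only catches named imports from @prisma/client; namespace imports and re-exports would need extra handling. The structural interfaces below are simplified stand-ins, not ESLint's published types.

```typescript
// Simplified structural stand-ins for the ESLint AST and rule context.
interface ImportSpecifierNode {
  type: string;
  imported?: { name: string };
}
interface ImportDeclarationNode {
  source: { value: string };
  specifiers: ImportSpecifierNode[];
}
interface RuleContext {
  filename: string;
  report(descriptor: { node: unknown; messageId: string; data?: Record<string, string> }): void;
}

const BLOCKED = ["Wallet", "WalletTransaction", "WithdrawalRequest"];

const noDeprecatedPrismaModels = {
  meta: {
    type: "problem",
    messages: {
      deprecated: "{{name}} is a deprecated legacy wallet model; import it only inside apps/api/prisma/.",
    },
    schema: [],
  },
  create(context: RuleContext) {
    // Normalise Windows separators before checking the allowed directory.
    const allowed = context.filename.replace(/\\/g, "/").indexOf("apps/api/prisma/") !== -1;
    return {
      ImportDeclaration(node: ImportDeclarationNode) {
        if (allowed || node.source.value !== "@prisma/client") return;
        for (const spec of node.specifiers) {
          const importedName = spec.type === "ImportSpecifier" && spec.imported ? spec.imported.name : "";
          if (BLOCKED.indexOf(importedName) !== -1) {
            context.report({ node: spec, messageId: "deprecated", data: { name: importedName } });
          }
        }
      },
    };
  },
};
```

Because the rule reports per specifier, import { Wallet, PrismaClient } from "@prisma/client" flags only Wallet, so unrelated Prisma imports in the same statement keep working.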
S4 acceptance gate (== Go-live gate)
- All gaps GAP-01..GAP-12 ticked [x] in SYSTEM_GAP.md.
- 7-day soak with zero drift, dashboarded.
- Production runbooks exist for: webhook failure, gateway-down, drift-detected, bank-statement-parse-failed.
- At least one full incident drill executed: simulated bank rejection → observed reversal, observed alert.
- Sign-off from: engineering lead, security lead, finance/ops lead.
Until ALL of these are green, no real bank rail opens to real users.
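The go-live soak condition combines S4.3's drift figure with a streak check. A sketch under stated assumptions: drift is ledger total minus statement total, the ledger total is the sum of POSTED movements on the account for the period (the plan does not specify which entries count), daily reports are sorted by date, and a missing day breaks the streak. Function names are hypothetical.

```typescript
interface ReconcileResult {
  ledgerTotalMinor: number;
  statementTotalMinor: number;
  driftMinor: number; // ledger minus statement; non-zero is logged as an error
}

// ledgerMovementsMinor: POSTED movements on the account for the period (assumption).
function reconcileWithBank(ledgerMovementsMinor: number[], statementTotalMinor: number): ReconcileResult {
  let ledgerTotalMinor = 0;
  for (const m of ledgerMovementsMinor) ledgerTotalMinor += m;
  return { ledgerTotalMinor, statementTotalMinor, driftMinor: ledgerTotalMinor - statementTotalMinor };
}

interface DailyReport {
  date: string; // YYYY-MM-DD, reports sorted ascending
  driftMinor: number;
}

function dayNumber(isoDate: string): number {
  const [y, m, d] = isoDate.split("-").map(Number);
  return Date.UTC(y, m - 1, d) / 86400000;
}

// Soak passes only if the latest `requiredDays` reports fall on consecutive
// calendar days and every one of them shows zero drift.
function soakPasses(reports: DailyReport[], requiredDays = 7): boolean {
  if (reports.length < requiredDays) return false;
  const recent = reports.slice(reports.length - requiredDays);
  for (let i = 0; i < recent.length; i++) {
    if (recent[i].driftMinor !== 0) return false;
    if (i > 0 && dayNumber(recent[i].date) - dayNumber(recent[i - 1].date) !== 1) return false;
  }
  return true;
}
```

Treating a missing daily report as a broken streak is deliberate: an unobserved day cannot count as drift-clean.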
Dependency map
- GAP-01 (ledger pending/posted) blocks everything below it.
- GAP-01 → GAP-02 (port shape) + GAP-04 (account taxonomy) → GAP-03 (domain status) → GAP-05 (use cases) → GAP-06 (events) → [GO-LIVE GATE]
- GAP-01 → GAP-07 (integration-gateway) → GAP-08 (poller) / GAP-09 (webhooks) / GAP-11 (reconciliation) → [GO-LIVE GATE]
- GAP-10 (lending repayment): independent of the bank flow once GAP-01..05 land → [GO-LIVE GATE]
- GAP-12 (legacy deprecation): runs whenever, blocks nothing.
The critical path is GAP-01 → GAP-07 → soak. Everything else parallelises around it.
Parallel tracks (for a 3-engineer team)
| Engineer | Sprint 1 | Sprint 2 | Sprint 3 | Sprint 4 |
|---|---|---|---|---|
| Eng-A (TS/NestJS) | GAP-01 (proto + ledger client side) → GAP-02 → GAP-03 → GAP-05 → GAP-06 | (Free for review / catch-up) | GAP-08 (poller) | GAP-12 (legacy deprecation) + soak monitoring |
| Eng-B (Go) | GAP-01 (Go server side: status column, ConfirmSettlement, MarkSettlementFailed) → GAP-04 review | GAP-07 (gateway + Dashen) ← critical path | GAP-09 (webhook receiver, in TS — pair with Eng-A) | GAP-11 (reconciliation, in Go) |
| Eng-C (TS/NestJS) | GAP-04 (account taxonomy rewrite) + S1 testing infra | Tests for GAP-07 in integration with API | GAP-10 (lending repayment) | Soak monitoring + runbooks |
Single-engineer fallback: sequential execution as listed in S1→S4. ~27–28 engineer-days of build work (the sum of the sprint totals), plus the 7-day soak.
Per-gap acceptance criteria summary (quick lookup)
| Gap | "Done" means |
|---|---|
| GAP-01 | ledger_transaction.status column exists. Migration verify guard catches illegal transitions. PostTransaction(PENDING) returns; ConfirmSettlement flips to POSTED; MarkSettlementFailed flips to FAILED. Both idempotent. |
| GAP-02 | DisbursementPort.getStatus(ref) exists in EWA + lending ports. EWA + lending unit tests use the new DisbursementResult shape with bankStatus. |
| GAP-03 | EwaStatus + LoanStatus include the new bank states. Transition tables cover every path. Tests for every transition. |
| GAP-04 | LedgerAccountsAdapter no longer returns cashAccountId. Returns the bank-orchestrator taxonomy. Use cases compile against the new shape. |
| GAP-05 | Disburse use cases pre-commit ledger as PENDING. ACCEPTED leaves PENDING. REJECTED reverses. Unit tests cover all three paths. |
| GAP-06 | Outbox events renamed. Old names removed entirely. Notification consumer subscribes to *_settled + *_failed only. |
| GAP-07 | services/integration-gateway is a real gRPC server. Dashen adapter loaded. Integration test against Dashen sandbox proves end-to-end. |
| GAP-08 | Poller running. Stale-pending alert fires after the 24h threshold. Staging E2E proves settlement closes. |
| GAP-09 | Webhook endpoint live. HMAC verification per partner. Idempotent replay-safe. |
| GAP-10 | record-repayment.usecase.ts exists. Loan installment flows to PAID. Last installment closes loan. |
| GAP-11 | Bank statement ingestion + matcher + ReconcileWithBank RPC live. Drift dashboard exists. |
| GAP-12 | Legacy Prisma models @deprecated. ESLint blocks new imports. |
Shipping gates (the kill-switches)
Each milestone below is permitted only once its gate has cleared:
| Gate | What it permits | Required state |
|---|---|---|
| Internal dev demo | Demo to leadership team | S1 + S2 + S3 closed |
| Staging E2E with real Dashen sandbox | Test with internal staff standing in as employees | S3 closed + 24h drift-clean |
| First real employer pilot (1 employer, 5 employees, capped at 500 ETB EWA per employee) | First real money moves | S4 closed + 7-day drift-clean + sign-off |
| Multi-employer rollout | Up to 10 employers | After 30 days post-pilot drift-clean + first real reconciliation report |
| Second bank rail (CBE, Awash, Telebirr — pick first) | Diversify partners | At least 90 days operating cleanly on Dashen + second adapter implemented + adapter-test-harness validated |
No bank rail goes to real users without leadership sign-off AND a 7-day soak.
Risk plan
| Risk | Likelihood | Impact | Mitigation |
|---|---|---|---|
| Dashen API spec arrives late or with breaking ambiguity | Medium | High (S2 blocked) | Start Dashen onboarding in S1 week 1 day 1. If no spec by S1 end, escalate. Treat the integration-gateway Go scaffolding (GAP-07a..e) as still doable without it — build with a fake adapter first. |
| Dashen sandbox is unreliable / down during S2 | Medium | Medium | Build a high-fidelity mock adapter in services/integration-gateway/internal/adapters/mock-dashen/ first. Use it for all unit + integration tests. The real Dashen adapter is the LAST item of S2. |
| The team discovers payroll-deduction integration needs to exist before lending repayment can be tested | Medium | Medium | Mitigated: S3.4 wires a temporary admin endpoint that triggers repayment manually. Sufficient for testing. Real payroll integration is Q3 work (outside this program). |
| Reconciliation reveals real drift on day 1 of soak | High | Critical (go-live blocked) | This IS the point of soak. Plan 2–3 weeks of buffer past S4 for drift hunting. Do not treat go-live date as fixed until first 24h of drift-clean observed. |
| Engineering team turnover during S1–S3 | Low | High | Every gap entry in this file is self-contained. A new engineer can pick up mid-program with SYSTEM_GAP.md + this file + the per-task acceptance criteria. |
| Regulator (NBE) asks, during S2–S3, for evidence that we are not a custodian | Low | High | Pull the GAP-12 legacy-deprecation work forward into S2 (it strips the visible Wallet.balance columns). Cite ADR-014 (to be written: "DemozPay is orchestrator, not custodian"). |
What this plan deliberately does NOT include
- Payroll domain implementation. EWA's accruedEarnings adapter stays as a placeholder. Payroll is a separate program; flagged in SYSTEM_GAP.md but not in any S1–S4 sprint.
- BNPL. No code exists; nothing to fix. Flagged for a separate program.
- Savings / Equb. Same as BNPL.
- Frontend → API integration. All 5 Next.js apps still use mocks. Out of scope; tracked separately.
- MFA, WebAuthn, AfricasTalking SMS. Auth surface improvements; separate programs.
- ADR-014 ("DemozPay is orchestrator, not custodian"). Recommended as a parallel doc task but explicitly out of this code-execution plan.
What you should do with this file going forward
- Print it. Pin it. Each sprint, the team works from this document.
- At sprint start: team picks the sprint section and breaks tasks down further if needed.
- At sprint end: tick the corresponding rows in SYSTEM_GAP.md and update this file's sprint table with completion dates.
- At go-live gate: print the gate checklist, walk every checkbox, get every sign-off.
The two files are an audit trail. Together they answer "did we do what we said we'd do?" with file:line evidence.
Next step (in this session): start executing Sprint 1, beginning with GAP-01 (ledger pending/posted lifecycle). See the changes landing in your working tree — no commits per standing instruction.