DemozPay — 90-Day Execution Plan
⚠️ STALE SNAPSHOT (2026-05-29). Overtaken by work since shipped: the payroll engine, banking, and polymorphic multi-org are now Live, so statements below like "none of the work has been executed" and "no payroll engine" reflect the 2026-05-29 baseline, not current state. For what's actually built, see ../audits/CURRENT_STATE_AUDIT.md and ../plans/TARGET_ARCHITECTURE_ALIGNMENT_PLAN.md. Re-baseline or archive this doc.
Snapshot: 2026-05-29 → 2026-08-27
Companion to: GO_LIVE_BLOCKERS.md, PRODUCTION_READINESS.md, REAL_SYSTEM_STATE.md.
Purpose: sequence the 22 go-live blockers + payroll + frontend-integration so that the platform is realistically pilot-ready in 90 days. Replaces "ship after S4.6" with a calibrated path.
Premises
- Team assumed: 3 backend engineers, 1 frontend engineer, 1 operations/SRE engineer. Fewer = stretch the timeline proportionally.
- No commits today (standing instruction). The work below assumes commits resume once the user reviews + approves the audit.
- Telemedhin infrastructure stays untouched throughout (standing instruction).
- No real bank rail opens until the 19 pilot-blockers in GO_LIVE_BLOCKERS.md are green.
Phase shape
Three 30-day phases:
- Phase A — Days 1–30 — "Stop the bleeding" (close BLACK items, ship Authz/PII/LookupAccount).
- Phase B — Days 31–60 — "Production hardening" (observability, secrets, TLS, backups, recon cadence).
- Phase C — Days 61–90 — "Pilot enablement" (KYC + sanctions + sign-off + the one integrated frontend + payroll-domain skeleton).
Soak + sign-off + first pilot enrolment happen at the end of Phase C.
Phase A — Days 1–30 — Stop the bleeding
Theme: every BLACK item in PRODUCTION_READINESS.md closes. Authz model becomes real. Money model gets its missing primitives. PII no longer leaks.
Sprint A1 — Week 1 (Days 1–5) — Authz emergency
| Day | Item | Owner | Acceptance |
|---|---|---|---|
| 1 | GL-01 — validate x-actor-id | Backend-A | EWA + lending controllers reject mismatched header; unit tests cover spoof. |
| 1 | GL-02 (start) — tenant-scope Business + Employee | Backend-B | Spike: design the membership-based scoping query. |
| 2 | GL-02 (continue) — implement | Backend-B | Member-table join filters all CRUD; cross-tenant E2E test proves zero rows. |
| 3 | GL-02 (test + ship) | Backend-B | All 9 employer/employee endpoints proven cross-tenant safe. |
| 3 | GL-03 — PII redaction (start) | Backend-C | Pino redaction config drafted. |
| 4 | GL-03 (test + ship) | Backend-C | Unit tests prove redaction for nationalId, phone, email, password, idempotencyKey, x-demoz-signature. |
| 5 | Hardening pass — security-review the diff | All | All three changes peer-reviewed; merge. |
Exit: the three BLACK AuthN/Z findings are LIVE. Audit logs are trustworthy. Cross-tenant enumeration impossible.
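The GL-03 acceptance list can be exercised without a logger in the loop. A minimal sketch, assuming a hypothetical `redact` helper that walks a JSON-like log payload and masks the six field names from the acceptance row. In the real service the same key list would presumably feed Pino's `redact` option rather than a hand-rolled walker; the helper name, censor string, and sample payload are illustrative only.

```typescript
// Hypothetical helper for illustration; not the project's actual redaction code.
// Key list comes from the GL-03 acceptance row. Matching is case-insensitive
// because header names such as x-demoz-signature can arrive in mixed case.
const SENSITIVE_KEYS = new Set<string>([
  "nationalid",
  "phone",
  "email",
  "password",
  "idempotencykey",
  "x-demoz-signature",
]);

const CENSOR = "[REDACTED]";

// Returns a deep copy of `value` with every sensitive key masked.
// Intended for plain JSON-like payloads (objects, arrays, primitives).
function redact(value: unknown): unknown {
  if (Array.isArray(value)) {
    return value.map((item) => redact(item));
  }
  if (value !== null && typeof value === "object") {
    const source = value as Record<string, unknown>;
    const out: Record<string, unknown> = {};
    Object.keys(source).forEach((key) => {
      out[key] = SENSITIVE_KEYS.has(key.toLowerCase()) ? CENSOR : redact(source[key]);
    });
    return out;
  }
  return value; // primitives pass through unchanged
}

// Example payload shaped like a request log line (all values are made up).
const redactedSample = redact({
  msg: "disbursement requested",
  employee: { id: "emp_1", nationalId: "ET-123", phone: "+251900000000" },
  headers: { "X-Demoz-Signature": "abc123" },
});
```

Masking by key name at any depth is deliberately blunt: it over-redacts before it under-redacts, which is the safer failure mode for a pilot.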
Sprint A2 — Week 2 (Days 6–10) — Money model gap-closures
| Day | Item | Owner | Acceptance |
|---|---|---|---|
| 6–7 | GL-04 (LookupAccount in Dashen adapter + use-case wiring) | Backend-B (Go) | Dashen LookupAccount real; EWA + lending disburse fail-fast on account mismatch; bank-sandbox covers both happy + mismatch. |
| 8 | GL-05 — EWA repayment use-case (admin endpoint variant) | Backend-A | RecordEwaRepaymentUseCase; admin endpoint; 5 unit tests; idempotent on (ewaRequestId). |
| 9–10 | GL-06 — PayrollClearing → FI remittance for repaid installments | Backend-A + Backend-B | New RemitInstallmentToFiUseCase; per-installment outbound transfer via gateway; integration test against bank-sandbox. |
Exit: the money model has no dead-ends. EWA can be repaid. Loans can be remitted. Disbursements verify the account.
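The "idempotent on (ewaRequestId)" acceptance for GL-05 can be sketched as follows. This is an illustration under stated assumptions, not the real RecordEwaRepaymentUseCase: the class name `RecordEwaRepaymentSketch`, the record shape, the in-memory map (standing in for a table with a unique constraint on `ewaRequestId`), and the rule that a replay with a different amount is rejected are all hypothetical.

```typescript
// Hypothetical shapes; the real use-case and its storage will differ.
interface EwaRepayment {
  ewaRequestId: string;
  amountMinor: number; // integer minor units, never floats
}

interface RecordOutcome {
  status: "recorded" | "already_recorded";
  repayment: EwaRepayment;
}

class RecordEwaRepaymentSketch {
  // In-memory stand-in for a table with a unique constraint on ewaRequestId.
  private readonly byRequestId = new Map<string, EwaRepayment>();

  execute(ewaRequestId: string, amountMinor: number): RecordOutcome {
    if (!Number.isInteger(amountMinor) || amountMinor <= 0) {
      throw new Error("amountMinor must be a positive integer");
    }
    const existing = this.byRequestId.get(ewaRequestId);
    if (existing !== undefined) {
      // Assumption: a replay with a different amount is a caller bug, not a retry.
      if (existing.amountMinor !== amountMinor) {
        throw new Error("conflicting repayment for " + ewaRequestId);
      }
      return { status: "already_recorded", repayment: existing };
    }
    const repayment: EwaRepayment = { ewaRequestId, amountMinor };
    this.byRequestId.set(ewaRequestId, repayment);
    return { status: "recorded", repayment };
  }
}
```

A retried admin call then returns `already_recorded` with the original record instead of posting a second repayment.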
Sprint A3 — Week 3 (Days 11–15) — Operational baselines
| Day | Item | Owner | Acceptance |
|---|---|---|---|
| 11–12 | GL-11 — Prometheus on Go services | Backend-B | Counters + histograms per RPC in ledger + gateway; /metrics endpoint live. |
| 13 | GL-07 — idempotency-record TTL cleanup | Backend-C | Daily cron + counter + runbook. |
| 14–15 | Build the daily reconciliation harness — start of GL-09 | Backend-C | Cron skeleton calls reconciliation Runner end-to-end against test data; not yet wired to prod. |
Exit: the Go services are observable. The recon cron skeleton exists.
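For GL-07, the core of the daily cleanup is small. A minimal sketch, assuming a hypothetical `IdempotencyRecord` shape and an in-memory map in place of the real table; the TTL value is a parameter because the plan does not fix one. The returned count is what the "counter" in the acceptance column would be incremented by.

```typescript
// Hypothetical record shape; the real idempotency table is not described in this plan.
interface IdempotencyRecord {
  key: string;
  createdAt: Date;
}

// Deletes every record older than ttlMs relative to `now` and returns how many
// were removed, so the caller can feed a cleanup counter.
function purgeExpired(
  records: Map<string, IdempotencyRecord>,
  now: Date,
  ttlMs: number,
): number {
  const cutoff = now.getTime() - ttlMs;
  const expiredKeys: string[] = [];
  records.forEach((record, key) => {
    if (record.createdAt.getTime() < cutoff) {
      expiredKeys.push(key);
    }
  });
  expiredKeys.forEach((key) => records.delete(key));
  return expiredKeys.length;
}
```

Passing `now` in, rather than reading the clock inside the function, keeps the cron body trivially unit-testable.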
Sprint A4 — Week 4 (Days 16–20) — ADR-014 + sign-off draft
| Day | Item | Owner | Acceptance |
|---|---|---|---|
| 16 | GL-15 — write ADR-014 ("DemozPay is orchestrator, not custodian") | Engineering lead + legal | Draft circulated. |
| 17–18 | GL-21 — write reconciliation-daily-process.md runbook | Engineering lead + finance lead | Draft circulated. |
| 19 | GL-22 — sign-off matrix template | Engineering lead | One-page form, all 22 blockers listed. |
| 20 | Phase A retrospective | All | Lessons captured; Phase B plan locked. |
Exit of Phase A:
- 7 of 22 go-live blockers closed (GL-01..07 + ADR-014 draft).
- The platform is no longer dangerous to ship — but is not yet ready.
- Burn rate: ~50% of total 90-day budget on the highest-risk items, deliberately.
Phase B — Days 31–60 — Production hardening
Theme: observe, alert, deploy, encrypt, back up. The boring stuff that turns "running code" into "operable system".
Sprint B1 — Week 5 (Days 31–35) — Alerting backbone
| Day | Item | Owner | Acceptance |
|---|---|---|---|
| 31 | GL-08 (start) — Alertmanager / paging provider chosen | SRE | Decision recorded. Probably Grafana OnCall + on-call rota in PagerDuty or similar. |
| 32–33 | Alertmanager / Grafana deployed; scrape config covers API + Go services | SRE | /metrics scraped; dashboards built. |
| 34 | First 5 alert rules authored | SRE + Backend-A | Webhook-failure, gateway-down, ledger-down, outbox-stale, poller-error. |
| 35 | Drift = non-zero alert rule (depends on GL-09 finishing first) | SRE | Drift rule, linked runbook. |
Exit: someone gets paged when something breaks.
Sprint B2 — Week 6 (Days 36–40) — Reconciliation cadence + statement pull
| Day | Item | Owner | Acceptance |
|---|---|---|---|
| 36–37 | GL-10 — Dashen statement-pull adapter (SFTP or API) | Backend-B | Daily cron pulls yesterday's file; ingests; produces an "ingest success" counter. |
| 38 | GL-09 — wire the daily-recon cron to call ReconcileWithBank per (tenant, account) | Backend-C | Cron schedules daily; output → Slack channel + drift counter. |
| 39–40 | GL-12 — adapter health surfacing | Backend-B | GetAdapterStatus real; degraded threshold; surfaced to API. |
Exit: drift is reported daily, automatically, to a channel that humans watch.
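What "drift" means per (tenant, account) can be made concrete. The sketch below is a hypothetical stand-in, not the signature of ReconcileWithBank: it compares ledger balances with bank-statement balances in integer minor units and reports every pair that disagrees or exists on one side only. All type and function names are assumptions.

```typescript
// Hypothetical row shapes for illustration only.
interface BalanceRow {
  tenantId: string;
  accountId: string;
  balanceMinor: number; // integer minor units
}

interface DriftFinding {
  tenantId: string;
  accountId: string;
  ledgerMinor: number | null; // null: account missing from the ledger side
  bankMinor: number | null; // null: account missing from the bank statement
  driftMinor: number | null; // ledger minus bank; null when a side is missing
}

function findDrift(ledger: BalanceRow[], bank: BalanceRow[]): DriftFinding[] {
  const keyOf = (r: BalanceRow): string => r.tenantId + "|" + r.accountId;
  const bankByKey = new Map<string, BalanceRow>();
  bank.forEach((r) => bankByKey.set(keyOf(r), r));

  const findings: DriftFinding[] = [];
  ledger.forEach((l) => {
    const b = bankByKey.get(keyOf(l));
    bankByKey.delete(keyOf(l)); // whatever remains afterwards is bank-only
    if (b === undefined) {
      findings.push({ tenantId: l.tenantId, accountId: l.accountId,
        ledgerMinor: l.balanceMinor, bankMinor: null, driftMinor: null });
    } else if (l.balanceMinor !== b.balanceMinor) {
      findings.push({ tenantId: l.tenantId, accountId: l.accountId,
        ledgerMinor: l.balanceMinor, bankMinor: b.balanceMinor,
        driftMinor: l.balanceMinor - b.balanceMinor });
    }
  });
  bankByKey.forEach((b) => {
    findings.push({ tenantId: b.tenantId, accountId: b.accountId,
      ledgerMinor: null, bankMinor: b.balanceMinor, driftMinor: null });
  });
  return findings;
}
```

An empty result is the "drift = 0" condition; the length of the result is a natural value for the drift counter posted to the channel.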
Sprint B3 — Week 7 (Days 41–45) — Secrets + TLS
| Day | Item | Owner | Acceptance |
|---|---|---|---|
| 41 | GL-16 — secrets-manager decision | SRE + Engineering lead | Decision recorded (likely AWS SM if the cloud is AWS; Vault if self-hosted). |
| 42–44 | Secrets pipeline integrated; init-container or sidecar pattern; rotation runbook | SRE | Application reads secrets at boot; no .env in any environment after this date. |
| 45 | GL-17 — TLS at the edge | SRE | Ingress terminates TLS 1.3; HSTS; cert renewal automated. |
Exit: secrets are managed; traffic is encrypted.
Sprint B4 — Week 8 (Days 46–50) — Backups + CI hardening
| Day | Item | Owner | Acceptance |
|---|---|---|---|
| 46 | GL-18 (start) — backup strategy decision | SRE | Decision recorded. |
| 47–48 | PITR enabled per database (API, ledger, gateway) | SRE | Backups configured + monitored. |
| 49 | Restore drill | SRE + Backend-A | Test database restored from backup; recent transaction verified. RTO + RPO documented. |
| 50 | GL-19 — secret-leak + dep-vuln scanning in CI | SRE | gitleaks + Snyk merged into the GitHub Actions workflow. |
Exit: the system can survive a disk loss. CI catches accidentally-committed secrets.
Sprint B5 — Week 9 (Days 51–55) — On-call rota + drill
| Day | Item | Owner | Acceptance |
|---|---|---|---|
| 51–52 | GL-20 — on-call rota named; tooling configured | Engineering lead | 3 engineers signed up; weekly handoff documented. |
| 53 | First tabletop drill: webhook-failure.md | All on-call | Drill executed; postmortem written. |
| 54 | Second drill: drift-detected.md | All on-call | Drill executed; postmortem written. |
| 55 | Incident commander + scribe roles documented | Engineering lead | Documented; trial-run completed. |
Exit: someone knows what to do when the page fires.
Sprint B6 — Week 10 (Days 56–60) — Phase B catch-up + retrospective
Buffer. There will be slippage. Use this week to absorb it. Anything still open at end of Day 60 becomes a Phase C carry-over.
Exit of Phase B:
- 15 of 22 go-live blockers closed (GL-01..18 + GL-19..21).
- The platform is operationally hardened.
- Ready for the compliance + identity work in Phase C.
Phase C — Days 61–90 — Pilot enablement
Theme: the things partners and regulators require, plus one integrated frontend, plus the payroll-domain skeleton that unblocks scale beyond the pilot.
Sprint C1 — Week 11 (Days 61–65) — KYC + sanctions
| Day | Item | Owner | Acceptance |
|---|---|---|---|
| 61–63 | GL-13 — KYC primitive | Backend-A + Frontend | packages/kyc/ skeleton; capture (nationalId, photo, document); tenant-scoped table; outbox event; manual-review workflow. |
| 64–65 | GL-14 — sanctions screening (enum-list pilot tier) | Backend-B | Pre-disburse check against OFAC + UN + Ethiopia list (CSV ingested daily); hit halts disbursement + alerts. |
Exit: identity verification + sanctions screening exist at pilot tier.
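The pilot-tier sanctions check in GL-14 is an exact list match, which can be sketched in a few lines. Assumptions: the ingested CSV has one name per line with no header, names are compared after trimming, lower-casing, and collapsing whitespace, and the function names are hypothetical. Fuzzy or transliteration-aware matching is outside this sketch.

```typescript
// Normalises a name for exact comparison: trim, lower-case, collapse whitespace.
function normalizeName(name: string): string {
  return name.trim().toLowerCase().split(/\s+/).join(" ");
}

// Assumed CSV format: one name per line, no header row.
function loadSanctionsList(csv: string): Set<string> {
  const names = new Set<string>();
  csv.split(/\r?\n/).forEach((line) => {
    const normalized = normalizeName(line);
    if (normalized.length > 0) {
      names.add(normalized);
    }
  });
  return names;
}

interface ScreeningResult {
  allowed: boolean;
  matchedName: string | null;
}

// Pre-disburse gate: a hit means the caller halts the disbursement and alerts.
function screenBeforeDisburse(payeeName: string, list: Set<string>): ScreeningResult {
  const normalized = normalizeName(payeeName);
  return list.has(normalized)
    ? { allowed: false, matchedName: normalized }
    : { allowed: true, matchedName: null };
}
```

Returning a result object instead of throwing leaves the "halts disbursement + alerts" decision with the calling use-case, where the outbox event and alert can be emitted together.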
Sprint C2 — Week 12 (Days 66–70) — Frontend integration: employer-web
This is the strategic frontend bet. Employer-web is the customer that pays. Integrate it first; the other four can wait.
| Day | Item | Owner | Acceptance |
|---|---|---|---|
| 66 | Replace mock data with real API calls | Frontend | Login + dashboard + employee list + payroll view. |
| 67–68 | Wire the EWA + lending disburse approval flows | Frontend | Employer admin can approve / reject pending EWAs and loans from the UI. |
| 69 | Wire the admin endpoints for EWA repayment + loan repayment recording | Frontend | Manual repayment recording is operable from the UI (not psql). |
| 70 | E2E test in CI | Frontend + SRE | One full employer-web E2E test passes against a deployed staging. |
Exit: the employer experience is real, not theatre.
Sprint C3 — Week 13 (Days 71–75) — Payroll-domain skeleton
Cannot ship a full payroll engine in 5 days. Ship a skeleton that the pilot can use for one or two manual payroll runs while we build out the rest.
| Day | Item | Owner | Acceptance |
|---|---|---|---|
| 71 | Domain package packages/payroll/ created | Backend-A | Standard 4-layer scaffold. |
| 72–73 | Payroll-run aggregate: state machine, deduction calculator (consume EWA + loan-installment open balances) | Backend-A | Unit tests. |
| 74 | Bulk-disbursement endpoint: 1 transfer per employee via gateway | Backend-A + Backend-B | Idempotent; fan-out across the gateway. |
| 75 | Payroll-event consumer in EWA + Lending | Backend-C | Consumes payroll.deductions_taken.v1; routes to RecordEwaRepaymentUseCase / RecordRepaymentUseCase. |
Exit: payroll is no longer the bottleneck for repayment automation. Pilot can run one or two payroll cycles end-to-end.
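The deduction calculator from Days 72–73 can be sketched as a pure function. Assumptions, none of which come from this plan: EWA is recovered before the loan installment, total deductions are capped so pay never goes negative, amounts are integer minor units, and the per-employee idempotency key format for the Day 74 fan-out is made up.

```typescript
// Hypothetical shapes; the real payroll-run aggregate is not specified here.
interface OpenBalances {
  ewaOutstandingMinor: number;
  loanInstallmentDueMinor: number;
}

interface DeductionResult {
  ewaMinor: number;
  loanMinor: number;
  netPayMinor: number;
}

// Assumed priority: EWA first, then the loan installment, capped at available pay.
function calculateDeductions(payMinor: number, open: OpenBalances): DeductionResult {
  if (payMinor < 0 || open.ewaOutstandingMinor < 0 || open.loanInstallmentDueMinor < 0) {
    throw new Error("amounts must be non-negative");
  }
  const ewaMinor = Math.min(open.ewaOutstandingMinor, payMinor);
  const loanMinor = Math.min(open.loanInstallmentDueMinor, payMinor - ewaMinor);
  return { ewaMinor, loanMinor, netPayMinor: payMinor - ewaMinor - loanMinor };
}

// One stable key per (run, employee) so a retried fan-out cannot pay twice.
function disbursementIdempotencyKey(payrollRunId: string, employeeId: string): string {
  return "payroll:" + payrollRunId + ":" + employeeId;
}
```

Any shortfall (balance owed minus amount deducted) stays open and would be picked up by the next payroll run under this scheme.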
Sprint C4 — Week 14 (Days 76–80) — Soak preparation + sign-off form
| Day | Item | Owner | Acceptance |
|---|---|---|---|
| 76–77 | End-to-end staging deploy: API, ledger, gateway, bank-sandbox (or Dashen sandbox) | SRE | Staging mirrors production topology. |
| 78 | Pilot data seeded: 1 employer, 5 fake employees, 1 FI partner | Backend-A | Seed script. |
| 79 | Soak begins — Day 1 | All | Daily-recon cron runs; drift = 0 observed. |
| 80 | GL-22 — sign-off form populated | Engineering lead | Each of 19 pilot-blocker items has verifier + date + evidence link. |
Sprint C5 — Week 15 (Days 81–85) — Soak continues
| Day | Item | Owner | Acceptance |
|---|---|---|---|
| 81–85 | Soak days 3–7 | All on-call | Daily drift = 0; no incidents. |
Sprint C6 — Week 16 (Days 86–90) — Sign-off + first pilot enrolment
| Day | Item | Owner | Acceptance |
|---|---|---|---|
| 86 | Green-light decision after the 7-day soak | All leads | If drift was 0 all 7 days AND all 19 pilot-blockers signed off → green light. |
| 87 | First pilot employer onboarded (1 employer, ≤ 5 employees) | All | First real disbursement. |
| 88–90 | Buffer + incident-response standby | On-call | Watching closely for anomalies. |
Exit of Phase C:
- 19 pilot-blockers closed + signed off.
- 7-day soak clean.
- First pilot employer is on the platform with real money flowing.
- Deliberately deferred beyond Day 90: BNPL, Savings, Equb, advanced fraud, mobile apps.
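The Day 86 rule (drift 0 on all 7 soak days AND all 19 pilot-blockers signed off with verifier, date, and evidence link) is mechanical enough to encode. A minimal sketch with hypothetical types and names; it returns the reasons for a no-go so the decision meeting has something concrete to look at.

```typescript
// Hypothetical shapes for the soak log and the GL-22 sign-off form.
interface SoakDay {
  day: number;
  driftMinor: number;
}

interface SignOff {
  blockerId: string;
  verifier: string;
  date: string;
  evidenceLink: string;
}

const REQUIRED_SOAK_DAYS = 7;
const REQUIRED_SIGN_OFFS = 19;

function greenLight(soak: SoakDay[], signOffs: SignOff[]): { go: boolean; reasons: string[] } {
  const reasons: string[] = [];
  if (soak.length < REQUIRED_SOAK_DAYS) {
    reasons.push("only " + soak.length + " soak days recorded");
  }
  soak.forEach((d) => {
    if (d.driftMinor !== 0) {
      reasons.push("non-zero drift on soak day " + d.day);
    }
  });
  // A sign-off counts only when verifier, date, and evidence link are all filled in.
  const complete = signOffs.filter(
    (s) => s.verifier !== "" && s.date !== "" && s.evidenceLink !== "",
  );
  const distinctBlockers = new Set<string>(complete.map((s) => s.blockerId));
  if (distinctBlockers.size < REQUIRED_SIGN_OFFS) {
    reasons.push(distinctBlockers.size + " of " + REQUIRED_SIGN_OFFS + " pilot-blockers signed off");
  }
  return { go: reasons.length === 0, reasons };
}
```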
Parallel tracks not on the critical path
These run alongside the sprints above. They don't block pilot but should not be ignored.
| Track | Owner | Notes |
|---|---|---|
| Delete legacy Prisma models | Backend-C | Once nothing reads Wallet*, BNPLPurchase, Payroll*, Equb*, SavingGoal, BillPayment, Expense, drop the tables in a forward migration. Do this once payroll-skeleton has its own clean schema (Sprint C3, Phase C). (BNPLPartner already collapsed into Merchant.) |
| services/notifications Live | Backend-B (Go) | Pick Africa's Talking; wire SMS as the first channel; consume *_settled + *_failed outbox events. Phase B effort. |
| Event catalog doc | Engineering lead | docs/architecture/event-catalog.md — promised in ADR-011 follow-ups. Phase B. |
| Admin tooling MVP | Backend-A | An apps/api/src/admin/ module with: replay-webhook endpoint, force-resync disbursement, view ledger account, view outbox event. Phase B. |
Decisions to make at the start of Phase A
These are the choices that the user / leadership needs to make before the sprint starts. Each is a 1-line decision:
- Who's the pilot employer? (Determines KYC tier-1 data shape.)
- Which partner bank do we open the rail with first? (Determines GL-10 adapter implementation.)
- Which paging provider? (Determines GL-08 cost + integration shape.)
- Which secrets manager? (Determines GL-16 ops cost.)
- Which cloud (or self-host)? (Determines GL-18 backup strategy.)
- Who's the security lead? (Signs the sign-off form.)
- Who's the finance/ops lead? (Signs the sign-off form.)
Without these decisions on Day 1, Phase A slips into Phase B.
What this plan deliberately does NOT include
- BNPL implementation. No code today. Out of 90-day scope. Plan as a Q4 program.
- Savings + Equb. Same. Q1 2027 conversation.
- Mobile apps. Web is sufficient for pilot. Quarter after pilot.
- Advanced fraud detection (behavioural analytics, device fingerprinting). Pilot uses static velocity ceilings + sanctions enum. Q4.
- Frontend integration of admin-web, employee-web, fi-web, merchant-web. Employer-web first; the others sequentially over Q4–Q1 2027.
- Real-time event consumers beyond payroll. BI pipeline, fraud-event consumer, regulator-reporting consumer — all Q4.
- Multi-currency settlement. Schema supports it; pilot is ETB-only.
- Multi-region DR. Single-region in pilot; multi-region is Q2 2027.
Risk register for this plan
| Risk | Likelihood | Impact | Mitigation |
|---|---|---|---|
| Authz fixes (GL-01..03) reveal additional missing checks | High | Medium | Sprint A1 has buffer; treat the first finding as the leading indicator and look for siblings. |
| LookupAccount delivery requires Dashen sandbox cooperation | Medium | High (blocks GL-04) | Start Dashen-side spec conversation Day 1 of Phase A. |
| Statement-pull (GL-10) needs partner SFTP onboarding | Medium | High | Same — start the partner conversation Phase A Week 1. |
| Alerting (GL-08) takes longer than 5 days | High | Medium | Sprint B6 is the buffer; pull from there. |
| 7-day soak surfaces real drift | High | Critical (blocks pilot) | This IS the point of soak. Plan 2 weeks buffer past Day 90 for drift hunting. Do NOT treat Day 90 as a fixed pilot date. |
| Team turnover during the 90 days | Low | Critical | Each task in this plan is self-contained; a new engineer can pick up mid-program with the linked architecture docs. |
| Regulator (NBE) asks for evidence before pilot | Medium | High | Phase A Day 16: write ADR-014. Phase C: KYC + sanctions. Have artefacts ready for the first NBE conversation. |
Outcomes at Day 90
If this plan executes:
- 19 of 22 go-live blockers closed.
- 7-day soak completed on staging.
- 1 employer + ≤ 5 employees in production with real money flowing.
- All 4 incident runbooks battle-tested in at least one drill.
- A daily reconciliation cadence reporting drift = 0 in production.
- An employer-web frontend that real customers can use.
- A payroll-domain skeleton that enables scaling past pilot.
If this plan slips by 30 days (which is realistic for a 5-person team):
- Pilot lands at Day 120 instead.
- Phase C "parallel tracks" all drop off.
- BNPL conversation moves to Q1 2027.
If this plan slips by 60 days:
- Re-set expectations with leadership / investors / partners. Don't fake it.
What the user has asked me to do (and what they have not)
The user asked for a deep architectural audit + 8 deliverable documents. This is the eighth and final document. None of the work above has been executed. Files in working tree from prior sessions remain unstaged per the standing instruction. The audit recommends the work; the user decides which (if any) to authorise.
Cross-references
- The blockers this plan sequences → GO_LIVE_BLOCKERS.md.
- Why each blocker matters → PRODUCTION_READINESS.md.
- Domain-by-domain readiness → DOMAIN_COMPLETENESS_MATRIX.md, REAL_SYSTEM_STATE.md.
- Money flows that need closing → MONEY_FLOWS.md.
- Reconciliation cadence to be built → RECONCILIATION_ARCHITECTURE.md.
- The bank-orchestrator commitment that grounds everything → BANK_ORCHESTRATION.md.