Demoz Pay — Core Banking, Microservices, KYC & Resilience Plan

⚠️ Superseded / historical — archived 2026-05-26. DemozPay is a modular monolith with selective Go extraction (see docs/adr/ADR-001-modular-monolith.md and ADR-010-two-language-ceiling.md) — not microservices-first. The "microservice catalogue" / decomposition sections below do not reflect the current architecture; treat them as exploratory only. Authoritative: docs/adr/ + docs/architecture/restructure-2026-05.md. The Ethiopia banking / FI / wallet integration, KYC, and regulatory research here is still useful reference — which is why this is archived, not deleted.

Author: Engineering · Audience: Demoz Pay engineering + leadership · Status: Draft v1 · Last updated: 2026-05-23

This document is the canonical reference for how Demoz Pay will integrate with Ethiopian banks, financial institutions (FIs), and wallet operators; how the platform will be decomposed into microservices; how user identity and KYC will be handled; and how the system will achieve fintech-grade security and near-zero-downtime availability.

Executive summary
Glossary
Current state — what we already have
How core banking actually works in Ethiopia
Target architecture — overview
Microservice catalogue
Bank / FI / Wallet integration layer
User identity & KYC
Security architecture (defence-in-depth)
Data consistency, ledger & idempotency
High availability & zero-downtime deployments
Observability & operations
Regulation & compliance
Migration plan (strangler fig from monolith)
Risk register
Open decisions

1. Executive summary

Demoz Pay is moving from a single-NestJS-monolith + single-Postgres setup toward a domain-driven microservice platform that can plug into many banks, MFIs and wallet operators, ship money safely, and stay up under real load.

The plan rests on five pillars:

A single internal "money API" — every product (payroll, EWA, BNPL, loans, equb) talks to one normalised payments interface. Each bank / wallet / FI sits behind an adapter that translates that interface to the provider's specifics.
Domain-owned services, each with its own database, each reachable only via its public API. Cross-service consistency is enforced through events and sagas, never through direct cross-database writes.
A double-entry ledger (already modelled in Prisma) as the system of record for all value movement. Wallet balances are derived, never authoritative.
Defence in depth for security: network, application, data, identity, secret, and audit layers each enforce policy independently. PII and money never leave Ethiopia (data residency).
Operational discipline: blue-green deploys, expand-contract DB migrations, multi-AZ active-active, OpenTelemetry instrumentation, SLO-driven operations, and runbooks for every critical path.

We do not rewrite the monolith in one go. We use the strangler fig pattern — new domains are extracted one by one, behind a stable API gateway, with feature flags and shadow traffic. The monolith is deprecated last, not first.

2. Glossary

Term	Definition
Core banking system (CBS)	The transactional engine inside a bank that maintains accounts and posts movements. Common products: Temenos T24/Transact, Oracle Flexcube, Infosys Finacle, Path Solutions iMAL.
ISO 20022	International messaging standard for financial transactions. Replacing SWIFT MT formats globally.
ISO 8583	Older message standard, still used for card/POS transactions.
PSP	Payment Service Provider (Telebirr, M-Birr, HelloCash, Amole, CBE Birr, etc.).
PSD2 / Open Banking	Regulatory frameworks (EU, UK) requiring banks to expose customer-permissioned APIs. NBE has signalled a similar direction in its 2025 directive.
NBE	National Bank of Ethiopia — the regulator.
Fayda	Ethiopia's national digital ID (FCN/National ID Number).
KYC	Know Your Customer — the identity-verification process.
AML / CFT	Anti-Money Laundering / Countering the Financing of Terrorism.
CDD / EDD	Customer Due Diligence (normal) / Enhanced Due Diligence (high-risk).
HSM	Hardware Security Module — tamper-resistant key store.
mTLS	Mutual TLS — both client and server present certificates.
Saga	A distributed transaction pattern using compensating actions.
Outbox	A reliable-events pattern using a DB table as a durable queue.
Strangler fig	Migration pattern: route piece by piece away from a monolith until the monolith is gone.
SLO / SLI	Service Level Objective / Indicator — measurable reliability targets.
RPO / RTO	Recovery Point Objective (acceptable data loss) / Recovery Time Objective (acceptable downtime).

3. Current state — what we already have

3.1 Code

Monorepo (Nx 22, pnpm) with one NestJS server and five Next.js frontend apps (admin, business, client, fi, bnpl-partner) plus a Docusaurus docs site.
Server modules wired in AppModule: auth, business, employee, prisma. That's it — no payment, ledger, lending, BNPL, EWA or equb modules are implemented yet despite being modelled in Prisma.
Single docker-compose.yml with one Postgres instance for the whole monorepo.
No event bus, message broker, cache, secrets manager, observability stack, or HSM provisioned.

3.2 Prisma schema (already strong)

The schema is more advanced than the implementation. It already models:

Identity: User, Role (RBAC), AdminProfile, UserRole enum, 2FA fields, AuditLog.
Business: Business, Department, Employee, EmployeeAbsence.
Double-entry ledger: LedgerAccount, Transaction, JournalEntry (with debit/credit, account types ASSET/LIABILITY/EQUITY/REVENUE/EXPENSE).
Payroll: Payroll, PayrollEntry, PayrollStatus.
Lending: BusinessLoan, EmployeeLoan, LoanPayment, LoanStatus.
Embedded finance: EarlyWageAccess, BNPLPurchase, BNPLPayment, Equb, EqubMember, EqubPayout.
Money: Wallet, WalletTransaction, WithdrawalRequest, BillPayment, Expense, SavingGoal.
Counterparties: FinancialInstitution, BNPLPartner, Merchant.
Settlement: SettlementBatch, SettlementRecord, SettlementType, SettlementStatus.
Shared enums: KYCStatus, PaymentMethod, WithdrawalMethod.

3.3 Gaps

Capability	Modelled?	Implemented?
RBAC	✓	partial
Bank/wallet adapters	✗	✗
KYC orchestration	enum only	✗
Ledger posting service	✓	✗
Settlement engine	✓	✗
Reconciliation jobs	✗	✗
Event bus	✗	✗
Idempotency layer	✗	✗
Audit immutability (WORM)	log model exists	not WORM
Secrets management	env files	env files
HSM / KMS	✗	✗
Multi-region / multi-AZ	✗	✗
Observability stack	✗	✗

Conclusion: the Prisma schema is a solid blueprint, but implementation of the money paths is at "zero". This means we don't need a heavy data migration; we mostly need to build these paths right the first time, while the monolith continues to cover identity and business administration.

4. How core banking actually works in Ethiopia

To integrate "with many banks and many financial institutions", we need a realistic view of what's on the other side of the wire.

4.1 The technology landscape

Provider	Core banking system	Typical integration surface
CBE	Temenos T24 + custom	SOAP / file dropbox / SFTP batches; emerging REST APIs through CBE Birr
Awash, Dashen, Abyssinia	Oracle Flexcube	SOAP web services; sometimes REST gateways through fintech sandboxes
Wegagen, Hibret, NIB	Path Solutions iMAL	SFTP files; some REST via partners
Cooperative Bank of Oromia, Zemen	Infosys Finacle	SOAP; ISO 8583 for card rails
Microfinance (Omo, ACSI, AdCSI)	Various, sometimes Excel + manual	Often portal upload + manual reconciliation
Telebirr	Ethio Telecom — Huawei mobile money	REST API (HMAC + RSA signing); webhook callbacks
M-Birr, HelloCash, Amole, CBE Birr	Various	REST APIs; HMAC + JWT or RSA; mobile money rails
Card networks (where used)	Visa / Mastercard via card processors	ISO 8583 over MPLS; not your usual REST

4.2 The patterns you actually encounter

Synchronous REST/JSON — newest wallet providers and a few "open banking" sandboxes. Easiest, but throughput and uptime vary wildly.
SOAP / XML — legacy core banking. Strong typing via WSDL, but you'll fight with date/timezone formats and exception channels.
SFTP batch files — daily/intra-day file drops (often pipe-delimited or fixed-width). Settlement reports, bulk credit files, reversal files. Many bank disbursement rails still work this way in Ethiopia.
ISO 8583 / ISO 20022 — card/Switch integration (ETHSwitch), high-value transfers (RTGS). You don't speak this directly; you integrate through a processor or the bank's API gateway.
Webhook callbacks — providers push you success/failure updates asynchronously. Always treat as untrusted until verified.

4.3 What you need at the application boundary

For every provider, regardless of the protocol underneath, you need:

Authentication — usually OAuth2 client credentials, HMAC-signed requests, or mutual TLS. Often combined.
Idempotency — every payment send must carry a unique idempotency key that the provider honours. If they don't honour it, we guarantee it on our side by enforcing single-execution semantics on the adapter.
Webhook signing — HMAC over the body with a shared secret, or detached JWS. Reject anything without a valid signature.
Reversal & refund semantics — different per provider. Some support reversals within N hours; some require manual intervention.
Settlement reports — daily files or API endpoints listing everything they consider final. We must reconcile our books against these reports every day before opening for new business.
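A minimal sketch of the webhook-signing check, assuming the provider sends a hex-encoded HMAC-SHA256 of the raw body in a header (the header name, encoding and algorithm differ per provider and are assumptions here). The two details that matter are computing the MAC over the raw bytes rather than re-serialised JSON, and comparing in constant time.

```typescript
import { createHmac, timingSafeEqual } from 'node:crypto';

/**
 * Verify a hex-encoded HMAC-SHA256 signature over the raw request body.
 * Assumption: the provider signs the exact bytes it sent, using a shared secret.
 */
export function verifyWebhookSignature(
  rawBody: Buffer,
  signatureHex: string | undefined,
  secret: string,
): boolean {
  if (!signatureHex) return false; // unsigned payloads are rejected outright

  const expected = createHmac('sha256', secret).update(rawBody).digest();

  // Reject malformed or wrong-length hex up front; Buffer.from(hex) would
  // silently drop invalid characters instead of failing.
  if (!/^[0-9a-fA-F]+$/.test(signatureHex) || signatureHex.length !== expected.length * 2) {
    return false;
  }
  const received = Buffer.from(signatureHex, 'hex');

  // Constant-time comparison; lengths are already known to match.
  return timingSafeEqual(expected, received);
}
```

The `rawBody` must be captured before any JSON body parser runs; re-stringifying a parsed object changes whitespace and key order, and the signature will no longer match.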

4.4 Operational realities

Bank APIs are not 24/7. Many sandbox/production endpoints are available only during banking hours, with downtime windows for batch jobs.
Webhooks are not reliable. They may not arrive, may arrive twice, may arrive out of order. Always reconcile.
"Same-day settlement" is conditional and in practice often means T+1: same business day (T+0) only if initiated before cut-off, otherwise next business day.
Limits and FX rules change by directive; we must build a config-driven limits engine, not hard-code thresholds.
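Cut-off handling is easy to get wrong when hard-coded, so it is worth computing. A minimal sketch, assuming a per-rail cut-off expressed in minutes past midnight Addis Ababa time (UTC+3, no daylight saving), configurable weekend days, and a holiday list that would come from the reference-data service; `RailCalendar` and `settlementDate` are hypothetical names.

```typescript
// Hypothetical per-rail calendar; real values would come from svc-reference.
interface RailCalendar {
  cutOffMinutes: number;            // local minutes past midnight, e.g. 15:30 = 930
  weekendDays: ReadonlySet<number>; // 0 = Sunday ... 6 = Saturday
  holidays: ReadonlySet<string>;    // local dates as 'YYYY-MM-DD'
}

const ADDIS_OFFSET_MS = 3 * 60 * 60 * 1000; // Africa/Addis_Ababa is UTC+3 with no DST

/** Shift an instant so that its UTC fields read as Addis Ababa local time. */
function toLocal(instant: Date): Date {
  return new Date(instant.getTime() + ADDIS_OFFSET_MS);
}

function isoDay(local: Date): string {
  return local.toISOString().slice(0, 10);
}

function isBusinessDay(local: Date, cal: RailCalendar): boolean {
  return !cal.weekendDays.has(local.getUTCDay()) && !cal.holidays.has(isoDay(local));
}

/** Local settlement date ('YYYY-MM-DD') for a payment initiated at `initiatedAt`. */
export function settlementDate(initiatedAt: Date, cal: RailCalendar): string {
  const local = toLocal(initiatedAt);
  const minutes = local.getUTCHours() * 60 + local.getUTCMinutes();

  // Before cut-off on a business day: settles the same day (T+0).
  if (isBusinessDay(local, cal) && minutes < cal.cutOffMinutes) return isoDay(local);

  // Otherwise roll forward to the next business day (T+1, or later over weekends/holidays).
  const day = new Date(Date.UTC(local.getUTCFullYear(), local.getUTCMonth(), local.getUTCDate()));
  do {
    day.setUTCDate(day.getUTCDate() + 1);
  } while (!isBusinessDay(day, cal));
  return isoDay(day);
}
```

For example, with a 15:30 cut-off, a payment initiated at 16:00 local on a Friday rolls past the weekend (and any holiday on the following Monday) to the next business day.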

5. Target architecture — overview

5.1 Logical view

                   ┌────────────────────────────────────────┐
                   │              Edge / WAF                │
                   │  Cloudflare-style: DDoS, bot, rate     │
                   └──────────────────┬─────────────────────┘
                                      │ TLS 1.3
                   ┌──────────────────▼─────────────────────┐
                   │           API Gateway (Kong/Apollo)    │
                   │  AuthN (JWT), AuthZ (OPA), Rate-limit  │
                   └──────────────────┬─────────────────────┘
                                      │ mTLS · internal
       ┌──────────────────────────────┼──────────────────────────────┐
       │                              │                              │
┌──────▼──────┐  ┌────────────┐  ┌────▼─────┐  ┌────────────┐  ┌────▼──────┐
│  Identity   │  │  Business  │  │ Payments │  │  Lending   │  │  Embed.   │
│  & KYC      │  │  Tenancy   │  │  Orches. │  │  (Loans,   │  │  Finance  │
│             │  │            │  │          │  │  EWA)      │  │  (BNPL,   │
│             │  │            │  │          │  │            │  │  Equb)    │
└────┬────────┘  └─────┬──────┘  └────┬─────┘  └─────┬──────┘  └────┬──────┘
     │                 │              │               │              │
     │                 │              ▼               │              │
     │                 │     ┌────────────────┐       │              │
     │                 │     │  Ledger (DBL-  │◄──────┴──────────────┘
     │                 │     │  entry, SoR)   │
     │                 │     └───────┬────────┘
     │                 │             │
     │                 ▼             ▼
     │        ┌────────────┐  ┌──────────────────┐
     │        │  Payroll   │  │ Banking-Adapter  │
     │        │  Engine    │  │ Hub (per-prov.)  │
     │        └────────────┘  └────────┬─────────┘
     │                                  │
     │                                  ├── CBE adapter
     │                                  ├── Telebirr adapter
     │                                  ├── Awash adapter
     │                                  ├── M-Birr adapter
     │                                  └── ...n more
     │
     ▼
┌─────────────────────────────────────────────────────────────────────┐
│  Cross-cutting services (each its own service, called via events):  │
│  Notification · Audit-Log · Reconciliation · Fraud · Compliance ·   │
│  Reporting · Document-Store · Webhook-Receiver                      │
└─────────────────────────────────────────────────────────────────────┘

5.2 Communication patterns

Pattern	When	Why
REST/JSON via gateway	All client ↔ backend traffic	Stable contract, tooled
gRPC + mTLS	Service ↔ service synchronous	Strong typing, low overhead, mTLS by default
Async events (Kafka or NATS JetStream)	Anything that crosses domain boundaries (e.g. "PayrollApproved")	Decoupling, replay, durability
Outbox table → CDC → bus	Whenever a service emits an event after a DB write	Atomicity between state and event
Saga (orchestration)	Multi-service workflows (loan disbursal)	Compensating actions for partial failure
Scheduled jobs (cron / Temporal)	Daily settlements, EOD reconciliation	Predictable batches
SFTP / file watchers	Legacy bank rails	Reality of Ethiopian banking
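The outbox row in the table above can be sketched as follows: the state change and the event row commit in one transaction, and a separate relay publishes unsent rows. This is a minimal in-memory sketch; `FakeDb`, `approvePayroll` and `relayOutbox` are hypothetical stand-ins for a Postgres schema, a domain command, and a CDC connector (for example Debezium) or poller feeding Kafka/NATS.

```typescript
interface OutboxRow {
  id: number;
  topic: string;
  payload: string;
  publishedAt: Date | null;
}

/** In-memory stand-in for one service schema: a business table plus an outbox table. */
class FakeDb {
  payrolls = new Map<string, string>(); // payrollId -> status
  outbox: OutboxRow[] = [];
  private nextId = 1;

  /** All writes in `work` apply, or none do: a crude model of BEGIN ... COMMIT/ROLLBACK. */
  transaction(work: (tx: FakeDb) => void): void {
    const snapshot = { payrolls: new Map(this.payrolls), outbox: [...this.outbox], nextId: this.nextId };
    try {
      work(this);
    } catch (err) {
      this.payrolls = snapshot.payrolls;
      this.outbox = snapshot.outbox;
      this.nextId = snapshot.nextId;
      throw err;
    }
  }

  appendOutbox(topic: string, payload: object): void {
    this.outbox.push({ id: this.nextId++, topic, payload: JSON.stringify(payload), publishedAt: null });
  }
}

export function approvePayroll(db: FakeDb, payrollId: string): void {
  db.transaction((tx) => {
    tx.payrolls.set(payrollId, 'APPROVED');
    // Same transaction as the state change: both commit or neither does.
    tx.appendOutbox('PayrollApproved', { payrollId });
  });
}

/** Publishes unsent rows in id order, marking each only after its publish succeeds. */
export function relayOutbox(db: FakeDb, publish: (topic: string, payload: string) => void): number {
  let sent = 0;
  for (const row of db.outbox) {
    if (row.publishedAt) continue;
    publish(row.topic, row.payload); // if this throws, the row stays unsent and is retried next run
    row.publishedAt = new Date();
    sent++;
  }
  return sent;
}
```

Delivery is at-least-once: if the relay crashes after publishing but before marking the row, the event is sent again, so consumers must deduplicate on an event ID.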

5.3 Data ownership

Each microservice owns its own Postgres schema, not its own database instance (cost-effective at our stage). Logical separation enforced with per-service DB users and row-level security.
No service reads another service's tables. Only via API or event subscription.
Common reference data (countries, currencies, business calendars) lives in a small reference-data service and is cached locally.

5.4 The ledger is the system of record

This is non-negotiable. Wallet balances, loan outstanding balances, EWA available balances — all are derived from the double-entry ledger. We never increment a balance directly. We post a journal entry; the read-side projects balances from journals.

This means:

Bug in projection? Replay from journals.
Audit asks "why is this number what it is"? We show the journals.
Reconciliation breaks? We diff our journals against the bank's file.
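The "diff our journals against the bank's file" step reduces to a keyed comparison. A minimal sketch, assuming both sides have already been normalised to records keyed by our payment reference, with amounts in integer santim (1 ETB = 100 santim); the record shape and break categories are hypothetical.

```typescript
interface SettlementLine {
  reference: string;    // our payment reference, as echoed by the provider
  amountSantim: number; // integer minor units
}

type Break =
  | { kind: 'MISSING_AT_PROVIDER'; reference: string; ours: number }
  | { kind: 'MISSING_IN_LEDGER'; reference: string; theirs: number }
  | { kind: 'AMOUNT_MISMATCH'; reference: string; ours: number; theirs: number };

export function reconcile(ledger: SettlementLine[], provider: SettlementLine[]): Break[] {
  const theirs = new Map<string, number>();
  for (const line of provider) theirs.set(line.reference, line.amountSantim);

  const breaks: Break[] = [];
  for (const { reference, amountSantim: ours } of ledger) {
    const amount = theirs.get(reference);
    if (amount === undefined) {
      breaks.push({ kind: 'MISSING_AT_PROVIDER', reference, ours });
    } else if (amount !== ours) {
      breaks.push({ kind: 'AMOUNT_MISMATCH', reference, ours, theirs: amount });
    }
    theirs.delete(reference); // whatever is left afterwards exists only on the provider side
  }
  theirs.forEach((amount, reference) => {
    breaks.push({ kind: 'MISSING_IN_LEDGER', reference, theirs: amount });
  });
  return breaks;
}
```

This assumes references are unique on each side; a provider that splits or batches payments would need a many-to-one matching step first.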

6. Microservice catalogue

Each entry lists the service's purpose, the Prisma models it owns, its primary public API surface, the events it publishes, and its direct dependencies.

6.1 Identity & KYC service (`svc-identity`)

Purpose: User registration, authentication, MFA, RBAC, KYC orchestration, sanctions screening, customer profile.
Owns: User, Role, AdminProfile, KYC documents, KYC decisions, 2FA secrets, session tokens.
Public API: POST /users, POST /sessions, POST /kyc/submit, GET /kyc/status, POST /mfa/verify.
Publishes: UserRegistered, KYCSubmitted, KYCApproved, KYCRejected, UserDeactivated.
Dependencies: Document-store (for ID images), Sanctions (Dow Jones / World-Check or local list), Fayda API (national ID verification).
External calls: SMS gateway, email, Fayda, sanctions list.

6.2 Business tenancy service (`svc-business`)

Purpose: Business onboarding, departments, employees, business KYC (KYB), commercial agreements, fee schedules.
Owns: Business, Department, Employee, EmployeeAbsence.
Public API: POST /businesses, POST /departments, POST /employees, GET /businesses/:id/employees.
Publishes: BusinessOnboarded, EmployeeJoined, EmployeeOffboarded.
Dependencies: Identity (to link employees to users), Document-store.

6.3 Ledger service (`svc-ledger`)

Purpose: Authoritative double-entry ledger; the system of record for all value movement.
Owns: LedgerAccount, Transaction, JournalEntry.
Public API: POST /transactions (with full journal entries — must balance), GET /accounts/:id/balance, GET /transactions/:id, POST /transactions/:id/reverse.
Publishes: TransactionPosted, TransactionReversed, BalanceUpdated.
Invariants: every transaction's debit total = credit total. Rejects unbalanced posts at the DB layer.
No external dependencies.
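Rejecting unbalanced posts at the DB layer (for example with a deferred constraint trigger) is the backstop; the service can reject them earlier with a cheap check. A minimal application-side sketch, assuming amounts are positive integer santim and each line is either a debit or a credit; `JournalLine` and `assertBalanced` are hypothetical, not the Prisma-generated types.

```typescript
interface JournalLine {
  accountId: string;
  side: 'DEBIT' | 'CREDIT';
  amountSantim: number; // positive integer minor units
}

/** Throws unless the lines form a valid double-entry transaction. */
export function assertBalanced(lines: JournalLine[]): void {
  if (lines.length < 2) throw new Error('a transaction needs at least two journal entries');

  let debits = 0;
  let credits = 0;
  for (const line of lines) {
    if (!Number.isSafeInteger(line.amountSantim) || line.amountSantim <= 0) {
      throw new Error(`invalid amount on ${line.accountId}: ${line.amountSantim}`);
    }
    if (line.side === 'DEBIT') debits += line.amountSantim;
    else credits += line.amountSantim;
  }

  if (debits !== credits) {
    throw new Error(`unbalanced: debits ${debits} != credits ${credits}`);
  }
}
```

Integer minor units avoid the rounding drift that floating-point ETB amounts would introduce into the equality check.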

6.4 Payments orchestration service (`svc-payments`)

Purpose: Single internal "money API". Routes outbound payments to the right adapter; receives webhook callbacks; updates the ledger; enforces idempotency.
Owns: Payment intents, payment attempts, idempotency keys, webhook event log, routing rules.
Public API: POST /payments/intents (with idempotency key), GET /payments/:id, POST /payments/:id/cancel.
Publishes: PaymentInitiated, PaymentSucceeded, PaymentFailed, PaymentReversed.
Dependencies: Banking-adapter hub, Ledger, Risk (fraud), Audit.
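Idempotency on `POST /payments/intents` is commonly implemented as: the first request with a key executes and its result is stored; a retry with the same key and body gets the stored result; the same key with a different body is rejected. A minimal in-memory sketch of that contract; `IdempotencyStore` is hypothetical, and a real implementation would use a Postgres table with a unique constraint on the key plus a retention TTL.

```typescript
import { createHash } from 'node:crypto';

interface StoredResult<T> {
  requestHash: string; // fingerprint of the request body that first used the key
  result: T;
}

export class IdempotencyStore<T> {
  private readonly entries = new Map<string, StoredResult<T>>();

  /** Executes `execute` at most once per key and replays its result for identical retries. */
  run(key: string, requestBody: unknown, execute: () => T): T {
    const requestHash = createHash('sha256').update(JSON.stringify(requestBody)).digest('hex');
    const existing = this.entries.get(key);

    if (existing) {
      if (existing.requestHash !== requestHash) {
        // Same key, different payload: a client bug, never a legitimate retry.
        throw new Error(`idempotency key ${key} reused with a different request`);
      }
      return existing.result; // replay without a second side effect
    }

    const result = execute();
    // Not stored if execute throws, so a retry will run it again.
    this.entries.set(key, { requestHash, result });
    return result;
  }
}
```

A production version must also handle concurrent requests with the same key (insert the key first as `IN_PROGRESS` so the unique constraint serialises them) and key-order differences in the body (canonicalise JSON before hashing, since `JSON.stringify` is order-sensitive).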

6.5 Banking-adapter hub (`svc-bank-adapters`)

Purpose: One adapter per provider; uniform internal contract.
Owns: Provider configs, encrypted credentials, routing rules, per-provider rate limits, circuit breaker state.
Adapters (one per provider):
- cbe-adapter — CBE Birr REST, SFTP for bulk.
- telebirr-adapter — Telebirr REST + webhooks.
- awash-adapter, dashen-adapter, abyssinia-adapter, cooperative-adapter, zemen-adapter, nib-adapter, …
- mbirr-adapter, hellocash-adapter, amole-adapter, …
- sftp-adapter — generic SFTP batch sender for legacy rails.

Common adapter interface (gRPC contract):

Send(idempotencyKey, amount, currency, source, dest, ref) → AttemptId
GetStatus(AttemptId) → Status
Refund(AttemptId, amount, ref) → AttemptId
WebhookHandler(rawBody, signature) → InternalEvent
ListSettlement(fromDate, toDate) → SettlementBatch

Publishes: ProviderPaymentSucceeded, ProviderPaymentFailed, ProviderSettlementReady.
Dependencies: Secrets vault, Network egress to provider endpoints.

6.6 Settlement & reconciliation service (`svc-settlement`)

Purpose: End-of-day batches, reconciliation, dispute resolution workflow.
Owns: SettlementBatch, SettlementRecord, reconciliation diff reports, disputes.
Public API: POST /settlements/run, GET /settlements/:id, GET /reconciliation/diff?date=….
Publishes: SettlementCompleted, ReconciliationBreakFound.
Dependencies: Banking-adapter hub, Ledger, Notifications (when break found, alert finance).

6.7 Payroll service (`svc-payroll`)

Purpose: Payroll cycle management, calculations (PAYE, pension, withholdings), payslip generation, approval workflow, disbursement orchestration.
Owns: Payroll, PayrollEntry.
Public API: POST /payrolls, POST /payrolls/:id/approve, POST /payrolls/:id/disburse, GET /payrolls/:id/payslips.
Publishes: PayrollDrafted, PayrollApproved, PayrollDisbursed.
Dependencies: Business, Identity, Payments, Ledger, Tax-engine (could be embedded initially).

6.8 Lending service (`svc-lending`)

Purpose: Business and employee loans, applications, underwriting decisions, repayment schedules.
Owns: BusinessLoan, EmployeeLoan, LoanPayment.
Public API: POST /loans/apply, POST /loans/:id/disburse, POST /loans/:id/repay.
Publishes: LoanApplied, LoanApproved, LoanDisbursed, LoanRepayment, LoanWrittenOff.
Dependencies: Identity (KYC), Business, Payments (for disbursement and repayment), Ledger, Risk (for underwriting).

6.9 Embedded finance service (`svc-embedded-finance`)

Purpose: Early Wage Access, BNPL, Equb, Bill Payments — features that ride the same money rails but have their own business logic.
Owns: EarlyWageAccess, BNPLPurchase, BNPLPayment, Merchant, Equb, EqubMember, EqubPayout, BillPayment.
Public API: POST /ewa/request, POST /bnpl/purchases, POST /equbs, POST /bills/pay.
Publishes: EWAGranted, BNPLApproved, EqubCycleClosed.
Dependencies: Identity, Lending (for credit decisions), Payments, Ledger, Risk.

6.10 Risk & fraud service (`svc-risk`)

Purpose: Real-time fraud scoring, velocity limits, sanctions re-screening on every transaction, AML pattern detection.
Owns: Rules engine, fraud scores, decision logs, suspicious activity reports (SAR).
Public API: POST /risk/evaluate (returns ALLOW / REVIEW / BLOCK in <100 ms).
Publishes: HighRiskTransactionDetected, SARFiled.
Dependencies: Identity (KYC tier), Ledger (transaction history), Sanctions list.
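A minimal sketch of the kind of rule `svc-risk` might evaluate within its latency budget: a sanctions hit blocks, a per-user sliding-window velocity limit blocks, and large amounts go to review. The thresholds, the one-hour window and the rule order are illustrative assumptions, not agreed policy; the class name is hypothetical.

```typescript
type Decision = 'ALLOW' | 'REVIEW' | 'BLOCK';

interface RiskInput {
  userId: string;
  amountSantim: number;
  at: number;            // epoch ms
  sanctionsHit: boolean; // result of the latest sanctions screen
}

// Illustrative thresholds only; real values would be configuration.
const WINDOW_MS = 60 * 60 * 1000;      // one-hour sliding window
const MAX_TXNS_PER_WINDOW = 10;
const REVIEW_ABOVE_SANTIM = 5_000_000; // 50,000 ETB, mirroring the step-up threshold in 8.5

export class VelocityRiskEvaluator {
  private readonly history = new Map<string, number[]>(); // userId -> recent timestamps

  evaluate(input: RiskInput): Decision {
    if (input.sanctionsHit) return 'BLOCK';

    // Keep only timestamps still inside the window, then record this attempt.
    const recent = (this.history.get(input.userId) ?? []).filter((t) => input.at - t < WINDOW_MS);
    recent.push(input.at);
    this.history.set(input.userId, recent);

    if (recent.length > MAX_TXNS_PER_WINDOW) return 'BLOCK';
    if (input.amountSantim > REVIEW_ABOVE_SANTIM) return 'REVIEW';
    return 'ALLOW';
  }
}
```

Holding history in process memory only works for a single instance; a horizontally scaled `svc-risk` would keep these counters in a shared store.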

6.11 Notification service (`svc-notify`)

Purpose: Multi-channel notifications (SMS, email, push, in-app). Templating, localisation (Amharic / English), rate-limiting, preference enforcement.
Owns: Templates, delivery logs, user preferences.
Public API: POST /notifications/send (event-driven, mostly).
Subscribes to: every event that triggers a user-facing message.

6.12 Audit-log service (`svc-audit`)

Purpose: WORM (write-once-read-many) audit log of every privileged action and every money movement. Append-only, integrity-hashed.
Owns: AuditLog (extended with hash chain).
Public API: POST /audit/events (internal only), GET /audit/events (regulators, internal review).
Storage: Hot in Postgres, warm in S3 (immutable bucket with object lock), hash-chained per business per day.
Subscribes to: every domain event.
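Hash-chaining is what makes the log tamper-evident: each record stores the hash of its predecessor, so editing or deleting any record breaks verification from that point on. A minimal sketch using SHA-256 over a fixed-order JSON encoding; the record shape and function names are hypothetical, and per the text one chain would exist per business per day.

```typescript
import { createHash } from 'node:crypto';

interface AuditEvent { actor: string; action: string; at: string }
interface ChainedRecord extends AuditEvent { prevHash: string; hash: string }

const GENESIS = '0'.repeat(64); // prevHash of the first record in each chain

function hashRecord(event: AuditEvent, prevHash: string): string {
  // Fixed field order so the same event always hashes the same way.
  const canonical = JSON.stringify([prevHash, event.actor, event.action, event.at]);
  return createHash('sha256').update(canonical).digest('hex');
}

export function append(chain: ChainedRecord[], event: AuditEvent): ChainedRecord[] {
  const prevHash = chain.length ? chain[chain.length - 1].hash : GENESIS;
  return [...chain, { ...event, prevHash, hash: hashRecord(event, prevHash) }];
}

/** Index of the first record that fails verification, or -1 if the whole chain verifies. */
export function verify(chain: ChainedRecord[]): number {
  let prevHash = GENESIS;
  for (let i = 0; i < chain.length; i++) {
    const record = chain[i];
    if (record.prevHash !== prevHash || record.hash !== hashRecord(record, prevHash)) return i;
    prevHash = record.hash;
  }
  return -1;
}
```

A chain alone cannot detect truncation of its newest records, which is why the roots also need to be anchored outside the database, as the audit layer in section 9.5 describes.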

6.13 Document-store service (`svc-documents`)

Purpose: Encrypted storage for KYC documents, contracts, payslips. Pre-signed download URLs.
Owns: Metadata index; binaries in encrypted object storage (S3-compatible, server-side encryption with KMS keys).
Public API: POST /documents (multipart, virus-scanned), GET /documents/:id/url (short-lived signed URL).

6.14 Webhook receiver (`svc-webhooks`)

Purpose: Public endpoint for provider callbacks. Validates signatures, drops invalid, persists raw payload, hands off to Payments via internal event.
Owns: Raw webhook event log (immutable).
Public API: POST /webhooks/:provider (one path per provider).

6.15 Reporting / read-models (`svc-reports`)

Purpose: Subscribes to all domain events, builds projected read models for dashboards (admin / business / FI / BNPL-partner / client). No business logic; just denormalised views.

6.16 Reference-data service (`svc-reference`)

Purpose: Banks list, branches, BIC codes, currencies, business calendars, holidays, fee schedules, tax tables, regulatory limits. Cached aggressively. Read-mostly.

7. Bank / FI / Wallet integration layer

7.1 The adapter pattern

We define one internal contract:

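import Decimal from 'decimal.js'; // assumption: decimal.js (which Prisma's Decimal builds on); never a JS number for money

// Supporting types, sketched only so the contract type-checks. The status set
// and the fields below are hypothetical; svc-payments owns the real shapes.
type AttemptStatus = 'PENDING' | 'SUCCEEDED' | 'FAILED' | 'REVERSED';
type NormalisedProviderEvent = { attemptId: string; status: AttemptStatus; raw: unknown };
type NormalisedSettlement = { date: string; lines: { attemptId: string; amount: Decimal }[] };
type HealthResult = { healthy: boolean; latencyMs?: number };

interface PaymentAdapter {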
  /** Initiate an outbound payment */
  send(input: {
    idempotencyKey: string;
    amount: Decimal;
    currency: 'ETB';
    source: {                       // We are paying from…
      accountNumber: string;
      accountName?: string;
    };
    destination: {                  // We are paying to…
      kind: 'BANK_ACCOUNT' | 'WALLET';
      identifier: string;           // account # or wallet phone
      accountName?: string;
      bankCode?: string;            // for BANK_ACCOUNT
    };
    reference: string;              // shows on the customer's statement
    metadata: Record<string, string>;
  }): Promise<{ attemptId: string; status: AttemptStatus }>;

  /** Query status */
  getStatus(attemptId: string): Promise<AttemptStatus>;

  /** Refund / reversal */
  refund(attemptId: string, amount: Decimal, reference: string)
    : Promise<{ refundId: string; status: AttemptStatus }>;

  /** Parse and validate an incoming webhook */
  parseWebhook(rawBody: Buffer, headers: Record<string,string>)
    : Promise<NormalisedProviderEvent>;

  /** Pull settlement data (daily) */
  pullSettlement(date: Date): Promise<NormalisedSettlement>;

  /** Provider-specific health check */
  health(): Promise<HealthResult>;
}

Every adapter implements this. The Payments service never knows the difference between Telebirr and CBE.

7.2 Adapter responsibilities

Auth handshake (OAuth / HMAC / mTLS) → handled per adapter.
Request shaping → translate normalised → provider format.
Response normalisation → translate provider → internal.
Idempotency → the adapter enforces single-execution even if the provider doesn't.
Circuit breaker, retries with exponential back-off + jitter, bulkhead pool isolated per provider.
Provider-specific error mapping to a small internal taxonomy.
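A minimal sketch of two of these mechanisms: exponential back-off with "full jitter" (wait a uniformly random time between zero and the capped exponential delay) and a consecutive-failure circuit breaker with a single half-open probe. The thresholds are placeholder assumptions, the clock and random source are injectable so the logic can be tested, and in practice a maintained resilience library may be preferable to hand-rolling this.

```typescript
/** Full-jitter back-off: uniform in [0, min(cap, base * 2^attempt)). */
export function backoffDelayMs(attempt: number, baseMs = 200, capMs = 10_000, rand: () => number = Math.random): number {
  const ceiling = Math.min(capMs, baseMs * 2 ** attempt);
  return Math.floor(rand() * ceiling);
}

type BreakerState = 'CLOSED' | 'OPEN' | 'HALF_OPEN';

export class CircuitBreaker {
  private state: BreakerState = 'CLOSED';
  private failures = 0;
  private openedAt = 0;

  constructor(
    private readonly failureThreshold = 5, // consecutive failures before opening
    private readonly openMs = 30_000,      // how long to fail fast before probing again
    private readonly now: () => number = Date.now,
  ) {}

  /** Whether a call to the provider may proceed right now. */
  canRequest(): boolean {
    if (this.state === 'OPEN' && this.now() - this.openedAt >= this.openMs) {
      this.state = 'HALF_OPEN'; // let exactly one probe through
      return true;
    }
    return this.state === 'CLOSED';
  }

  recordSuccess(): void {
    this.state = 'CLOSED';
    this.failures = 0;
  }

  recordFailure(): void {
    this.failures++;
    // A failed probe re-opens immediately; otherwise open once the threshold is reached.
    if (this.state === 'HALF_OPEN' || this.failures >= this.failureThreshold) {
      this.state = 'OPEN';
      this.openedAt = this.now();
    }
  }

  get current(): BreakerState {
    return this.state;
  }
}
```

One breaker instance per provider keeps a failing bank from tripping calls to healthy ones, which is the same isolation goal as the per-provider bulkhead.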

7.3 Onboarding a new bank (the cookbook)

Estimate per new bank: 2–4 engineer-weeks depending on protocol.

Legal & commercial — sign integration agreement, get sandbox credentials, agree on cut-off times and limits.
Connectivity — provision IPs to allow-list, set up mTLS or API key vault entry. Confirm latency from our infra.
Implement adapter behind the common interface.
Contract tests in CI against a recorded sandbox.
Shadow mode — for two weeks, route a copy of real payment intents through the new adapter without actually sending, and diff the result with the live adapter.
Canary — route 1% of eligible traffic; expand to 10%, 50%, 100% over a week with manual gating between steps.
Settlement reconciliation — run for ≥ 14 days with zero unresolved breaks before declaring GA.
Runbook — every adapter ships with: cut-off times, contact numbers, common error mappings, rollback procedure.
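For the canary step, deterministic bucketing keeps the same payment intent (and its retries) on the same adapter as the percentage grows. A minimal sketch, assuming the intent ID is the bucketing key and a per-rollout salt; the function names are hypothetical, and the 1% → 10% → 50% → 100% schedule is the one in the text.

```typescript
import { createHash } from 'node:crypto';

/** Stable bucket in [0, 10000) for an ID, i.e. 0.01% resolution. */
export function bucketOf(id: string, salt: string): number {
  const digest = createHash('sha256').update(`${salt}:${id}`).digest();
  return digest.readUInt32BE(0) % 10_000; // modulo bias is negligible at this range
}

/**
 * Whether this intent goes to the new adapter at the given rollout percentage (0-100).
 * A different salt per rollout stops the same intents always being the first canaries.
 */
export function routeToCanary(intentId: string, percent: number, salt = 'new-adapter-rollout'): boolean {
  return bucketOf(intentId, salt) < percent * 100;
}
```

Because an intent's bucket never changes, raising the percentage only adds intents to the canary; nothing flips back and forth between adapters mid-rollout.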

7.4 Routing rules (when there are multiple providers)

The Payments service picks the adapter at runtime based on:

Destination kind: wallet number → wallet rail; bank account → bank rail.
Bank code or wallet operator prefix (extracted from account number / phone number).
Health & capacity: prefer healthy adapters; fall back to an alternative when one is degraded.
Cost tiering: cheapest acceptable rail for the amount and speed required.
Cut-off times: if a same-day rail is past its cut-off, fall back to a 24/7 rail (wallet) or queue for next business day.
Customer preference: if the user has pinned a provider.

Routing is config-driven (Prisma JSON on FinancialInstitution and a new Provider table), reviewable in the admin portal, with a full audit history of changes.
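The routing inputs listed above reduce to a filter-then-rank step. A minimal sketch, assuming each rail declares which destination kinds and identifier prefixes it serves, a health flag, a flat fee, and either 24/7 availability or a same-day cut-off; the `Rail` shape and `rankRails` are hypothetical, and a pinned provider is honoured only when it is otherwise eligible.

```typescript
type DestinationKind = 'BANK_ACCOUNT' | 'WALLET';

// Hypothetical rail descriptor; in the plan this would come from the Provider config table.
interface Rail {
  adapterId: string;
  serves: DestinationKind[];
  prefixes: string[];     // bank codes or wallet-identifier prefixes this rail can reach
  healthy: boolean;       // from adapter health checks / circuit-breaker state
  feeSantim: number;      // flat fee, simplified from real fee schedules
  always24x7: boolean;
  cutOffMinutes?: number; // local minutes past midnight; same-day rails only
}

interface RouteRequest {
  kind: DestinationKind;
  identifier: string;       // account number or wallet identifier
  localMinutes: number;     // current local time, minutes past midnight
  pinnedAdapterId?: string; // customer preference, if any
}

/** Eligible adapters in preference order; an empty list means "queue for next business day". */
export function rankRails(req: RouteRequest, rails: Rail[]): string[] {
  const eligible = rails.filter(
    (r) =>
      r.serves.indexOf(req.kind) !== -1 &&
      r.prefixes.some((p) => req.identifier.startsWith(p)) &&
      r.healthy &&
      (r.always24x7 || (r.cutOffMinutes !== undefined && req.localMinutes < r.cutOffMinutes)),
  );

  // Cheapest first...
  eligible.sort((a, b) => a.feeSantim - b.feeSantim);

  // ...but an eligible pinned provider moves to the front.
  const pinned = eligible.findIndex((r) => r.adapterId === req.pinnedAdapterId);
  if (pinned > 0) eligible.unshift(...eligible.splice(pinned, 1));

  return eligible.map((r) => r.adapterId);
}
```

The entries after the first give the payments service its ordered fallbacks if the preferred adapter fails at send time.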

8. User identity & KYC

8.1 Registration — what we collect

We use a tiered KYC model — common in mobile money — so that small wallets can open instantly while higher-value services require more verification.

Tier	Limits (illustrative — confirm with NBE)	Required data
Tier 0 — Anonymous browse	No money operations	Phone number, email, password, accepted T&C
Tier 1 — Light wallet	Daily ≤ 5,000 ETB, monthly ≤ 30,000 ETB	+ Full legal name, DOB, gender, Fayda ID, selfie liveness
Tier 2 — Standard	Daily ≤ 30,000 ETB, monthly ≤ 300,000 ETB	+ Address, occupation, source of funds, Fayda biometric match
Tier 3 — Business	Per agreement	+ Business registration, TIN, beneficial owners, board resolution, signed mandates
Tier 3 — High value	Higher limits	+ Enhanced Due Diligence (EDD), PEP screening, manual review
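Since these limits are illustrative and change by directive, they belong in configuration rather than code. A minimal sketch of the check, using the Tier 0 to Tier 2 figures from the table purely as sample configuration in integer santim; Tier 3 limits are per agreement and would be passed in the same way. The types and `checkLimits` are hypothetical.

```typescript
interface TierLimits { dailySantim: number; monthlySantim: number }

// Sample config mirroring the illustrative table; real values would be loaded from svc-reference.
const SAMPLE_LIMITS: Record<0 | 1 | 2, TierLimits> = {
  0: { dailySantim: 0, monthlySantim: 0 },                  // no money operations
  1: { dailySantim: 500_000, monthlySantim: 3_000_000 },    // 5,000 / 30,000 ETB
  2: { dailySantim: 3_000_000, monthlySantim: 30_000_000 }, // 30,000 / 300,000 ETB
};

interface Usage { todaySantim: number; thisMonthSantim: number }

export type LimitResult = { ok: true } | { ok: false; reason: 'DAILY_LIMIT' | 'MONTHLY_LIMIT' };

/** Checks a prospective payment against the holder's tier limits and current usage. */
export function checkLimits(limits: TierLimits, usage: Usage, amountSantim: number): LimitResult {
  if (usage.todaySantim + amountSantim > limits.dailySantim) return { ok: false, reason: 'DAILY_LIMIT' };
  if (usage.thisMonthSantim + amountSantim > limits.monthlySantim) return { ok: false, reason: 'MONTHLY_LIMIT' };
  return { ok: true };
}
```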

8.2 Data captured at user registration

Minimum field set in User (extend the schema accordingly):

id (cuid)
phone (E.164, unique, primary identifier in Ethiopia)
email (unique, optional at Tier 0, required at Tier 1+)
password (Argon2id hash, not bcrypt; pepper from KMS)
firstName, middleName, lastName (Tier 1+)
dateOfBirth, gender (Tier 1+)
nationalId (Fayda FCN, unique, encrypted at rest)
idDocumentType (e.g. PASSPORT, KEBELE_ID — fallback if no Fayda)
idDocumentRef (document service ID)
addressLine1, region, subCity, kebele (Tier 2+)
occupation, sourceOfFunds, expectedMonthlyVolume (Tier 2+)
kycStatus (PENDING → IN_REVIEW → VERIFIED / REJECTED)
kycTier (0/1/2/3)
kycSubmittedAt, kycDecisionAt, kycDecisionBy
riskRating (LOW / MEDIUM / HIGH — drives EDD)
pepFlag (Politically Exposed Person)
sanctionsMatchAt (last screen result timestamp)
consents (JSON: T&C version, privacy policy version, marketing opt-in, with timestamps and IP)
createdAt, updatedAt, deletedAt, lastLoginAt, failedLoginCount
Soft-delete (deletedAt) not hard delete — regulators require retention.

For businesses (KYB), capture:

Legal name, trade name, business type, TIN, VAT number, registration number, registration date, registering authority.
Industry classification (NBE category).
Registered address; physical operating address.
Directors / authorised representatives — each with their own user record and Tier 2+ KYC.
Beneficial owners (≥ 25% ownership) — each individually verified.
Bank account(s) for settlement.
Board resolution / mandate authorising platform use.

8.3 Registration flow

Phone + Email + Password
   ──► OTP sent to phone
   ──► OTP verified
   ──► Password set (Argon2id hash)
   ──► T&C consent
   ──► Tier 0 wallet created (no funds)
   ──► Identity service publishes UserRegistered

Tier 1 upgrade:
  Name, DOB, gender, Fayda
  Selfie + liveness (server-side check)
  Fayda biometric match (svc-identity → Fayda API)
  Sanctions screen
  Risk score
  ──► auto-VERIFY (low risk + clean Fayda) or QUEUE for review

Tier 2 upgrade:
  Address proof, occupation, source-of-funds declaration
  Re-screen + risk re-score
  Manual or rule-based decision

Business (Tier 3):
  KYB workflow — admin review mandatory, no auto-approve.

8.4 KYC controls

Document collection through svc-documents. Server-side virus scan (ClamAV in a side-car), file-type sniff (not just extension), EXIF strip, size cap (25 MB).
Liveness — challenge-response with random head movements, server-side analysis. Cap at 3 attempts per 24 hours.
Sanctions screening — daily re-screen of every active user against the platform's curated list (NBE list + UN consolidated + OFAC if relevant).
PEP screening — at onboarding and on update.
Adverse media — at Tier 3 only.
Re-KYC — driven by changes in risk rating, transaction patterns, age of last verification (every 12 months for HIGH risk, 36 months for LOW).
Right to be forgotten — implemented as cryptographic shredding of PII keys (per-user KMS key) plus tombstoning the record. The ledger entries stay (regulatory retention) but are pseudonymised.

8.5 Authentication & session

First factor: phone + password (Argon2id, 64 MB / 3 iterations / parallelism 4).
Second factor: TOTP (always available) + WebAuthn (preferred on supported devices). SMS OTP allowed as fallback but rate-limited and never the only factor for sensitive actions.
Sessions: short-lived access tokens (15 min) + rotating refresh tokens (7 days, single-use, refresh-token reuse triggers global session invalidation).
Step-up auth required for: changing destination accounts, initiating payments > ETB 50k, adding a new beneficiary, disabling 2FA.
Device binding: each device gets a per-device JWK fingerprint, bound to the refresh token.
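Refresh-token rotation with reuse detection can be sketched as follows: each refresh consumes the presented token and issues a successor, and presenting an already-consumed token is treated as theft and invalidates every session of that user, as the text requires. A minimal in-memory sketch; `RefreshTokenService` is hypothetical, only SHA-256 hashes of tokens are stored, and device binding is not modelled.

```typescript
import { createHash, randomBytes } from 'node:crypto';

const REFRESH_TTL_MS = 7 * 24 * 60 * 60 * 1000; // 7-day refresh tokens, per the policy above

interface TokenRecord {
  userId: string;
  expiresAt: number; // epoch ms
  used: boolean;     // single-use: set once the token has been rotated
}

const hashToken = (token: string) => createHash('sha256').update(token).digest('hex');

export class RefreshTokenService {
  // Keyed by token hash so a database leak does not expose usable tokens.
  private readonly tokens = new Map<string, TokenRecord>();

  issue(userId: string, now: number): string {
    const token = randomBytes(32).toString('hex');
    this.tokens.set(hashToken(token), { userId, expiresAt: now + REFRESH_TTL_MS, used: false });
    return token;
  }

  /** Rotates a refresh token; returns the successor, or null if the token must be rejected. */
  rotate(presented: string, now: number): string | null {
    const record = this.tokens.get(hashToken(presented));
    if (!record || record.expiresAt <= now) return null;
    if (record.used) {
      // A consumed token came back: assume it was copied and end every session of this user.
      this.revokeAllSessions(record.userId);
      return null;
    }
    record.used = true;
    return this.issue(record.userId, now);
  }

  revokeAllSessions(userId: string): void {
    this.tokens.forEach((record, hash) => {
      if (record.userId === userId) this.tokens.delete(hash);
    });
  }
}
```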

9. Security architecture (defence-in-depth)

Defence in depth means every layer enforces policy independently. A breach of one does not give the attacker the next.

9.1 Network layer

Cloudflare-class WAF + DDoS at the edge. Custom rules for known attack patterns; bot management; geo-fencing for admin endpoints.
Public ingress only into the API gateway and the webhook receiver. No service is reachable from the internet directly.
Private VPC. Each service in its own subnet. Egress through NAT with a deny-by-default allow-list.
mTLS between every service. Certificates issued by an internal CA with 24-hour rotations; SPIFFE/SPIRE for identity.
Egress allow-list for outbound calls to provider endpoints; logged and alerted on unknown egress attempts.

9.2 Application layer

Input validation at the gateway via JSON schema + business validators in each service (Zod or class-validator).
Output encoding. No HTML in JSON responses. CSP, HSTS, COOP/COEP headers set at the gateway.
No raw SQL anywhere. Prisma everywhere; reviewed for type-unsafe extensions.
Dependency scanning (Snyk / Dependabot) in CI. CVE deadline: Critical = 48 h, High = 7 d, Medium = 30 d.
SAST (Semgrep + ESLint security plugins) in CI.
SCA (Software Composition Analysis) on every build.
Container scanning (Trivy) before image promotion.

9.3 Data layer

Encryption at rest for all data stores. Postgres with TDE-equivalent protection (filesystem-level encryption plus per-column encryption for PII). Object storage encrypted with KMS.
Encryption in transit — TLS 1.3 only on all hops. PSP-level TLS pinning where the provider supports it.
Tokenisation of bank account numbers and Fayda IDs. The raw value is stored only in the Identity / Documents service; every other service references a token.
Per-tenant key isolation — each business has its own data-encryption key, derived from a master key in KMS.
Backups — encrypted, off-region, tested monthly via restore drills.
No PII in logs. Structured logging with a redaction layer that drops fields tagged @sensitive.
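The redaction layer can be a pure function applied to every structured log record before it reaches the log transport. A minimal sketch; it approximates the `@sensitive` tag with a hypothetical hard-coded set of field names, since how fields are tagged (decorator, schema annotation, naming convention) is not specified here.

```typescript
// Hypothetical stand-in for fields tagged @sensitive in the schema.
const SENSITIVE_FIELDS = new Set(['nationalId', 'phone', 'email', 'accountNumber', 'password', 'dateOfBirth']);

type Json = null | boolean | number | string | Json[] | { [key: string]: Json };

/** Deep copy of a log record with sensitive fields dropped at any nesting depth. */
export function redact(value: Json): Json {
  if (Array.isArray(value)) return value.map((item) => redact(item));
  if (value !== null && typeof value === 'object') {
    const out: { [key: string]: Json } = {};
    for (const key of Object.keys(value)) {
      if (!SENSITIVE_FIELDS.has(key)) out[key] = redact(value[key]);
    }
    return out;
  }
  return value; // primitives pass through unchanged
}
```

Dropping by field name misses PII embedded in free text (for example a phone number inside an error message), so this complements, rather than replaces, the rule of never logging raw payloads.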

9.4 Identity & secret layer

All secrets in HashiCorp Vault (or AWS Secrets Manager). Apps receive short-lived dynamic credentials at boot, refreshed every hour.
One service account per service; credentials issued just-in-time (JIT).
HSM-backed signing keys for: JWT signing, webhook signature verification, payment-message signing.
Break-glass procedure with two-person rule on critical secrets.
Workforce SSO via OIDC for the admin portal; SCIM-managed provisioning; MFA required.

9.5 Audit layer

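WORM audit log: every privileged action (transfer, KYC decision, role change, secret access) emits an event to svc-audit. Stored in Postgres and replicated to S3 with Object Lock in compliance mode, with a 7-year retention period matching regulatory retention (governance mode would let sufficiently privileged users delete objects or shorten retention, which defeats WORM).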
Hash chain: each daily batch of audit events is Merkle-rooted; the root is published to an internal append-only log signed with the HSM. Tampering is detectable retroactively.
SIEM: stream audit + security events to a SIEM (Wazuh or Datadog Security Monitoring) with playbooks for: unusual login velocity, privileged-action anomalies, payment-routing changes, webhook signature failures.

9.6 Operational layer

All admin actions in production are break-glass, time-boxed (< 60 min), require a ticket reference, and produce a recorded session.
Code that touches money requires two-person review on PR.
Production data access is forbidden by default; granted JIT for a specific ticket, capped at 2 hours, audited.
Backups tested via restore drills monthly; failover drills quarterly.

10. Data consistency, ledger & idempotency

10.1 The double-entry rule

Every value movement is one Transaction with two or more JournalEntry rows whose debits == credits. Examples:

Payroll disbursement of 10,000 ETB to employee E1:

  DR  Payroll Expense                  10,000
  CR  Cash / Bank Settlement Account   10,000

Wallet credit on employee side:

  DR  Cash / Bank Settlement Account   10,000
  CR  Employee Wallet (E1)             10,000

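Wallet balance = sum of credits − sum of debits on the wallet account. A wallet is a liability the platform owes the employee, so it is credit-normal: the 10,000 ETB credit above raises E1's balance by 10,000. Read from the projection; reconcile to journals nightly.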
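Both rules can be sketched in Go, assuming amounts are held as integer minor units (santim) and every account has a known normal side; the type and function names are hypothetical, not the Prisma models. A transaction is rejected unless it has at least two entries, all amounts are positive, and total debits equal total credits. A balance is computed on the account's normal side, so a credit-normal account such as the employee wallet grows with credits while a debit-normal asset account grows with debits.

```go
package main

import (
	"errors"
	"fmt"
)

type Side int

const (
	Debit Side = iota
	Credit
)

type JournalEntry struct {
	Account string
	Side    Side
	Amount  int64 // minor units (santim); always positive
}

// validateTransaction enforces the double-entry rule for one Transaction.
func validateTransaction(entries []JournalEntry) error {
	if len(entries) < 2 {
		return errors.New("a transaction needs at least two entries")
	}
	var debits, credits int64
	for _, e := range entries {
		if e.Amount <= 0 {
			return fmt.Errorf("entry on %s has a non-positive amount", e.Account)
		}
		if e.Side == Debit {
			debits += e.Amount
		} else {
			credits += e.Amount
		}
	}
	if debits != credits {
		return fmt.Errorf("unbalanced: debits %d != credits %d", debits, credits)
	}
	return nil
}

// balance sums an account's entries on its normal side.
func balance(account string, normal Side, entries []JournalEntry) int64 {
	var b int64
	for _, e := range entries {
		if e.Account != account {
			continue
		}
		if e.Side == normal {
			b += e.Amount
		} else {
			b -= e.Amount
		}
	}
	return b
}

func main() {
	const etb = 100 // santim per birr
	payroll := []JournalEntry{
		{"Payroll Expense", Debit, 10_000 * etb},
		{"Bank Settlement", Credit, 10_000 * etb},
	}
	walletCredit := []JournalEntry{
		{"Bank Settlement", Debit, 10_000 * etb},
		{"Employee Wallet (E1)", Credit, 10_000 * etb},
	}
	for _, tx := range [][]JournalEntry{payroll, walletCredit} {
		if err := validateTransaction(tx); err != nil {
			panic(err)
		}
	}
	all := append(payroll, walletCredit...)
	fmt.Println(balance("Employee Wallet (E1)", Credit, all)) // 1000000 (ETB 10,000.00)
}
```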

10.2 Idempotency

Every state-changing public API requires an Idempotency-Key header (client-generated UUID, 24-hour TTL). The gateway stores the first response keyed by (client_id, idempotency_key) and returns the same response on retries.

Internal services apply the same rule on their own gRPC interface using a request-ID header.

10.3 Outbox pattern

When a service writes to its DB and must emit an event, it writes the event to an outbox table in the same transaction. A small relay process (Debezium or a custom worker) ships outbox rows to the bus and marks them sent. This guarantees at-least-once delivery without distributed transactions.

Consumers are designed idempotent (event-ID dedupe table) so "at-least-once" is safe.

10.4 Saga (orchestration) for cross-service workflows

Example: Employee loan disbursement.

svc-lending    : create EmployeeLoan (PENDING)
svc-risk       : evaluate → APPROVE
svc-ledger     : reserve funds (post pending journal)
svc-payments   : initiate payment via adapter
(callback)     : svc-payments updates → SUCCESS
svc-ledger     : finalise journal (move from pending to posted)
svc-lending    : mark loan DISBURSED, schedule repayments

If step 4 (payment initiation via the adapter) fails, the orchestrator triggers compensating actions: release the ledger reservation, mark the loan as FAILED, and notify the customer.

Implementation: Temporal (preferred) or a NestJS saga library. Temporal gives durable execution, retry, and replay for free.

10.5 Reconciliation

Every morning, svc-settlement pulls each provider's daily settlement report, joins to our ledger journals, and produces a diff report:

Matched (✓)
Provider has, we don't → investigate (possible missed webhook)
We have, provider doesn't → investigate (possible double-spend)
Amount mismatch → investigate

Unresolved breaks block new business on that provider channel until they are cleared. Disputes follow a manual workflow with a tracked audit trail.

11. High availability & zero-downtime deployments

11.1 Availability targets (SLOs)

Capability	Availability target	Latency p99	RPO	RTO
Identity / auth	99.95%	300 ms	1 min	5 min
Payments init	99.95%	500 ms	0 (durable)	5 min
Webhook receipt	99.99%	200 ms	0 (durable)	1 min
Admin portal	99.9%	1 s	5 min	30 min
Reporting	99% (read-only)	2 s	1 hour	4 hours

We don't promise more than the underlying bank rails can deliver on the customer journey. The bank rail's downtime is communicated in-product.
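Each availability target translates into a downtime budget, which is what the on-call burn-rate alerts spend against. A small calculation, assuming a 30-day window (the document does not fix the window length):

```go
package main

import "fmt"

// downtimeBudgetMinutes returns the unavailability a target allows over a window of days.
func downtimeBudgetMinutes(target, windowDays float64) float64 {
	return (1 - target) * windowDays * 24 * 60
}

func main() {
	slos := []struct {
		name   string
		target float64
	}{
		{"Identity / auth", 0.9995},
		{"Payments init", 0.9995},
		{"Webhook receipt", 0.9999},
		{"Admin portal", 0.999},
		{"Reporting", 0.99},
	}
	for _, s := range slos {
		fmt.Printf("%-16s %.2f%%  %6.1f min per 30 days\n", s.name, s.target*100, downtimeBudgetMinutes(s.target, 30))
	}
}
```

With a 30-day window this gives 21.6 minutes for identity and payments init, about 4.3 minutes for webhook receipt, 43.2 minutes for the admin portal and 432 minutes (7.2 hours) for reporting.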

11.2 Topology

Multi-AZ active-active in the primary region (Addis or nearest cloud region with NBE-acceptable data residency).
Postgres: synchronous replication to one in-region standby + async to a DR region. Connection pooling via PgBouncer (transaction pooling) at each service.
Redis: clustered, multi-AZ, primary + replicas; used only for caches and rate-limit counters — never as a source of truth.
Kafka/NATS: 3-node cluster across AZs, replication factor 3, minimum in-sync replicas 2.
Object storage: multi-AZ by default; cross-region replication for audit / documents.

11.3 Deploy strategy

Blue-green for stateless services (gateway, all microservices). Cut over via the gateway after the new colour passes synthetic checks.
Canary for risky changes: 1% → 10% → 50% → 100% with auto-rollback on SLO burn.
Feature flags (Unleash or OpenFeature) on every new payment flow, so production exposure can be reduced to zero in seconds.
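The canary ladder can be expressed as a small decision function: move to the next traffic weight only while the error-budget burn rate stays under a threshold, otherwise drop the new version to 0% (rollback). A minimal sketch; the threshold value and the idea of evaluating once per step are assumptions, and in practice this logic usually lives in a progressive-delivery controller rather than in service code.

```go
package main

import "fmt"

// ladder holds the canary traffic weights, in percent.
var ladder = []int{1, 10, 50, 100}

// nextWeight returns the next canary weight, or 0 to roll back.
func nextWeight(current int, burnRate, maxBurn float64) int {
	if burnRate > maxBurn {
		return 0 // auto-rollback: all traffic back to the stable version
	}
	for _, w := range ladder {
		if w > current {
			return w
		}
	}
	return current // already at 100%
}

func main() {
	w := ladder[0]
	for _, burn := range []float64{0.5, 0.8, 1.2} {
		w = nextWeight(w, burn, 2.0)
		fmt.Println(w) // 10, then 50, then 100
	}
	fmt.Println(nextWeight(50, 6.0, 2.0)) // 0: burning too fast, roll back
}
```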

11.4 Zero-downtime database migrations

Always expand → migrate → contract, never break-and-replace:

Expand — add new columns/tables, nullable, optional. Deploy.
Backfill — populate new columns in chunks with throttling.
Migrate — switch reads/writes to new columns behind a flag.
Validate in production with shadow reads.
Contract — remove old columns in a later release.

Prisma migrations are committed to the repo, and every migration is reviewed for locking behaviour: a migration that holds a long lock is rejected. Tools: pg_repack for table rewrites, pgroll for safer multi-step migrations.
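The backfill step can be sketched as a keyset-paginated loop: each call updates one bounded batch of rows with an id greater than the last one processed, and the loop sleeps between batches so lock time and replication lag stay low. The Store interface and its UpdateBatch method are hypothetical stand-ins for the real UPDATE statement; the fake store only exists to keep the example runnable. Making the real update idempotent (for example, touching only rows whose new column is still NULL) lets a failed run simply be restarted.

```go
package main

import (
	"fmt"
	"time"
)

// Store is a hypothetical stand-in for the table being backfilled.
type Store interface {
	// UpdateBatch fills the new column for up to limit rows with id > afterID
	// and returns the highest id it touched, or 0 when no rows are left.
	UpdateBatch(afterID int64, limit int) (int64, error)
}

// backfill walks the table in id order, one bounded batch per call, pausing in between.
func backfill(s Store, batch int, pause time.Duration) (int, error) {
	var last int64
	batches := 0
	for {
		next, err := s.UpdateBatch(last, batch)
		if err != nil {
			return batches, err
		}
		if next == 0 {
			return batches, nil
		}
		last, batches = next, batches+1
		time.Sleep(pause) // throttle to protect the primary and its replicas
	}
}

// fakeStore simulates a table with contiguous ids 1..rows.
type fakeStore struct {
	rows    int64
	updated map[int64]bool
}

func (f *fakeStore) UpdateBatch(afterID int64, limit int) (int64, error) {
	var last int64
	for id := afterID + 1; id <= f.rows && id <= afterID+int64(limit); id++ {
		f.updated[id] = true
		last = id
	}
	return last, nil
}

func main() {
	f := &fakeStore{rows: 2500, updated: map[int64]bool{}}
	n, err := backfill(f, 1000, 10*time.Millisecond)
	fmt.Println(n, len(f.updated), err) // 3 2500 <nil>
}
```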

11.5 Graceful shutdown and back-pressure

All services handle SIGTERM: stop accepting new requests, drain in-flight (capped at 30 s), then exit.
Bulkheads — each downstream provider has its own connection pool, so a slow provider can't starve the rest.
Circuit breakers (resilience4j-equivalent) per adapter, per endpoint. Open at 50% errors over 60 s, half-open after 30 s.

11.6 Disaster recovery

DR region with async replication.
Failover tested every quarter (game-day).
Postgres PITR backups: WAL archived at least every 5 min; retained 30 days hot + 7 years cold.
Documented runbook for every "what if X fails" — region, AZ, Postgres, Kafka, a provider, the gateway.

12. Observability & operations

12.1 The three signals

Logs: structured JSON, correlation ID per request (W3C traceparent). Loki or OpenSearch. PII-redacted.
Metrics: Prometheus, scraped by service-mesh. Default RED metrics (Rate / Errors / Duration) per endpoint, USE metrics per resource. Custom business metrics: payments initiated, payments succeeded, settlement matched %, KYC decisions per hour.
Traces: OpenTelemetry, sampled at 10% normally, 100% for payments. Backend: Tempo / Honeycomb.
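Two details here lend themselves to a sketch: parsing the W3C traceparent header (version 00: version-traceid-parentid-flags, lowercase hex, all-zero IDs invalid) and making the sampling decision deterministic per trace ID, so every service keeps or drops the same trace. The ratio test is modelled loosely on OpenTelemetry's TraceIDRatioBased sampler, and isPayment is a hypothetical input (for example, derived from the route). The example header is the one used in the W3C Trace Context specification. In production the OpenTelemetry SDK's own propagator and samplers should do this work.

```go
package main

import (
	"encoding/binary"
	"encoding/hex"
	"errors"
	"fmt"
	"strings"
)

type TraceContext struct {
	TraceID  [16]byte
	ParentID [8]byte
	Sampled  bool // bit 0 of trace-flags
}

// parseTraceparent accepts only version 00 of the W3C traceparent header.
func parseTraceparent(h string) (TraceContext, error) {
	var tc TraceContext
	parts := strings.Split(h, "-")
	if len(parts) != 4 || parts[0] != "00" || len(parts[1]) != 32 ||
		len(parts[2]) != 16 || len(parts[3]) != 2 {
		return tc, errors.New("malformed traceparent")
	}
	if h != strings.ToLower(h) {
		return tc, errors.New("traceparent must use lowercase hex")
	}
	if _, err := hex.Decode(tc.TraceID[:], []byte(parts[1])); err != nil {
		return tc, err
	}
	if _, err := hex.Decode(tc.ParentID[:], []byte(parts[2])); err != nil {
		return tc, err
	}
	flags, err := hex.DecodeString(parts[3])
	if err != nil {
		return tc, err
	}
	if tc.TraceID == [16]byte{} || tc.ParentID == [8]byte{} {
		return tc, errors.New("all-zero trace-id or parent-id")
	}
	tc.Sampled = flags[0]&0x01 == 0x01
	return tc, nil
}

// shouldSample keeps every payment trace and a deterministic share of the rest.
func shouldSample(traceID [16]byte, ratio float64, isPayment bool) bool {
	if isPayment {
		return true
	}
	// Treat the low 8 bytes of the trace ID as a uniform 63-bit number.
	x := binary.BigEndian.Uint64(traceID[8:]) >> 1
	return x < uint64(ratio*(1<<63))
}

func main() {
	tc, err := parseTraceparent("00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-01")
	fmt.Println(err, tc.Sampled)                      // <nil> true
	fmt.Println(shouldSample(tc.TraceID, 0.10, true)) // true: payments are always kept
}
```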

12.2 SLO-driven on-call

Every critical service publishes its SLO (availability + latency) to a dashboard. Burn-rate alerts page on-call when the error budget burns faster than about 14.4× the sustainable rate (roughly 2% of a 30-day budget in one hour).
Pages are rare and respected. If a page is wrong, the postmortem fixes the alert.
Runbooks linked in every alert.
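Burn rate is the observed error ratio divided by the error ratio the SLO allows, so a burn rate of 1 spends the budget exactly over the SLO window. A sketch of a two-window page condition in the style of the Google SRE Workbook, assuming a 30-day window: 14.4 is the workbook's example threshold, because burning at 14.4× for one hour consumes 14.4 × 1 h / 720 h = 2% of the budget, and requiring a short window (for example 5 minutes) to agree lets the page clear soon after recovery. The window sizes and numbers in main are illustrative.

```go
package main

import "fmt"

// burnRate is the observed error ratio divided by the ratio the SLO allows.
func burnRate(failed, total, slo float64) float64 {
	if total == 0 {
		return 0
	}
	return (failed / total) / (1 - slo)
}

// shouldPage fires only when both the long and the short window burn too fast.
func shouldPage(longFailed, longTotal, shortFailed, shortTotal, slo float64) bool {
	const threshold = 14.4
	return burnRate(longFailed, longTotal, slo) > threshold &&
		burnRate(shortFailed, shortTotal, slo) > threshold
}

func main() {
	slo := 0.9995 // Payments init
	// Last hour: 900 failures out of 100,000 requests; last 5 minutes: 80 out of 8,000.
	fmt.Printf("%.1f\n", burnRate(900, 100_000, slo))     // 18.0
	fmt.Println(shouldPage(900, 100_000, 80, 8_000, slo)) // true
}
```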

12.3 Synthetic monitoring

Probe the full payment path (init → adapter sandbox → callback → ledger update) every 5 minutes from outside.
Probe each adapter's health() every minute; degrade routing automatically on failure.

12.4 Blameless postmortems

For every customer-impacting incident: written within 5 business days, blameless, ships with at least one action item per root cause. Reviewed in engineering all-hands monthly.

13. Regulation & compliance

We are a fintech in Ethiopia handling money on behalf of others. This is a regulated environment.

13.1 NBE (National Bank of Ethiopia)

Hold the relevant licence for the service category (Payment System Operator, Payment Instrument Issuer, etc. — confirm with legal).
File the directive-required reports: transaction volumes, KYC metrics, suspicious activity reports.
Data residency: customer data and transactional records remain inside Ethiopia unless an explicit exception is granted. Plan infra accordingly (in-country DC or NBE-approved cloud).
Implement and test BCP (Business Continuity Plan) per NBE's expectations.

13.2 PCI DSS

Scope: if any rail involves a card (PAN), even indirectly, PCI DSS applies. Strongly prefer routing all card interactions through a PCI-certified processor so that we are out of scope, or at most in SAQ A scope.
If in scope: tokenise PANs at the perimeter; never store PAN in our DBs; segment cardholder data environment (CDE) with separate VPC and stricter controls.

13.3 AML / CFT

Documented AML programme with a designated officer.
KYC tiers, transaction monitoring rules, SAR filing workflow.
Customer Risk Rating updated continuously; enhanced due diligence (EDD) for HIGH-risk customers.
Sanctions screening: NBE list + UN consolidated + OFAC where applicable. Daily re-screen of the customer base.
Record retention: 7 years post-relationship.

13.4 Privacy

Lawful basis for processing — contractual necessity (payments) + legal obligation (KYC) + legitimate interest (fraud).
Subject access requests served within 30 days.
Privacy by design and default — KYC documents have a default retention of 7 years post-closure, automatically purged.

13.5 Information security management

Target ISO/IEC 27001 certification within 18 months of launch.
Document and operate to an ISMS — risk register, asset register, access control policy, change management policy, incident response policy, vendor management policy.

14. Migration plan (strangler fig from monolith)

We do not rewrite. We extract.

14.1 Phase 0 — Foundation (4–6 weeks)

Goal: nothing visible to users yet, but the floor is solid.

Provision: VPC, multi-AZ Kubernetes (or Nomad), Postgres HA, Kafka/NATS, Redis, Vault, S3 (or Wasabi), KMS.
Set up CI/CD with: lint, test, SAST, SCA, container scan, signed images (cosign), promotion gates per environment.
Bring up observability stack: Prom / Grafana / Loki / Tempo / alerting.
Add idempotency-key middleware to the existing monolith on every state-changing endpoint.
Add an audit_log outbox + relay in the monolith.
Add outbox and inbox tables to the monolith and emit domain events for the events we know we'll need.

14.2 Phase 1 — Identity & KYC (4–6 weeks)

Goal: identity is owned by a new service; the monolith calls it.

Extract svc-identity with its own Postgres schema; replicate initial user data; switch monolith to call it for auth.
Add the KYC tier model, document store integration, Fayda hook, sanctions screening.
Migrate sessions / refresh tokens to the new service.
Cut over behind a feature flag, dual-write during a parallel window, then point all auth traffic to svc-identity.

14.3 Phase 2 — Ledger + Payments (8–10 weeks)

Goal: a single internal money API exists.

Extract svc-ledger. The schema is already there; just stand it up and write the journal-posting service.
Build svc-payments with idempotency, intent → attempt model, webhook ingestion.
Build the adapter framework + two adapters first: Telebirr + CBE (the highest-volume rails).
Build svc-webhooks as the public-facing receiver.
Migrate any existing money flow in the monolith to call svc-payments. Most flows aren't yet implemented, which makes this easier than usual.

14.4 Phase 3 — Settlement & reconciliation (4 weeks)

Build svc-settlement with daily batch + diff reporting.
Operate it in shadow mode (no auto-settle) for 2 weeks; then enable auto-settle once breaks are < 0.05%.

14.5 Phase 4 — Lending, EWA, BNPL, Equb (parallelisable, ~3 months)

Extract each product into svc-lending and svc-embedded-finance. Each one is straightforward once Payments + Ledger exist.

14.6 Phase 5 — Decommission monolith (2 weeks)

The only thing left in the monolith should be the business / employee CRUD. Either keep that remnant running as svc-business, or split its remaining responsibilities between svc-identity and svc-business.
Delete the monolith repo's old paths; tag a final image; archive.

14.7 New bank/wallet onboarding cadence

After Phase 3 is shipped, the goal is one new bank/wallet adapter every 2 weeks with the cookbook in §7.3.

15. Risk register

#	Risk	Likelihood	Impact	Mitigation
1	Provider outage during peak	High	Medium	Multi-provider routing, fallback rail, in-product comms
2	Webhook signature secret leak	Low	High	HSM-backed, rotated quarterly, alert on misuse
3	Double-spend through retry	Medium	High	Idempotency keys; ledger reservation step
4	KYC data breach	Low	Critical	Tokenisation, per-tenant keys, no PII in logs, audit log
5	Reconciliation breaks pile up	Medium	High	Daily diff alerts, block channel on > N unresolved
6	DB migration brings down a service	Low	High	Expand-contract only, peer review, staging dress rehearsal
7	Saga compensation incomplete	Medium	High	Temporal-based durable orchestration; chaos test
8	Sanctions match missed	Low	Critical	Daily re-screen, fuzzy match, manual review queue
9	NBE regulatory change	Medium	Variable	Compliance officer monitors directives; config-driven limits
10	Insider misuse	Low	Critical	Two-person rule, JIT access, audit, rotation, anomaly alerts
11	DoS on webhook receiver	Medium	Medium	WAF, rate-limit, queue-back-pressure
12	Key data loss in DR scenario	Low	Critical	Multi-region async + tested restore + immutable audit S3

16. Open decisions

Items that need a call from leadership / architecture before detailed design can start.

Cloud provider — AWS / Azure / local DC? Drives a lot of procurement and the data-residency story.
Service mesh — Istio vs Linkerd vs just mTLS sidecars? Trade operational cost for traffic policy power.
Event bus — Kafka (heavier, more featureful) vs NATS JetStream (lighter, simpler). Recommendation: NATS for now, migrate to Kafka if/when stream-processing demand grows.
Workflow engine — Temporal (recommended) vs custom NestJS sagas. Temporal saves months of engineering.
HSM / KMS — cloud KMS (AWS KMS / Azure Key Vault) vs local HSM. Cloud KMS is enough for v1; revisit for licensed signing requirements.
National ID integration — Fayda API availability and SLA? Backup ID document flow for users without Fayda yet.
Hosting — fully in-country (latency, data residency, harder ops) vs nearest cloud region (faster to ship, possible NBE pushback). Need legal opinion.
Brand — finalise positioning so SEO / public copy stays consistent. (See seo/multi-rail-positioning branch on the landing repo.)

Appendix A — Suggested repo layout (future state)

Demoz-Pay/
├── apps/
│   ├── gateway/                    # API gateway + auth/JWT/RL
│   ├── svc-identity/               # NestJS service
│   ├── svc-business/
│   ├── svc-ledger/
│   ├── svc-payments/
│   ├── svc-bank-adapters/          # one app, many adapter modules
│   ├── svc-settlement/
│   ├── svc-payroll/
│   ├── svc-lending/
│   ├── svc-embedded-finance/
│   ├── svc-risk/
│   ├── svc-notify/
│   ├── svc-audit/
│   ├── svc-documents/
│   ├── svc-webhooks/
│   ├── svc-reports/
│   ├── svc-reference/
│   ├── admin/                      # Next.js frontends — unchanged
│   ├── business/
│   ├── client/
│   ├── fi/
│   ├── bnpl-partner/
│   └── docs/
├── libs/
│   ├── contracts/                  # gRPC .proto + generated types
│   ├── events/                     # Domain event schemas (Avro/JSON)
│   ├── sdk/                        # Internal SDK to call services
│   ├── ui/                         # shared frontend components
│   ├── domain/                     # shared domain types & validators
│   └── observability/              # logger, tracer, metrics helpers
├── infra/
│   ├── terraform/
│   ├── kubernetes/
│   └── runbooks/
└── prisma/                         # per-service schemas live in svc-*

Appendix B — Adapter contract (gRPC)

syntax = "proto3";
package demoz.bankadapter.v1;

import "google/protobuf/empty.proto"; // required for google.protobuf.Empty in Health

service BankAdapter {
  rpc Send(SendRequest) returns (SendResponse);
  rpc GetStatus(GetStatusRequest) returns (AttemptStatus);
  rpc Refund(RefundRequest) returns (RefundResponse);
  rpc PullSettlement(PullSettlementRequest) returns (SettlementBatch);
  rpc Health(google.protobuf.Empty) returns (HealthResponse);
}

message SendRequest {
  string idempotency_key      = 1;
  string amount_minor_units   = 2; // e.g. "1500000" = ETB 15,000.00
  string currency             = 3; // "ETB"
  Source source               = 4;
  Destination destination     = 5;
  string reference            = 6;
  map<string,string> metadata = 7;
}

message Destination {
  enum Kind { KIND_UNSPECIFIED = 0; BANK_ACCOUNT = 1; WALLET = 2; } // 0 is the proto3 default; keep it invalid so an unset kind is rejected
  Kind kind                = 1;
  string identifier        = 2; // acct # or phone (E.164)
  string account_name      = 3;
  string bank_code         = 4;
}
// (… rest elided for brevity, see libs/contracts when implemented …)
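Because amount_minor_units is a decimal string, every adapter needs the same parsing and validation. A minimal Go sketch, assuming ETB's two decimal places (100 santim to the birr) and that a valid amount is a positive integer without sign or leading zeros that fits in int64; the helper names are hypothetical and not part of libs/contracts.

```go
package main

import (
	"errors"
	"fmt"
	"strconv"
)

// parseMinorUnits accepts a canonical positive integer string such as "1500000".
func parseMinorUnits(s string) (int64, error) {
	if s == "" || s[0] == '0' || s[0] == '+' || s[0] == '-' {
		return 0, errors.New("amount must be a positive integer without sign or leading zeros")
	}
	v, err := strconv.ParseInt(s, 10, 64) // rejects non-digits and int64 overflow
	if err != nil {
		return 0, fmt.Errorf("invalid amount %q: %w", s, err)
	}
	return v, nil
}

// formatETB renders non-negative minor units as "ETB 15,000.00".
func formatETB(minor int64) string {
	whole, santim := minor/100, minor%100
	digits := strconv.FormatInt(whole, 10)
	var out []byte
	for i, d := range []byte(digits) {
		if i > 0 && (len(digits)-i)%3 == 0 {
			out = append(out, ',') // thousands separator
		}
		out = append(out, d)
	}
	return fmt.Sprintf("ETB %s.%02d", out, santim)
}

func main() {
	v, err := parseMinorUnits("1500000")
	fmt.Println(v, err, formatETB(v)) // 1500000 <nil> ETB 15,000.00
	_, err = parseMinorUnits("-5")
	fmt.Println(err != nil) // true
}
```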

Appendix C — Initial telemetry catalogue (must-have on day one)

Metric	Type	Notes
`http_requests_total{service,route,status}`	counter	per-endpoint
`http_request_duration_seconds{service,route}`	histogram	p50/p95/p99
`db_query_duration_seconds{service,query}`	histogram	watch P95
`bus_consumer_lag{consumer,topic}`	gauge	<5 s healthy
`payments_initiated_total{provider}`	counter	per-rail
`payments_succeeded_total{provider}`	counter	per-rail
`payments_failed_total{provider,reason}`	counter	per-rail, per-reason
`webhook_signature_invalid_total{provider}`	counter	alert if >0
`kyc_decision_total{tier,outcome}`	counter	business metric
`settlement_diff_total{provider,bucket}`	counter	matched / missing / extra
`ledger_unbalanced_post_attempts_total`	counter	must be 0

17. Polyglot tech stack — Node + Spring Boot + Go

Microservices win when you can pick the right language for the right problem. NestJS got us to MVP; for the production platform we'll deliberately mix three languages, each chosen for what it's genuinely good at.

17.1 Why polyglot

Language	Strength	Weakness in our context
Node.js / NestJS	Web-facing services, BFF for our Next.js apps, JSON, fast iteration, shared types with frontend	Single-threaded event loop, so CPU-heavy work blocks other requests; GC pauses under sustained heavy throughput; weaker for bank SOAP/ISO 8583
Spring Boot (Kotlin or Java 21)	The Java ecosystem owns banking integration: CXF for SOAP, jPOS for ISO 8583, BouncyCastle for crypto, Drools for rules. Mature observability (Micrometer), threading, transactions	Heavier images, slower startup, more verbose code
Go	Lowest latency and highest throughput per CPU of the three. Excellent gRPC support, goroutines for parallel fan-out, tiny static binaries, fast cold start (matters for K8s scale-out)	Smaller library ecosystem than JVM; not ideal for complex domain rules

17.2 Service-to-language mapping (recommended)

Service	Language	Why
`svc-identity`	NestJS	Already in our stack; mostly CRUD + JWT + integrations
`svc-business`	NestJS	CRUD-heavy; shares types with frontends
`svc-payments`	Go	Highest QPS path; needs predictable latency and tiny memory footprint; the fan-out + idempotency engine maps cleanly to goroutines and channels
`svc-ledger`	Go	Hot write path; deterministic; the journal-posting service must be the fastest service in the platform
`svc-bank-adapters` (hub)	Spring Boot (Kotlin)	Banks speak SOAP, ISO 8583, fixed-width files. jPOS, Apache CXF, Spring Integration, Spring Batch make these trivial in JVM and painful elsewhere
`svc-settlement`	Go	Batch processing of large files; concurrent diff jobs
`svc-payroll`	NestJS	Complex business logic; benefits from shared TS types with the business portal
`svc-lending`	Spring Boot	Underwriting rules engine (Drools), score cards, complex decision trees
`svc-embedded-finance` (EWA, BNPL, Equb)	NestJS	Product features with rich UI parity; co-evolves with the client app
`svc-risk`	Spring Boot	Real-time scoring with Drools; fits with `svc-lending`
`svc-notify`	NestJS	I/O bound (SMS/email/push); rich template ecosystem
`svc-audit`	Go	High-write append-only service; minimal logic; high throughput
`svc-documents`	NestJS	File handling, integrations (ClamAV, S3)
`svc-webhooks`	Go	Public-facing high-throughput endpoint; signature verification; needs to absorb traffic spikes
`svc-reports`	NestJS	Read-side projections; shares types with frontends
`svc-reference`	Go	Read-mostly with aggressive caching; tiny service

This is a recommendation, not a hard rule. A team should not adopt a language just because the table says so. When we extract a service, the team building it picks the language with the engineering lead.

17.3 How three languages stay one platform

Different language, same contract — what makes polyglot survive is uniform contracts and uniform operations.

Contracts as the source of truth

gRPC + Protocol Buffers for service-to-service. The .proto files live in libs/contracts/ and are codegen'd into:
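- TypeScript for Node services (ts-proto)
- Kotlin/Java for Spring Boot (protoc's built-in Java generator for messages, protoc-gen-grpc-java for service stubs, optionally grpc-kotlin for coroutine stubs)
- Go (protoc-gen-go for messages, protoc-gen-go-grpc for service stubs)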
JSON Schema + AsyncAPI for events on the bus. Codegen'd into each language's event types.
OpenAPI 3.1 for external (gateway-facing) APIs. Codegen'd into client SDKs.

The proto / schema files are reviewed like code — they're the real interface. Implementation can change freely; the contract cannot, except by versioning (v1, v2).

Operations as the great equaliser

Everything below is uniform across languages:

Containers: every service ships as a Docker image; same base hardening rules; same vulnerability scan.
Kubernetes: same Helm chart template; same liveness/readiness/startup probes; same SIGTERM handling.
Observability: all three languages have first-class OpenTelemetry SDKs for traces + metrics + logs. Service mesh injects sidecars; no language-specific monitoring.
Secrets: Vault Agent sidecar mounts secrets to a tmpfs the app reads at startup. Same for all three languages.
CI/CD: per-language pipeline shape (build, test, scan, sign, promote) but the promotion gates are the same: SAST passes, vulns under threshold, container scan clean, image signed (cosign), staging soak time elapsed.
Auth: same OIDC for human users; same SPIFFE/SPIRE workload identity for service-to-service mTLS — issued by the same internal CA regardless of language.
Logging format: same JSON schema with mandatory fields (trace_id, span_id, service, env, level, msg, attrs). Each language uses its native logger configured to emit this shape.

Language-specific guardrails

Each language gets its own lint / format / test conventions, owned by a small enabling team or a designated maintainer:

Language	Build	Lint	Test	Coverage gate
TypeScript	`pnpm` workspaces, `nx`	ESLint + Prettier	Jest + Playwright	80% lines on `svc-*`
Kotlin	Gradle (Kotlin DSL)	ktlint + detekt	JUnit5 + Testcontainers	80% lines
Go	Go modules per-service	`golangci-lint` (strict)	`go test` + Testcontainers-go	80% lines

Local development

Devs run only the services they're working on, plus a docker-compose.local.yml that provides Postgres, Kafka/NATS, Redis and mock providers. Every other service is replaced by a mock server (Prism for OpenAPI, protomock for gRPC). Nobody runs the full platform locally.

17.4 When NOT to mix languages

Be deliberate:

Don't pick a language to learn it. Production fintech is not a playground. The team writing a service must already be senior in its language.
Don't fragment the team. If you have one engineer who can write Spring Boot today, that's a bus factor of one. Either hire/upskill before committing the service to JVM, or use a language the team already knows well.
Don't optimise prematurely. A NestJS service can handle substantial throughput. Only move svc-payments and svc-ledger to Go when profiling data shows Node is the bottleneck.
Don't pick exotic stacks (Rust, Elixir, Zig) — beautiful, but the talent market in Addis is thin and the integration libraries for banking aren't there yet. Java / Node / Go cover us cleanly.

17.5 Hiring & team shape

Role	Approx team size for Phase 1+2
Platform / infra engineer	1 senior
Backend engineers (Node)	2
Backend engineers (Go)	2 senior — for `svc-payments`, `svc-ledger`, `svc-webhooks`, `svc-audit`
Backend engineer (JVM)	1–2 senior — for `svc-bank-adapters`, `svc-lending`, `svc-risk`
Security engineer	1 (can be fractional initially)
QA / SDET	1
SRE / on-call	1 (rotating with backend engineers initially)

Cross-training plan: every backend engineer should be able to read all three languages within 6 months. Writing them is optional.

18. Security parameters — what the bank questionnaire will ask

When a bank, MFI, or wallet operator gives us their security questionnaire (and they all do), this is the answer set. We maintain a separate, bank-handover-ready version of this in SECURITY_CONTROLS.md — the file is structured Q&A so it can be copy-pasted into vendor questionnaires with minimal editing.

The headline controls we'll be asked about, and our position:

Domain	Control	Position
Encryption at rest	AES-256-GCM, KMS-managed keys, per-tenant DEKs, automatic rotation	Built-in
Encryption in transit	TLS 1.3 only, modern cipher suites only, HSTS; certificate pinning to provider endpoints where the provider supports it	Enforced
Authentication (user)	Phone + password (Argon2id-hashed) + mandatory MFA (TOTP / WebAuthn)	Implemented
Authentication (service-to-service)	mTLS via SPIFFE/SPIRE, short-lived (24 h) certs	Enforced
Authorization	RBAC + OPA policy engine; least privilege	Enforced
Webhook signing	HMAC-SHA256 or detached JWS; constant-time verify; nonce + timestamp	Mandatory
Idempotency	UUIDv4 keys, 24 h TTL, response cached	Built-in
Audit log	WORM (Postgres + S3 Object Lock), 7-year retention, Merkle-rooted daily, SIEM-streamed	Implemented
Secrets	HashiCorp Vault + dynamic per-service secrets, no env files in production	Mandatory
Vulnerability mgmt	SAST + SCA + container scan in CI; Critical 48 h, High 7 d SLA	Process
Penetration testing	External independent test pre-launch and annually; segment-rescoped on major change	Scheduled
Bug bounty / responsible disclosure	`security.txt` + private programme via HackerOne or local equiv	Planned
Network	WAF + DDoS at edge, private VPC, no service public-by-default, egress allow-list	Architecture
Data residency	Customer + transaction data resident in Ethiopia (cloud region or in-country DC)	Policy
Backup & recovery	5-min Postgres PITR, 30-d hot, 7-y cold, monthly restore drill	Tested
BCP / DR	Multi-AZ active-active, DR region async, quarterly failover drill, RPO 1 min / RTO 5 min on critical	Tested
Incident response	Documented playbook, 15-min ack SLA for SEV1, post-mortem within 5 business days	Process
Change management	All prod changes via PR + 2 reviewers + signed image promotion + canary	Enforced
Vendor / third-party risk	Vendor security review (signed DPA, SOC 2 if available), annual re-review	Process
Compliance roadmap	ISO 27001 within 18 months; SOC 2 Type II at scale; PCI DSS scope minimised through certified processors	Roadmap
Right to audit	Per contract, with reasonable notice	Available

For the detailed control-by-control answers banks expect (the ones that fit in their spreadsheet), see SECURITY_CONTROLS.md.

Sign-off

This plan is a draft. Before we commit engineering time, the following sign-offs are needed:

Engineering lead — architecture, sequencing
Security lead — threat model, controls
Compliance / legal — NBE alignment, AML programme, data residency
Product — phase ordering vs commercial commitments
Finance — infra and tooling budget

Once signed off, this document becomes the canonical reference and every architectural deviation requires a one-page RFC referencing the section it changes.

Table of contents
1. Executive summary
2. Glossary
3. Current state — what we already have
- 3.1 Code
- 3.2 Prisma schema (already strong)
- 3.3 Gaps
4. How core banking actually works in Ethiopia
- 4.1 The technology landscape
- 4.2 The patterns you actually encounter
- 4.3 What you need at the application boundary
- 4.4 Operational realities
5. Target architecture — overview
- 5.1 Logical view
- 5.2 Communication patterns
- 5.3 Data ownership
- 5.4 The ledger is the system of record
6. Microservice catalogue
- 6.1 Identity & KYC service (svc-identity)
- 6.2 Business tenancy service (svc-business)
- 6.3 Ledger service (svc-ledger)
- 6.4 Payments orchestration service (svc-payments)
- 6.5 Banking-adapter hub (svc-bank-adapters)
- 6.6 Settlement & reconciliation service (svc-settlement)
- 6.7 Payroll service (svc-payroll)
- 6.8 Lending service (svc-lending)
- 6.9 Embedded finance service (svc-embedded-finance)
- 6.10 Risk & fraud service (svc-risk)
- 6.11 Notification service (svc-notify)
- 6.12 Audit-log service (svc-audit)
- 6.13 Document-store service (svc-documents)
- 6.14 Webhook receiver (svc-webhooks)
- 6.15 Reporting / read-models (svc-reports)
- 6.16 Reference-data service (svc-reference)
7. Bank / FI / Wallet integration layer
- 7.1 The adapter pattern
- 7.2 Adapter responsibilities
- 7.3 Onboarding a new bank (the cookbook)
- 7.4 Routing rules (when there are multiple providers)
8. User identity & KYC
- 8.1 Registration — what we collect
- 8.2 Data captured at user registration
- 8.3 Registration flow
- 8.4 KYC controls
- 8.5 Authentication & session
9. Security architecture (defence-in-depth)
- 9.1 Network layer
- 9.2 Application layer
- 9.3 Data layer
- 9.4 Identity & secret layer
- 9.5 Audit layer
- 9.6 Operational layer
10. Data consistency, ledger & idempotency
- 10.1 The double-entry rule
- 10.2 Idempotency
- 10.3 Outbox pattern
- 10.4 Saga (orchestration) for cross-service workflows
- 10.5 Reconciliation
11. High availability & zero-downtime deployments
- 11.1 Availability targets (SLOs)
- 11.2 Topology
- 11.3 Deploy strategy
- 11.4 Zero-downtime database migrations
- 11.5 Graceful shutdown and back-pressure
- 11.6 Disaster recovery
12. Observability & operations
- 12.1 The three signals
- 12.2 SLO-driven on-call
- 12.3 Synthetic monitoring
- 12.4 Blameless postmortems
13. Regulation & compliance
- 13.1 NBE (National Bank of Ethiopia)
- 13.2 PCI DSS
- 13.3 AML / CFT
- 13.4 Privacy
- 13.5 Information security management
14. Migration plan (strangler fig from monolith)
- 14.1 Phase 0 — Foundation (4–6 weeks)
- 14.2 Phase 1 — Identity & KYC (4–6 weeks)
- 14.3 Phase 2 — Ledger + Payments (8–10 weeks)
- 14.4 Phase 3 — Settlement & reconciliation (4 weeks)
- 14.5 Phase 4 — Lending, EWA, BNPL, Equb (parallelisable, ~3 months)
- 14.6 Phase 5 — Decommission monolith (2 weeks)
- 14.7 New bank/wallet onboarding cadence
15. Risk register
16. Open decisions
Appendix A — Suggested repo layout (future state)
Appendix B — Adapter contract (gRPC)
Appendix C — Initial telemetry catalogue (must-have on day one)
17. Polyglot tech stack — Node + Spring Boot + Go
- 17.1 Why polyglot
- 17.2 Service-to-language mapping (recommended)
- 17.3 How three languages stay one platform
- 17.4 When NOT to mix languages
- 17.5 Hiring & team shape
18. Security parameters — what the bank questionnaire will ask
Sign-off
