
Demoz Pay — Core Banking, Microservices, KYC & Resilience Plan

⚠️ Superseded / historical — archived 2026-05-26. DemozPay is a modular monolith with selective Go extraction (see docs/adr/ADR-001-modular-monolith.md and ADR-010-two-language-ceiling.md) — not microservices-first. The "microservice catalogue" / decomposition sections below do not reflect the current architecture; treat them as exploratory only. Authoritative: docs/adr/ + docs/architecture/restructure-2026-05.md. The Ethiopia banking / FI / wallet integration, KYC, and regulatory research here is still useful reference — which is why this is archived, not deleted.

Author: Engineering · Audience: Demoz Pay engineering + leadership · Status: Draft v1 · Last updated: 2026-05-23

This document is the canonical reference for how Demoz Pay will integrate with Ethiopian banks, financial institutions (FIs), and wallet operators; how the platform will be decomposed into microservices; how user identity and KYC will be handled; and how the system will achieve fintech-grade security and near-zero-downtime availability.


Table of contents

  1. Executive summary
  2. Glossary
  3. Current state — what we already have
  4. How core banking actually works in Ethiopia
  5. Target architecture — overview
  6. Microservice catalogue
  7. Bank / FI / Wallet integration layer
  8. User identity & KYC
  9. Security architecture (defence-in-depth)
  10. Data consistency, ledger & idempotency
  11. High availability & zero-downtime deployments
  12. Observability & operations
  13. Regulation & compliance
  14. Migration plan (strangler fig from monolith)
  15. Risk register
  16. Open decisions

1. Executive summary

Demoz Pay is moving from a single-NestJS-monolith + single-Postgres setup toward a domain-driven microservice platform that can plug into many banks, MFIs and wallet operators, ship money safely, and stay up under real load.

The plan rests on five pillars:

  1. A single internal "money API" — every product (payroll, EWA, BNPL, loans, equb) talks to one normalised payments interface. Each bank / wallet / FI sits behind an adapter that translates that interface to the provider's specifics.
  2. Domain-owned services, each with its own database, each reachable only via its public API. Cross-service consistency is enforced through events and sagas, never through direct cross-database writes.
  3. A double-entry ledger (already modelled in Prisma) as the system of record for all value movement. Wallet balances are derived, never authoritative.
  4. Defence in depth for security: network, application, data, identity, secret, and audit layers each enforce policy independently. PII and money never leave Ethiopia (data residency).
  5. Operational discipline: blue-green deploys, expand-contract DB migrations, multi-AZ active-active, OpenTelemetry-instrumented, SLO-driven, with runbooks for every critical path.

We do not rewrite the monolith in one go. We use the strangler fig pattern — new domains are extracted one by one, behind a stable API gateway, with feature flags and shadow traffic. The monolith is deprecated last, not first.


2. Glossary

| Term | Definition |
|---|---|
| Core banking system (CBS) | The transactional engine inside a bank that maintains accounts and posts movements. Common products: Temenos T24/Transact, Oracle Flexcube, Infosys Finacle, Path Solutions iMAL. |
| ISO 20022 | International messaging standard for financial transactions. Replacing SWIFT MT formats globally. |
| ISO 8583 | Older message standard, still used for card/POS transactions. |
| PSP | Payment Service Provider (Telebirr, M-Birr, HelloCash, Amole, CBE Birr, etc.). |
| PSD2 / Open Banking | Regulatory frameworks (EU, UK) requiring banks to expose customer-permissioned APIs. NBE has signalled similar in its 2025 directive. |
| NBE | National Bank of Ethiopia — the regulator. |
| Fayda | Ethiopia's national digital ID (FCN/National ID Number). |
| KYC | Know Your Customer — the identity-verification process. |
| AML / CFT | Anti-Money Laundering / Countering the Financing of Terrorism. |
| CDD / EDD | Customer Due Diligence (normal) / Enhanced Due Diligence (high-risk). |
| HSM | Hardware Security Module — tamper-resistant key store. |
| mTLS | Mutual TLS — both client and server present certificates. |
| Saga | A distributed transaction pattern using compensating actions. |
| Outbox | A reliable-events pattern using a DB table as a durable queue. |
| Strangler fig | Migration pattern: route piece by piece away from a monolith until the monolith is gone. |
| SLO / SLI | Service Level Objective / Indicator — measurable reliability targets. |
| RPO / RTO | Recovery Point Objective (acceptable data loss) / Recovery Time Objective (acceptable downtime). |

3. Current state — what we already have

3.1 Code

  • Monorepo (Nx 22, pnpm) with one NestJS server and five Next.js frontend apps (admin, business, client, fi, bnpl-partner) plus a Docusaurus docs site.
  • Server modules wired in AppModule: auth, business, employee, prisma. That's it — no payment, ledger, lending, BNPL, EWA or equb modules are implemented yet despite being modelled in Prisma.
  • Single docker-compose.yml with one Postgres instance for the whole monorepo.
  • No event bus, message broker, cache, secrets manager, observability stack, or HSM provisioned.

3.2 Prisma schema (already strong)

The schema is more advanced than the implementation. It already models:

  • Identity: User, Role (RBAC), AdminProfile, UserRole enum, 2FA fields, AuditLog.
  • Business: Business, Department, Employee, EmployeeAbsence.
  • Double-entry ledger: LedgerAccount, Transaction, JournalEntry (with debit/credit, account types ASSET/LIABILITY/EQUITY/REVENUE/EXPENSE).
  • Payroll: Payroll, PayrollEntry, PayrollStatus.
  • Lending: BusinessLoan, EmployeeLoan, LoanPayment, LoanStatus.
  • Embedded finance: EarlyWageAccess, BNPLPurchase, BNPLPayment, Equb, EqubMember, EqubPayout.
  • Money: Wallet, WalletTransaction, WithdrawalRequest, BillPayment, Expense, SavingGoal.
  • Counterparties: FinancialInstitution, BNPLPartner, Merchant.
  • Settlement: SettlementBatch, SettlementRecord, SettlementType, SettlementStatus.
  • Shared enums: KYCStatus, PaymentMethod, WithdrawalMethod.

3.3 Gaps

| Capability | Modelled? | Implemented? |
|---|---|---|
| RBAC | | partial |
| Bank/wallet adapters | | |
| KYC orchestration | enum only | |
| Ledger posting service | | |
| Settlement engine | | |
| Reconciliation jobs | | |
| Event bus | | |
| Idempotency layer | | |
| Audit immutability (WORM) | log model exists | not WORM |
| Secrets management | env files | env files |
| HSM / KMS | | |
| Multi-region / multi-AZ | | |
| Observability stack | | |

Conclusion: the Prisma schema is a solid blueprint. Implementation is at "zero" for the money paths. This means there is little data to migrate: we mostly need to build the money paths right the first time while the monolith covers identity and business administration.


4. How core banking actually works in Ethiopia

To integrate "with many banks and many financial institutions", we need a realistic view of what's on the other side of the wire.

4.1 The technology landscape

| Provider | Core banking system | Typical integration surface |
|---|---|---|
| CBE | Temenos T24 + custom | SOAP / file dropbox / SFTP batches; emerging REST APIs through CBE Birr |
| Awash, Dashen, Abyssinia | Oracle Flexcube | SOAP web services; sometimes REST gateways through fintech sandboxes |
| Wegagen, Hibret, NIB | Path Solutions iMAL | SFTP files; some REST via partners |
| Cooperative Bank of Oromia, Zemen | Infosys Finacle | SOAP; ISO 8583 for card rails |
| Microfinance (Omo, ACSI, AdCSI) | Various, sometimes Excel + manual | Often portal upload + manual reconciliation |
| Telebirr | Ethio Telecom — Huawei mobile money | REST API (HMAC + RSA signing); webhook callbacks |
| M-Birr, HelloCash, Amole, CBE Birr | Various | REST APIs; HMAC + JWT or RSA; mobile money rails |
| Card networks (where used) | Visa / Mastercard via card processors | ISO 8583 over MPLS; not your usual REST |

4.2 The patterns you actually encounter

  1. Synchronous REST/JSON — newest wallet providers and a few "open banking" sandboxes. Easiest, but throughput and uptime vary wildly.
  2. SOAP / XML — legacy core banking. Strong typing via WSDL, but you'll fight with date/timezone formats and exception channels.
  3. SFTP batch files — daily/intra-day file drops (often pipe-delimited or fixed-width). Settlement reports, bulk credit files, reversal files. Many bank disbursement rails still work this way in Ethiopia.
  4. ISO 8583 / ISO 20022 — card/switch integration (EthSwitch), high-value transfers (RTGS). You don't speak this directly; you integrate through a processor or the bank's API gateway.
  5. Webhook callbacks — providers push you success/failure updates asynchronously. Always treat as untrusted until verified.

4.3 What you need at the application boundary

For every provider, regardless of the protocol underneath, you need:

  • Authentication — usually OAuth2 client credentials, HMAC-signed requests, or mutual TLS. Often combined.
  • Idempotency — every payment send must carry a unique idempotency key that the provider honours. If they don't honour it, we guarantee it on our side by enforcing single-execution semantics on the adapter.
  • Webhook signing — HMAC over the body with a shared secret, or detached JWS. Reject anything without a valid signature.
  • Reversal & refund semantics — different per provider. Some support reversals within N hours; some require manual intervention.
  • Settlement reports — daily files or API endpoints listing everything they consider final. We must reconcile our books to these reports every day before opening for new business.
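
The webhook-signing requirement above can be sketched as follows. This is a minimal illustration, assuming a provider that sends a hex-encoded HMAC-SHA256 of the raw request body; the encoding, the secret handling, and the function name `verifyWebhookSignature` are assumptions for illustration, not any specific provider's contract.

```typescript
import { createHmac, timingSafeEqual } from "node:crypto";

/**
 * Returns true only if `signatureHex` is the HMAC-SHA256 of `rawBody`
 * under `sharedSecret`. Hex encoding is an assumption; providers vary.
 */
function verifyWebhookSignature(
  rawBody: Buffer,
  signatureHex: string,
  sharedSecret: string,
): boolean {
  const expected = createHmac("sha256", sharedSecret).update(rawBody).digest();
  const received = Buffer.from(signatureHex, "hex");
  // timingSafeEqual throws on length mismatch, so compare lengths first.
  if (received.length !== expected.length) return false;
  // Constant-time comparison avoids leaking how many bytes matched.
  return timingSafeEqual(expected, received);
}
```

The check runs over the raw bytes as received, before any JSON parsing, because re-serialising the body can change the bytes and invalidate the signature.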

4.4 Operational realities

  • Bank APIs are not 24/7. Many sandbox/production endpoints are available only during banking hours, with downtime windows for batch jobs.
  • Webhooks are not reliable. They may not arrive, may arrive twice, may arrive out of order. Always reconcile.
  • "Same-day settlement" is conditional: a payment settles the same business day only if it is initiated before the provider's cut-off; otherwise it settles the next business day (T+1).
  • Limits and FX rules change by directive; we must build a config-driven limits engine, not hard-code thresholds.

5. Target architecture — overview

5.1 Logical view

                   ┌────────────────────────────────────────┐
                   │               Edge / WAF               │
                   │   Cloudflare-style: DDoS, bot, rate    │
                   └──────────────────┬─────────────────────┘
                                      │ TLS 1.3
                   ┌──────────────────▼─────────────────────┐
                   │       API Gateway (Kong/Apollo)        │
                   │  AuthN (JWT), AuthZ (OPA), Rate-limit  │
                   └──────────────────┬─────────────────────┘
                                      │ mTLS · internal
       ┌──────────────────────────────┼──────────────────────────────┐
       │                              │                              │
┌──────▼──────┐  ┌────────────┐  ┌────▼─────┐  ┌────────────┐   ┌────▼──────┐
│ Identity    │  │ Business   │  │ Payments │  │ Lending    │   │ Embed.    │
│ & KYC       │  │ Tenancy    │  │ Orches.  │  │ (Loans,    │   │ Finance   │
│             │  │            │  │          │  │ EWA)       │   │ (BNPL,    │
│             │  │            │  │          │  │            │   │ Equb)     │
└────┬────────┘  └─────┬──────┘  └────┬─────┘  └─────┬──────┘   └────┬──────┘
     │                 │              │              │               │
     │                 │              ▼              │               │
     │                 │      ┌────────────────┐     │               │
     │                 │      │ Ledger (DBL-   │◄────┴───────────────┘
     │                 │      │ entry, SoR)    │
     │                 │      └───────┬────────┘
     │                 │              │
     │                 ▼              ▼
     │          ┌────────────┐  ┌──────────────────┐
     │          │ Payroll    │  │ Banking-Adapter  │
     │          │ Engine     │  │ Hub (per-prov.)  │
     │          └────────────┘  └────────┬─────────┘
     │                                   │
     │                                   ├── CBE adapter
     │                                   ├── Telebirr adapter
     │                                   ├── Awash adapter
     │                                   ├── M-Birr adapter
     │                                   └── ...n more


┌─────────────────────────────────────────────────────────────────────┐
│ Cross-cutting services (each its own service, called via events): │
│ Notification · Audit-Log · Reconciliation · Fraud · Compliance · │
│ Reporting · Document-Store · Webhook-Receiver │
└─────────────────────────────────────────────────────────────────────┘

5.2 Communication patterns

| Pattern | When | Why |
|---|---|---|
| REST/JSON via gateway | All client ↔ backend traffic | Stable contract, tooled |
| gRPC + mTLS | Service ↔ service synchronous | Strong typing, low overhead, mTLS by default |
| Async events (Kafka or NATS JetStream) | Anything that crosses domain boundaries (e.g. "PayrollApproved") | Decoupling, replay, durability |
| Outbox table → CDC → bus | Whenever a service emits an event after a DB write | Atomicity between state and event |
| Saga (orchestration) | Multi-service workflows (loan disbursal) | Compensating actions for partial failure |
| Scheduled jobs (cron/temporal) | Daily settlements, EOD reconciliation | Predictable batches |
| SFTP / file watchers | Legacy bank rails | Reality of Ethiopian banking |

5.3 Data ownership

  • Each microservice owns its own Postgres schema, not its own database instance (cost-effective at our stage). Logical separation enforced with per-service DB users and row-level security.
  • No service reads another service's tables. Only via API or event subscription.
  • Common reference data (countries, currencies, business calendars) lives in a small reference-data service and is cached locally.

5.4 The ledger is the system of record

This is non-negotiable. Wallet balances, loan outstanding balances, EWA available balances — all are derived from the double-entry ledger. We never increment a balance directly. We post a journal entry; the read-side projects balances from journals.

This means:

  • Bug in projection? Replay from journals.
  • Audit asks "why is this number what it is"? We show the journals.
  • Reconciliation breaks? We diff our journals against the bank's file.

6. Microservice catalogue

Each entry lists the service's purpose, the Prisma models it owns, its primary public API surface, the events it publishes, and its direct dependencies.

6.1 Identity & KYC service (svc-identity)

  • Purpose: User registration, authentication, MFA, RBAC, KYC orchestration, sanctions screening, customer profile.
  • Owns: User, Role, AdminProfile, KYC documents, KYC decisions, 2FA secrets, session tokens.
  • Public API: POST /users, POST /sessions, POST /kyc/submit, GET /kyc/status, POST /mfa/verify.
  • Publishes: UserRegistered, KYCSubmitted, KYCApproved, KYCRejected, UserDeactivated.
  • Dependencies: Document-store (for ID images), Sanctions (Dow Jones / World-Check or local list), Fayda API (national ID verification).
  • External calls: SMS gateway, email, Fayda, sanctions list.

6.2 Business tenancy service (svc-business)

  • Purpose: Business onboarding, departments, employees, business KYC (KYB), commercial agreements, fee schedules.
  • Owns: Business, Department, Employee, EmployeeAbsence.
  • Public API: POST /businesses, POST /departments, POST /employees, GET /businesses/:id/employees.
  • Publishes: BusinessOnboarded, EmployeeJoined, EmployeeOffboarded.
  • Dependencies: Identity (to link employees to users), Document-store.

6.3 Ledger service (svc-ledger)

  • Purpose: Authoritative double-entry ledger; the system of record for all value movement.
  • Owns: LedgerAccount, Transaction, JournalEntry.
  • Public API: POST /transactions (with full journal entries — must balance), GET /accounts/:id/balance, GET /transactions/:id, POST /transactions/:id/reverse.
  • Publishes: TransactionPosted, TransactionReversed, BalanceUpdated.
  • Invariants: every transaction's debit total = credit total. Rejects unbalanced posts at the DB layer.
  • No external dependencies.

6.4 Payments orchestration service (svc-payments)

  • Purpose: Single internal "money API". Routes outbound payments to the right adapter; receives webhook callbacks; updates the ledger; enforces idempotency.
  • Owns: Payment intents, payment attempts, idempotency keys, webhook event log, routing rules.
  • Public API: POST /payments/intents (with idempotency key), GET /payments/:id, POST /payments/:id/cancel.
  • Publishes: PaymentInitiated, PaymentSucceeded, PaymentFailed, PaymentReversed.
  • Dependencies: Banking-adapter hub, Ledger, Risk (fraud), Audit.

6.5 Banking-adapter hub (svc-bank-adapters)

  • Purpose: One adapter per provider; uniform internal contract.
  • Owns: Provider configs, encrypted credentials, routing rules, per-provider rate limits, circuit breaker state.
  • Adapters (one per provider):
    • cbe-adapter — CBE Birr REST, SFTP for bulk.
    • telebirr-adapter — Telebirr REST + webhooks.
    • awash-adapter, dashen-adapter, abyssinia-adapter, cooperative-adapter, zemen-adapter, nib-adapter, …
    • mbirr-adapter, hellocash-adapter, amole-adapter, …
    • sftp-adapter — generic SFTP batch sender for legacy rails.
  • Common adapter interface (gRPC contract):
    Send(idempotencyKey, amount, currency, source, dest, ref) → AttemptId
    GetStatus(AttemptId) → Status
    Refund(AttemptId, amount, ref) → AttemptId
    WebhookHandler(rawBody, signature) → InternalEvent
    ListSettlement(fromDate, toDate) → SettlementBatch
  • Publishes: ProviderPaymentSucceeded, ProviderPaymentFailed, ProviderSettlementReady.
  • Dependencies: Secrets vault, Network egress to provider endpoints.

6.6 Settlement & reconciliation service (svc-settlement)

  • Purpose: End-of-day batches, reconciliation, dispute resolution workflow.
  • Owns: SettlementBatch, SettlementRecord, reconciliation diff reports, disputes.
  • Public API: POST /settlements/run, GET /settlements/:id, GET /reconciliation/diff?date=….
  • Publishes: SettlementCompleted, ReconciliationBreakFound.
  • Dependencies: Banking-adapter hub, Ledger, Notifications (when break found, alert finance).

6.7 Payroll service (svc-payroll)

  • Purpose: Payroll cycle management, calculations (PAYE, pension, withholdings), payslip generation, approval workflow, disbursement orchestration.
  • Owns: Payroll, PayrollEntry.
  • Public API: POST /payrolls, POST /payrolls/:id/approve, POST /payrolls/:id/disburse, GET /payrolls/:id/payslips.
  • Publishes: PayrollDrafted, PayrollApproved, PayrollDisbursed.
  • Dependencies: Business, Identity, Payments, Ledger, Tax-engine (could be embedded initially).

6.8 Lending service (svc-lending)

  • Purpose: Business and employee loans, applications, underwriting decisions, repayment schedules.
  • Owns: BusinessLoan, EmployeeLoan, LoanPayment.
  • Public API: POST /loans/apply, POST /loans/:id/disburse, POST /loans/:id/repay.
  • Publishes: LoanApplied, LoanApproved, LoanDisbursed, LoanRepayment, LoanWrittenOff.
  • Dependencies: Identity (KYC), Business, Payments (for disbursement and repayment), Ledger, Risk (for underwriting).

6.9 Embedded finance service (svc-embedded-finance)

  • Purpose: Early Wage Access, BNPL, Equb, Bill Payments — features that ride the same money rails but have their own business logic.
  • Owns: EarlyWageAccess, BNPLPurchase, BNPLPayment, Merchant, Equb, EqubMember, EqubPayout, BillPayment.
  • Public API: POST /ewa/request, POST /bnpl/purchases, POST /equbs, POST /bills/pay.
  • Publishes: EWAGranted, BNPLApproved, EqubCycleClosed.
  • Dependencies: Identity, Lending (for credit decisions), Payments, Ledger, Risk.

6.10 Risk & fraud service (svc-risk)

  • Purpose: Real-time fraud scoring, velocity limits, sanctions re-screening on every transaction, AML pattern detection.
  • Owns: Rules engine, fraud scores, decision logs, suspicious activity reports (SAR).
  • Public API: POST /risk/evaluate (returns ALLOW / REVIEW / BLOCK in <100 ms).
  • Publishes: HighRiskTransactionDetected, SARFiled.
  • Dependencies: Identity (KYC tier), Ledger (transaction history), Sanctions list.

6.11 Notification service (svc-notify)

  • Purpose: Multi-channel notifications (SMS, email, push, in-app). Templating, localisation (Amharic / English), rate-limiting, preference enforcement.
  • Owns: Templates, delivery logs, user preferences.
  • Public API: POST /notifications/send (event-driven, mostly).
  • Subscribes to: every event that triggers a user-facing message.

6.12 Audit-log service (svc-audit)

  • Purpose: WORM (write-once-read-many) audit log of every privileged action and every money movement. Append-only, integrity-hashed.
  • Owns: AuditLog (extended with hash chain).
  • Public API: POST /audit/events (internal only), GET /audit/events (regulators, internal review).
  • Storage: Hot in Postgres, warm in S3 (immutable bucket with object lock), hash-chained per business per day.
  • Subscribes to: every domain event.

6.13 Document-store service (svc-documents)

  • Purpose: Encrypted storage for KYC documents, contracts, payslips. Pre-signed download URLs.
  • Owns: Metadata index; binaries in encrypted object storage (S3-compatible, server-side encryption with KMS keys).
  • Public API: POST /documents (multipart, virus-scanned), GET /documents/:id/url (short-lived signed URL).

6.14 Webhook receiver (svc-webhooks)

  • Purpose: Public endpoint for provider callbacks. Validates signatures, drops invalid, persists raw payload, hands off to Payments via internal event.
  • Owns: Raw webhook event log (immutable).
  • Public API: POST /webhooks/:provider (one path per provider).

6.15 Reporting / read-models (svc-reports)

  • Purpose: Subscribes to all domain events, builds projected read models for dashboards (admin / business / FI / BNPL-partner / client). No business logic; just denormalised views.

6.16 Reference-data service (svc-reference)

  • Purpose: Banks list, branches, BIC codes, currencies, business calendars, holidays, fee schedules, tax tables, regulatory limits. Cached aggressively. Read-mostly.

7. Bank / FI / Wallet integration layer

7.1 The adapter pattern

We define one internal contract:

interface PaymentAdapter {
  /** Initiate an outbound payment */
  send(input: {
    idempotencyKey: string;
    amount: Decimal;
    currency: 'ETB';
    source: {                        // We are paying from…
      accountNumber: string;
      accountName?: string;
    };
    destination: {                   // We are paying to…
      kind: 'BANK_ACCOUNT' | 'WALLET';
      identifier: string;            // account # or wallet phone
      accountName?: string;
      bankCode?: string;             // for BANK_ACCOUNT
    };
    reference: string;               // shows on the customer's statement
    metadata: Record<string, string>;
  }): Promise<{ attemptId: string; status: AttemptStatus }>;

  /** Query status */
  getStatus(attemptId: string): Promise<AttemptStatus>;

  /** Refund / reversal */
  refund(attemptId: string, amount: Decimal, reference: string)
    : Promise<{ refundId: string; status: AttemptStatus }>;

  /** Parse and validate an incoming webhook */
  parseWebhook(rawBody: Buffer, headers: Record<string, string>)
    : Promise<NormalisedProviderEvent>;

  /** Pull settlement data (daily) */
  pullSettlement(date: Date): Promise<NormalisedSettlement>;

  /** Provider-specific health check */
  health(): Promise<HealthResult>;
}

Every adapter implements this. The Payments service never knows the difference between Telebirr and CBE.

7.2 Adapter responsibilities

  • Auth handshake (OAuth / HMAC / mTLS) → handled per adapter.
  • Request shaping → translate normalised → provider format.
  • Response normalisation → translate provider → internal.
  • Idempotency → the adapter enforces single-execution even if the provider doesn't.
  • Circuit breaker, retries with exponential back-off + jitter, bulkhead pool isolated per provider.
  • Provider-specific error mapping to a small internal taxonomy.
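
The retry behaviour listed above (exponential back-off plus jitter) could look roughly like this. It is a sketch only: the "full jitter" variant, the default base and cap, and the names `backoffDelayMs` and `withRetry` are illustrative assumptions, and circuit breaking and bulkheads are left out.

```typescript
/** Delay before retry `attempt` (0-based): random point below an exponentially growing, capped ceiling. */
function backoffDelayMs(
  attempt: number,
  baseMs = 200,
  capMs = 10_000,
  random: () => number = Math.random,
): number {
  const ceiling = Math.min(capMs, baseMs * 2 ** attempt);
  return Math.floor(random() * ceiling);
}

/** Runs `op`, retrying on failure, for at most `maxAttempts` total tries. */
async function withRetry<T>(
  op: () => Promise<T>,
  maxAttempts = 4,
  sleep: (ms: number) => Promise<void> = (ms) =>
    new Promise<void>((resolve) => setTimeout(resolve, ms)),
): Promise<T> {
  let lastError: unknown;
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    try {
      return await op();
    } catch (err) {
      lastError = err;
      // No point sleeping after the final failed attempt.
      if (attempt < maxAttempts - 1) await sleep(backoffDelayMs(attempt));
    }
  }
  throw lastError;
}
```

Retrying a payment send is only safe because every attempt carries the same idempotency key; without that guarantee a retry could pay twice.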

7.3 Onboarding a new bank (the cookbook)

Estimate per new bank: 2–4 engineer-weeks depending on protocol.

  1. Legal & commercial — sign integration agreement, get sandbox credentials, agree on cut-off times and limits.
  2. Connectivity — provision IPs to allow-list, set up mTLS or API key vault entry. Confirm latency from our infra.
  3. Implement adapter behind the common interface.
  4. Contract tests in CI against a recorded sandbox.
  5. Shadow mode — for two weeks, route a copy of real payment intents through the new adapter without actually sending, and diff the result with the live adapter.
  6. Canary — route 1% of eligible traffic; expand to 10%, 50%, 100% over a week with manual gating between steps.
  7. Settlement reconciliation — run for ≥ 14 days with zero unresolved breaks before declaring GA.
  8. Runbook — every adapter ships with: cut-off times, contact numbers, common error mappings, rollback procedure.

7.4 Routing rules (when there are multiple providers)

The Payments service picks the adapter at runtime based on:

  • Destination kind: wallet number → wallet rail; bank account → bank rail.
  • Bank code or wallet operator prefix (extracted from account number / phone number).
  • Health & capacity: prefer healthy adapters; downgrade to fallback when one is degraded.
  • Cost tiering: cheapest acceptable rail for the amount and speed required.
  • Cut-off times: if a same-day rail is past its cut-off, fall back to a 24/7 rail (wallet) or queue for next business day.
  • Customer preference: if the user has pinned a provider.

Routing is config-driven (Prisma JSON on FinancialInstitution and a new Provider table), reviewable in the admin portal, with a full audit history of changes.
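
A simplified version of these routing rules, as a sketch: the `Rail` shape, the flat `feeEtb` cost, the hour-based cut-off, and the function `pickRail` are all hypothetical, and capacity, bank-code matching, and speed requirements are omitted.

```typescript
type DestinationKind = "BANK_ACCOUNT" | "WALLET";

interface Rail {
  adapter: string;            // e.g. "telebirr-adapter"
  kind: DestinationKind;
  healthy: boolean;
  feeEtb: number;             // illustrative flat fee per payment
  cutOffHour: number | null;  // local hour after which the rail is closed; null = 24/7
}

/**
 * Picks a rail for the destination kind: healthy and before cut-off,
 * honouring a pinned provider if eligible, otherwise the cheapest.
 * Returns undefined when nothing is eligible (the caller queues the payment).
 */
function pickRail(
  rails: Rail[],
  kind: DestinationKind,
  hourNow: number,
  pinnedAdapter?: string,
): Rail | undefined {
  const eligible = rails.filter(
    (r) =>
      r.kind === kind &&
      r.healthy &&
      (r.cutOffHour === null || hourNow < r.cutOffHour),
  );
  const pinned = eligible.find((r) => r.adapter === pinnedAdapter);
  if (pinned) return pinned;
  return [...eligible].sort((a, b) => a.feeEtb - b.feeEtb)[0];
}
```

In the real system the `rails` array would be loaded from the routing configuration rather than hard-coded, so that changes stay reviewable and audited.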


8. User identity & KYC

8.1 Registration — what we collect

We use a tiered KYC model — common in mobile money — so that small wallets can open instantly while higher-value services require more verification.

| Tier | Limits (illustrative — confirm with NBE) | Required data |
|---|---|---|
| Tier 0 — Anonymous browse | No money operations | Phone number, email, password, accepted T&C |
| Tier 1 — Light wallet | Daily ≤ 5,000 ETB, monthly ≤ 30,000 ETB | + Full legal name, DOB, gender, Fayda ID, selfie liveness |
| Tier 2 — Standard | Daily ≤ 30,000, monthly ≤ 300,000 | + Address, occupation, source of funds, Fayda biometric match |
| Tier 3 — Business | Per agreement | + Business registration, TIN, beneficial owners, board resolution, signed mandates |
| Tier 3 — High value | Higher limits | + Enhanced Due Diligence (EDD), PEP screening, manual review |
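
The tier limits above lend themselves to a config-driven check. The sketch below copies the illustrative Tier 0 to Tier 2 values from the table; the `TIER_LIMITS` map, the `checkLimit` function, and the decision labels are hypothetical, and Tier 3 is deliberately absent because its limits are set per agreement.

```typescript
interface TierLimits {
  dailyEtb: number;
  monthlyEtb: number;
}

type LimitDecision = "ALLOW" | "DAILY_LIMIT" | "MONTHLY_LIMIT" | "NO_LIMITS_CONFIGURED";

// Illustrative values from the tier table; in production these come from config.
const TIER_LIMITS: Record<number, TierLimits> = {
  0: { dailyEtb: 0, monthlyEtb: 0 },
  1: { dailyEtb: 5_000, monthlyEtb: 30_000 },
  2: { dailyEtb: 30_000, monthlyEtb: 300_000 },
};

/** Decides whether a new payment fits within the caller's tier limits. */
function checkLimit(
  tier: number,
  amountEtb: number,
  spentTodayEtb: number,
  spentThisMonthEtb: number,
  limits: Record<number, TierLimits> = TIER_LIMITS,
): LimitDecision {
  const limit = limits[tier];
  if (!limit) return "NO_LIMITS_CONFIGURED"; // fail closed: caller must not pay
  if (spentTodayEtb + amountEtb > limit.dailyEtb) return "DAILY_LIMIT";
  if (spentThisMonthEtb + amountEtb > limit.monthlyEtb) return "MONTHLY_LIMIT";
  return "ALLOW";
}
```

Passing the limits in as data rather than hard-coding thresholds means a new directive becomes a config change, not a deploy.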

8.2 Data captured at user registration

Minimum field set in User (extend the schema accordingly):

  • id (cuid)
  • phone (E.164, unique, primary identifier in Ethiopia)
  • email (unique, optional at Tier 0, required at Tier 1+)
  • password (Argon2id hash, not bcrypt; pepper from KMS)
  • firstName, middleName, lastName (Tier 1+)
  • dateOfBirth, gender (Tier 1+)
  • nationalId (Fayda FCN, unique, encrypted at rest)
  • idDocumentType (e.g. PASSPORT, KEBELE_ID — fallback if no Fayda)
  • idDocumentRef (document service ID)
  • addressLine1, region, subCity, kebele (Tier 2+)
  • occupation, sourceOfFunds, expectedMonthlyVolume (Tier 2+)
  • kycStatus (PENDING → IN_REVIEW → VERIFIED / REJECTED)
  • kycTier (0/1/2/3)
  • kycSubmittedAt, kycDecisionAt, kycDecisionBy
  • riskRating (LOW / MEDIUM / HIGH — drives EDD)
  • pepFlag (Politically Exposed Person)
  • sanctionsMatchAt (last screen result timestamp)
  • consents (JSON: T&C version, privacy policy version, marketing opt-in, with timestamps and IP)
  • createdAt, updatedAt, deletedAt, lastLoginAt, failedLoginCount
  • Soft-delete (deletedAt) not hard delete — regulators require retention.

For businesses (KYB), capture:

  • Legal name, trade name, business type, TIN, VAT number, registration number, registration date, registering authority.
  • Industry classification (NBE category).
  • Registered address; physical operating address.
  • Directors / authorised representatives — each with their own user record and Tier 2+ KYC.
  • Beneficial owners (≥ 25% ownership) — each individually verified.
  • Bank account(s) for settlement.
  • Board resolution / mandate authorising platform use.

8.3 Registration flow

Phone +
Email +    ──► OTP sent ──► OTP verified ──► Password set ──► T&C
Password       to phone     to phone         Argon2id         consent

    ──► Tier 0 wallet created (no funds)
    ──► Identity service publishes UserRegistered

Tier 1 upgrade:
    Name, DOB, gender, Fayda
    Selfie + liveness (server-side check)
    Fayda biometric match (svc-identity → Fayda API)
    Sanctions screen
    Risk score
    ──► auto-VERIFY (low risk + clean Fayda) or QUEUE for review

Tier 2 upgrade:
    Address proof, occupation, source-of-funds declaration
    Re-screen + risk re-score
    Manual or rule-based decision

Business (Tier 3):
    KYB workflow — admin review mandatory, no auto-approve.

8.4 KYC controls

  • Document collection through svc-documents. Server-side virus scan (ClamAV in a side-car), file-type sniff (not just extension), EXIF strip, size cap (25 MB).
  • Liveness — challenge-response with random head movements, server-side analysis. Cap at 3 attempts per 24 hours.
  • Sanctions screening — daily re-screen of every active user against the platform's curated list (NBE list + UN consolidated + OFAC if relevant).
  • PEP screening — at onboarding and on update.
  • Adverse media — at Tier 3 only.
  • Re-KYC — driven by changes in risk rating, transaction patterns, age of last verification (every 12 months for HIGH risk, 36 months for LOW).
  • Right to be forgotten — implemented as cryptographic shredding of PII keys (per-user KMS key) plus tombstoning the record. The ledger entries stay (regulatory retention) but are pseudonymised.
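
Cryptographic shredding, as described in the last bullet, can be illustrated with a per-user key. In this sketch an in-memory `Map` stands in for the KMS, and the names `encryptPii`, `decryptPii`, and `forgetUser` are hypothetical; AES-256-GCM is one reasonable cipher choice, not one the plan specifies.

```typescript
import { createCipheriv, createDecipheriv, randomBytes } from "node:crypto";

// Stand-in for a KMS: one 256-bit key per user. Deleting the key "shreds" the data.
const userKeys = new Map<string, Buffer>();

interface SealedPii {
  iv: Buffer;
  tag: Buffer;
  ciphertext: Buffer;
}

function encryptPii(userId: string, plaintext: string): SealedPii {
  let key = userKeys.get(userId);
  if (!key) {
    key = randomBytes(32);
    userKeys.set(userId, key);
  }
  const iv = randomBytes(12); // fresh 96-bit nonce for every encryption
  const cipher = createCipheriv("aes-256-gcm", key, iv);
  const ciphertext = Buffer.concat([cipher.update(plaintext, "utf8"), cipher.final()]);
  return { iv, tag: cipher.getAuthTag(), ciphertext };
}

/** Returns the plaintext, or null once the user's key has been shredded. */
function decryptPii(userId: string, sealed: SealedPii): string | null {
  const key = userKeys.get(userId);
  if (!key) return null;
  const decipher = createDecipheriv("aes-256-gcm", key, sealed.iv);
  decipher.setAuthTag(sealed.tag);
  return Buffer.concat([decipher.update(sealed.ciphertext), decipher.final()]).toString("utf8");
}

/** Destroys the key; every ciphertext under it becomes unreadable. */
function forgetUser(userId: string): void {
  userKeys.delete(userId);
}
```

The encrypted rows and backups can stay where they are: without the key they are unreadable, which is what makes this workable alongside regulatory retention of ledger entries.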

8.5 Authentication & session

  • First factor: phone + password (Argon2id, 64 MB / 3 iterations / parallelism 4).
  • Second factor: TOTP (always available) + WebAuthn (preferred on supported devices). SMS OTP allowed as fallback but rate-limited and never the only factor for sensitive actions.
  • Sessions: short-lived access tokens (15 min) + rotating refresh tokens (7 days, single-use, refresh-token reuse triggers global session invalidation).
  • Step-up auth required for: changing destination accounts, initiating payments > ETB 50k, adding a new beneficiary, disabling 2FA.
  • Device binding: each device gets a per-device JWK fingerprint, bound to the refresh token.
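
The single-use refresh token rule, with reuse triggering global invalidation, might be sketched like this. The in-memory `RefreshTokenStore` is hypothetical; the 7-day expiry, device binding, and storing only token hashes are omitted for brevity.

```typescript
import { randomUUID } from "node:crypto";

interface RefreshRecord {
  userId: string;
  used: boolean;
}

class RefreshTokenStore {
  private tokens = new Map<string, RefreshRecord>();

  issue(userId: string): string {
    const token = randomUUID();
    this.tokens.set(token, { userId, used: false });
    return token;
  }

  /** Exchanges a token for a new one; returns null if the token is rejected. */
  rotate(token: string): string | null {
    const record = this.tokens.get(token);
    if (!record) return null;
    if (record.used) {
      // A used token was presented again: assume theft and end every session.
      this.revokeAll(record.userId);
      return null;
    }
    record.used = true;
    return this.issue(record.userId);
  }

  private revokeAll(userId: string): void {
    this.tokens.forEach((record, token) => {
      if (record.userId === userId) this.tokens.delete(token);
    });
  }
}
```

Used tokens are kept (marked `used`) rather than deleted, because recognising a replayed token is what makes reuse detection possible.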

9. Security architecture (defence-in-depth)

Defence in depth means every layer enforces policy independently. A breach of one does not give the attacker the next.

9.1 Network layer

  • Cloudflare-class WAF + DDoS at the edge. Custom rules for known attack patterns; bot management; geo-fencing for admin endpoints.
  • Public ingress only into the API gateway and the webhook receiver. No service is reachable from the internet directly.
  • Private VPC. Each service in its own subnet. Egress through NAT with a deny-by-default allow-list.
  • mTLS between every service. Certificates issued by an internal CA with 24-hour rotations; SPIFFE/SPIRE for identity.
  • Egress allow-list for outbound calls to provider endpoints; logged and alerted on unknown egress attempts.

9.2 Application layer

  • Input validation at the gateway via JSON schema + business validators in each service (Zod or class-validator).
  • Output encoding. No HTML in JSON responses. CSP, HSTS, COOP/COEP headers set at the gateway.
  • No raw SQL anywhere. Prisma everywhere; reviewed for type-unsafe extensions.
  • Dependency scanning (Snyk / Dependabot) in CI. CVE deadline: Critical = 48 h, High = 7 d, Medium = 30 d.
  • SAST (Semgrep + ESLint security plugins) in CI.
  • SCA (Software Composition Analysis) on every build.
  • Container scanning (Trivy) before image promotion.

9.3 Data layer

  • Encryption at rest for all data stores. Postgres with TDE-equivalent (filesystem-level + per-column encryption for PII). Object storage encrypted with KMS.
  • Encryption in transit — TLS 1.3 only on all hops. PSP-level TLS pinning where the provider supports it.
  • Tokenisation of bank account numbers and Fayda IDs. The raw value is stored only in the Identity / Documents service; every other service references a token.
  • Per-tenant key isolation — each business has its own data-encryption key, derived from a master key in KMS.
  • Backups — encrypted, off-region, tested monthly via restore drills.
  • No PII in logs. Structured logging with a redaction layer that drops fields tagged @sensitive.
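
A log redaction layer could be as small as the following sketch. It matches fields by name against a hypothetical `SENSITIVE_FIELDS` set rather than by an `@sensitive` tag, which is a simplification for illustration; the field names are taken from the user model in this document.

```typescript
// Field names treated as sensitive (illustrative list, not exhaustive).
const SENSITIVE_FIELDS = new Set([
  "password",
  "nationalId",
  "phone",
  "email",
  "accountNumber",
]);

/** Returns a deep copy of `value` with every sensitive field dropped. */
function redact(value: unknown): unknown {
  if (Array.isArray(value)) return value.map(redact);
  if (value !== null && typeof value === "object") {
    const out: Record<string, unknown> = {};
    for (const [key, inner] of Object.entries(value as Record<string, unknown>)) {
      if (!SENSITIVE_FIELDS.has(key)) out[key] = redact(inner);
    }
    return out;
  }
  return value; // primitives pass through unchanged
}
```

Calling `redact` inside the logger itself, rather than at each call site, means a forgotten call cannot leak PII.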

9.4 Identity & secret layer

  • All secrets in HashiCorp Vault (or AWS Secrets Manager). Apps receive short-lived dynamic credentials at boot, refreshed every hour.
  • Per-service service-account; JIT issuance.
  • HSM-backed signing keys for: JWT signing, webhook signature verification, payment-message signing.
  • Break-glass procedure with two-person rule on critical secrets.
  • Workforce SSO via OIDC for the admin portal; SCIM-managed provisioning; MFA required.

9.5 Audit layer

  • WORM audit log: every privileged action (transfer, KYC decision, role change, secret access) emits an event to svc-audit. Stored in Postgres + replicated to S3 with Object Lock (governance retention 7 years, regulatory retention).
  • Hash chain: each daily batch of audit events is Merkle-rooted; the root is published to an internal append-only log signed with the HSM. Tampering is detectable retroactively.
  • SIEM: stream audit + security events to a SIEM (Wazuh or Datadog Security Monitoring) with playbooks for: unusual login velocity, privileged-action anomalies, payment-routing changes, webhook signature failures.
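
As a rough illustration of the hash-chain bullet, the sketch below computes a Merkle root over one day's serialised audit events and links it to the previous day's entry. The function names, the leaf/node prefixes and the way entries are chained are assumptions made for the example; HSM signing of the root is out of scope here.

```typescript
import { createHash } from "node:crypto";

function sha256(data: Buffer): Buffer {
  return createHash("sha256").update(data).digest();
}

// Computes a Merkle root over serialised audit events.
// Leaves and inner nodes are hashed with different prefixes (0x00 / 0x01)
// so that a leaf can never be passed off as an inner node.
function merkleRoot(events: string[]): string {
  if (events.length === 0) {
    return sha256(Buffer.alloc(0)).toString("hex");
  }
  let level: Buffer[] = events.map((e) =>
    sha256(Buffer.concat([Buffer.from([0x00]), Buffer.from(e, "utf8")])),
  );
  while (level.length > 1) {
    const next: Buffer[] = [];
    for (let i = 0; i < level.length; i += 2) {
      const left = level[i];
      if (i + 1 >= level.length) {
        next.push(left); // odd node out is promoted unchanged
        continue;
      }
      next.push(sha256(Buffer.concat([Buffer.from([0x01]), left, level[i + 1]])));
    }
    level = next;
  }
  return level[0].toString("hex");
}

// Links a day's root to the previous log entry, so rewriting any past day
// changes every later entry.
function chainEntry(previousEntryHash: string, dayRoot: string): string {
  return sha256(Buffer.from(previousEntryHash + dayRoot, "utf8")).toString("hex");
}
```

Changing any single event changes the day's root, and because each entry folds in the previous one, the change also propagates to every later entry in the log.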

9.6 Operational layer

  • All admin actions in production are break-glass, time-boxed (< 60 min), require a ticket reference, and produce a recorded session.
  • Code that touches money requires two-person review on PR.
  • Production data access is forbidden by default; granted JIT for a specific ticket, capped at 2 hours, audited.
  • Backups tested via restore drills monthly; failover drills quarterly.

10. Data consistency, ledger & idempotency

10.1 The double-entry rule

Every value movement is one Transaction with two or more JournalEntry rows whose debits == credits. Examples:

Payroll disbursement of 10,000 ETB to employee E1:

DR Payroll Expense 10,000
CR Cash / Bank Settlement Account 10,000

Wallet credit on employee side:

DR Cash / Bank Settlement Account 10,000
CR Employee Wallet (E1) 10,000

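Wallet balance = sum of credits − sum of debits on the wallet account (the wallet is a liability account from our side, which is why funding it is a credit in the example above). Read from the projection; reconcile to journals nightly.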
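
A minimal sketch of the rule, assuming a hypothetical in-memory shape for Transaction and JournalEntry (the real schema may differ). Amounts are integers in minor units (santim); plain numbers are used to keep the example short, and a production ledger would more likely use bigint or a decimal type. The wallet is treated as a liability account, so its balance is credits minus debits, consistent with the wallet-credit entry above.

```typescript
type Side = "DR" | "CR";

interface JournalEntry {
  account: string;
  side: Side;
  amountMinor: number; // integer minor units, e.g. 1_000_000 = ETB 10,000.00
}

interface Transaction {
  id: string;
  entries: JournalEntry[];
}

function total(entries: JournalEntry[], side: Side): number {
  return entries
    .filter((e) => e.side === side)
    .reduce((sum, e) => sum + e.amountMinor, 0);
}

// Rejects any transaction whose debits and credits differ.
function assertBalanced(tx: Transaction): void {
  if (tx.entries.length < 2) {
    throw new Error(`tx ${tx.id}: needs at least two entries`);
  }
  const dr = total(tx.entries, "DR");
  const cr = total(tx.entries, "CR");
  if (dr !== cr) {
    throw new Error(`tx ${tx.id}: debits ${dr} != credits ${cr}`);
  }
}

// Balance of a liability-style account such as a wallet: credits minus debits.
function liabilityBalance(txs: Transaction[], account: string): number {
  const entries = txs.flatMap((t) => t.entries).filter((e) => e.account === account);
  return total(entries, "CR") - total(entries, "DR");
}

// The wallet credit from the example above.
const walletCredit: Transaction = {
  id: "tx-2",
  entries: [
    { account: "cash_settlement", side: "DR", amountMinor: 1_000_000 },
    { account: "wallet:E1", side: "CR", amountMinor: 1_000_000 },
  ],
};
assertBalanced(walletCredit);
```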

10.2 Idempotency

Every state-changing public API requires Idempotency-Key (client-generated UUID, 24-hour TTL). The gateway stores the first response keyed by (client_id, idempotency_key) and returns the same response on retries.

Internal services apply the same rule on their own gRPC interface using a request-ID header.
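
The gateway behaviour described above can be sketched as follows. This is an in-memory illustration with hypothetical names (`IdempotencyStore`, `execute`); a real gateway would need a shared store and an atomic "insert if absent" so that two concurrent first requests cannot both run the handler.

```typescript
interface StoredResponse {
  status: number;
  body: string;
  expiresAt: number; // epoch milliseconds
}

const TTL_MS = 24 * 60 * 60 * 1000; // the 24-hour TTL from the text above

class IdempotencyStore {
  private readonly responses = new Map<string, StoredResponse>();

  // `now` is injectable so the TTL can be exercised without waiting.
  constructor(private readonly now: () => number = () => Date.now()) {}

  // Runs `handler` once per (clientId, key) and replays the stored response
  // for any retry that arrives before the entry expires.
  execute(
    clientId: string,
    key: string,
    handler: () => { status: number; body: string },
  ): { status: number; body: string; replayed: boolean } {
    const cacheKey = `${clientId}:${key}`;
    const hit = this.responses.get(cacheKey);
    if (hit && hit.expiresAt > this.now()) {
      return { status: hit.status, body: hit.body, replayed: true };
    }
    const result = handler();
    this.responses.set(cacheKey, {
      status: result.status,
      body: result.body,
      expiresAt: this.now() + TTL_MS,
    });
    return { status: result.status, body: result.body, replayed: false };
  }
}
```

Keying on the client ID as well as the key means two clients that happen to generate the same UUID never see each other's responses.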

10.3 Outbox pattern

When a service writes to its DB and must emit an event, it writes the event to an outbox table in the same transaction. A small relay process (Debezium or a custom worker) ships outbox rows to the bus and marks them sent. This guarantees at-least-once delivery without distributed transactions.

Consumers are designed to be idempotent (event-ID dedupe table), so at-least-once delivery is safe.
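
The moving parts can be sketched in memory as below. All class and field names are hypothetical, and the "transaction" is simulated; the point is only to show the business write and the outbox write happening together, the relay re-publishing after a crash, and the consumer ignoring the duplicate.

```typescript
interface OutboxRow {
  eventId: string;
  type: string;
  payload: string;
  sent: boolean;
}

// Stand-in for one service's database: business rows and outbox rows live together.
class ServiceDb {
  readonly loans = new Map<string, string>(); // loanId -> status
  readonly outbox: OutboxRow[] = [];

  // In a real database both writes sit inside one transaction,
  // so either both are committed or neither is.
  createLoan(loanId: string): void {
    this.loans.set(loanId, "PENDING");
    this.outbox.push({
      eventId: `loan-created:${loanId}`,
      type: "LoanCreated",
      payload: JSON.stringify({ loanId }),
      sent: false,
    });
  }
}

// Relay: publish unsent rows, then mark them sent. A crash between the two
// steps leads to a re-publish, which is why delivery is at-least-once.
function relay(db: ServiceDb, publish: (row: OutboxRow) => void): void {
  for (const row of db.outbox) {
    if (row.sent) continue;
    publish(row);
    row.sent = true;
  }
}

// Consumer: the set plays the role of the event-ID dedupe table.
class IdempotentConsumer {
  private readonly seen = new Set<string>();
  handled = 0;

  handle(row: OutboxRow): void {
    if (this.seen.has(row.eventId)) return; // duplicate delivery, ignore
    this.seen.add(row.eventId);
    this.handled += 1;
  }
}
```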

10.4 Saga (orchestration) for cross-service workflows

Example: Employee loan disbursement.

1. svc-lending : create EmployeeLoan (PENDING)
2. svc-risk : evaluate → APPROVE
3. svc-ledger : reserve funds (post pending journal)
4. svc-payments : initiate payment via adapter
5. (callback) : svc-payments updates → SUCCESS
6. svc-ledger : finalise journal (move from pending to posted)
7. svc-lending : mark loan DISBURSED, schedule repayments

If step 4 fails: orchestrator triggers compensating actions (release ledger reservation, mark loan as FAILED, notify customer).

Implementation: Temporal (preferred) or a NestJS saga library. Temporal gives durable execution, retry, and replay for free.
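
To make the compensation logic concrete, here is a deliberately small orchestrator sketch. It is not Temporal and not a NestJS saga library: `SagaStep` and `runSaga` are hypothetical names, steps are synchronous to keep the example short (real steps would be async calls or workflow activities), and durability, retries and the customer notification are left out.

```typescript
interface SagaStep {
  name: string;
  run: () => void; // throws on failure
  compensate?: () => void; // undoes run(); omitted when there is nothing to undo
}

// Runs steps in order. On the first failure, compensates the steps that
// already completed, in reverse order.
function runSaga(steps: SagaStep[]): { ok: boolean; failedAt?: string } {
  const completed: SagaStep[] = [];
  for (const step of steps) {
    try {
      step.run();
      completed.push(step);
    } catch (err) {
      for (const done of completed.reverse()) {
        if (done.compensate) done.compensate();
      }
      return { ok: false, failedAt: step.name };
    }
  }
  return { ok: true };
}

// Loan disbursement where step 4 (payment initiation) fails.
const log: string[] = [];
const result = runSaga([
  {
    name: "svc-lending: create loan",
    run: () => { log.push("loan PENDING"); },
    compensate: () => { log.push("loan FAILED"); },
  },
  { name: "svc-risk: evaluate", run: () => { log.push("risk APPROVE"); } },
  {
    name: "svc-ledger: reserve funds",
    run: () => { log.push("funds reserved"); },
    compensate: () => { log.push("reservation released"); },
  },
  {
    name: "svc-payments: initiate",
    run: () => { throw new Error("provider timeout"); },
  },
]);
```

In this run the ledger reservation is released first and the loan is marked FAILED last, mirroring the compensating actions listed above.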

10.5 Reconciliation

Every morning, svc-settlement pulls each provider's daily settlement report, joins to our ledger journals, and produces a diff report:

  • Matched (✓)
  • Provider has, we don't → investigate (possible missed webhook)
  • We have, provider doesn't → investigate (possible double-spend)
  • Amount mismatch → investigate

Unresolved breaks block new business on that provider channel until they are cleared. Disputes follow a manual workflow with a tracked audit trail.
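
The four buckets can be produced by a straightforward join on a shared reference. The sketch below assumes both sides expose a unique `reference` per payment and an integer amount in minor units; those field names and the `reconcile` function are illustrative, not the svc-settlement design.

```typescript
interface SettlementLine {
  reference: string;
  amountMinor: number;
}

interface DiffReport {
  matched: string[];
  providerOnly: string[]; // provider has, we don't
  ledgerOnly: string[]; // we have, provider doesn't
  amountMismatch: string[];
}

// Joins the provider report to ledger journals by reference.
function reconcile(provider: SettlementLine[], ledger: SettlementLine[]): DiffReport {
  const report: DiffReport = {
    matched: [],
    providerOnly: [],
    ledgerOnly: [],
    amountMismatch: [],
  };
  const ledgerByRef = new Map<string, SettlementLine>();
  for (const l of ledger) ledgerByRef.set(l.reference, l);

  const seen = new Set<string>();
  for (const p of provider) {
    seen.add(p.reference);
    const ours = ledgerByRef.get(p.reference);
    if (!ours) {
      report.providerOnly.push(p.reference);
    } else if (ours.amountMinor !== p.amountMinor) {
      report.amountMismatch.push(p.reference);
    } else {
      report.matched.push(p.reference);
    }
  }
  for (const l of ledger) {
    if (!seen.has(l.reference)) report.ledgerOnly.push(l.reference);
  }
  return report;
}
```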


11. High availability & zero-downtime deployments

11.1 Availability targets (SLOs)

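| Capability | Availability target | Latency p99 | RPO | RTO |
| --- | --- | --- | --- | --- |
| Identity / auth | 99.95% | 300 ms | 1 min | 5 min |
| Payments init | 99.95% | 500 ms | 0 (durable) | 5 min |
| Webhook receipt | 99.99% | 200 ms | 0 (durable) | 1 min |
| Admin portal | 99.9% | 1 s | 5 min | 30 min |
| Reporting | 99% (read-only) | 2 s | 1 hour | 4 hours |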

We don't promise more than the underlying bank rails can deliver on the customer journey. The bank rail's downtime is communicated in-product.
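
For a sense of scale, the availability targets above translate into a downtime budget with simple arithmetic. The helper below is a small illustration; the 30-day window is an assumption, since the text does not state the measurement window.

```typescript
// Allowed downtime, in minutes, for an availability target over `days` days.
function downtimeBudgetMinutes(availabilityPercent: number, days: number = 30): number {
  const windowMinutes = days * 24 * 60;
  return windowMinutes * (1 - availabilityPercent / 100);
}
```

Over 30 days, a 99.95% target leaves roughly 21.6 minutes of downtime, 99.99% roughly 4.3 minutes, and 99.9% roughly 43 minutes.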

11.2 Topology

  • Multi-AZ active-active in the primary region (Addis or nearest cloud region with NBE-acceptable data residency).
  • Postgres: synchronous replication to one in-region standby + async to a DR region. Connection pooling via PgBouncer (transaction pooling) at each service.
  • Redis: clustered, multi-AZ, primary + replicas; used only for caches and rate-limit counters — never as a source of truth.
  • Kafka/NATS: 3-node cluster across AZs, replication factor 3, minimum in-sync replicas 2.
  • Object storage: multi-AZ by default; cross-region replication for audit / documents.

11.3 Deploy strategy

  • Blue-green for stateless services (gateway, all microservices). Cut over via the gateway after the new colour passes synthetic checks.
  • Canary for risky changes: 1% → 10% → 50% → 100% with auto-rollback on SLO burn.
  • Feature flags (Unleash or OpenFeature) on every new payment flow, so production exposure can be reduced to zero in seconds.

11.4 Zero-downtime database migrations

Always expand → migrate → contract, never break-and-replace:

  1. Expand — add new columns/tables, nullable, optional. Deploy.
  2. Backfill — populate new columns in chunks with throttling.
  3. Migrate — switch reads/writes to new columns behind a flag.
  4. Validate in production with shadow reads.
  5. Contract — remove old columns in a later release.

Prisma migrations are committed; every migration reviewed for locking behaviour (long lock = forbidden). Tools: pg_repack for table rewrites, pgroll for safer multi-step migrations.

11.5 Graceful shutdown and back-pressure

  • All services handle SIGTERM: stop accepting new requests, drain in-flight (capped at 30 s), then exit.
  • Bulkheads — each downstream provider has its own connection pool, so a slow provider can't starve the rest.
  • Circuit breakers (resilience4j-equivalent) per adapter, per endpoint. Open at 50% errors over 60 s, half-open after 30 s.
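
The breaker thresholds in the last bullet can be sketched as a small state machine. This is an illustration, not a port of resilience4j: the class shape is hypothetical, calls are synchronous for brevity (adapter calls would be async), and the `minSamples` guard is an added assumption so that a single early failure does not trip the breaker.

```typescript
type BreakerState = "CLOSED" | "OPEN" | "HALF_OPEN";

class CircuitBreaker {
  private state: BreakerState = "CLOSED";
  private openedAt = 0;
  private samples: { at: number; ok: boolean }[] = [];

  constructor(
    private readonly now: () => number = () => Date.now(),
    private readonly windowMs: number = 60_000, // error-rate window: 60 s
    private readonly errorThreshold: number = 0.5, // open at 50% errors
    private readonly coolDownMs: number = 30_000, // half-open after 30 s
    private readonly minSamples: number = 10, // assumption, not from the text
  ) {}

  currentState(): BreakerState {
    if (this.state === "OPEN" && this.now() - this.openedAt >= this.coolDownMs) {
      this.state = "HALF_OPEN";
    }
    return this.state;
  }

  // Runs `fn` unless the breaker is open; records the outcome either way.
  call<T>(fn: () => T): T {
    if (this.currentState() === "OPEN") throw new Error("circuit open");
    try {
      const value = fn();
      this.record(true);
      return value;
    } catch (err) {
      this.record(false);
      throw err;
    }
  }

  private record(ok: boolean): void {
    const t = this.now();
    if (this.state === "HALF_OPEN") {
      // One trial call decides: success closes, failure re-opens.
      this.samples = [];
      if (ok) {
        this.state = "CLOSED";
      } else {
        this.trip(t);
      }
      return;
    }
    this.samples = this.samples.filter((s) => t - s.at < this.windowMs);
    this.samples.push({ at: t, ok });
    const failures = this.samples.filter((s) => !s.ok).length;
    if (
      this.samples.length >= this.minSamples &&
      failures / this.samples.length >= this.errorThreshold
    ) {
      this.trip(t);
    }
  }

  private trip(at: number): void {
    this.state = "OPEN";
    this.openedAt = at;
  }
}
```

One instance per adapter endpoint keeps a failing endpoint from opening the circuit for healthy ones, which is the per-adapter, per-endpoint granularity described above.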

11.6 Disaster recovery

  • DR region with async replication.
  • Failover tested every quarter (game-day).
  • Backups (PITR Postgres) every 5 min, retained 30 days hot + 7 years cold.
  • Documented runbook for every "what if X fails" — region, AZ, Postgres, Kafka, a provider, the gateway.

12. Observability & operations

12.1 The three signals

  • Logs: structured JSON, correlation ID per request (W3C traceparent). Loki or OpenSearch. PII-redacted.
  • Metrics: Prometheus, scraped by service-mesh. Default RED metrics (Rate / Errors / Duration) per endpoint, USE metrics per resource. Custom business metrics: payments initiated, payments succeeded, settlement matched %, KYC decisions per hour.
  • Traces: OpenTelemetry, sampled at 10% normally, 100% for payments. Backend: Tempo / Honeycomb.

12.2 SLO-driven on-call

  • Every critical service publishes its SLO (availability + latency) to a dashboard. Burn-rate alerts page on-call when error budget burns > 14× normal.
  • Pages are rare and respected. If a page is wrong, the postmortem fixes the alert.
  • Runbooks linked in every alert.
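
One common reading of the 14× figure is a burn rate: the observed error ratio divided by the error budget (1 minus the SLO target). The helper below works through that arithmetic under this assumption; the function names are illustrative, and alert windows are left out.

```typescript
// Burn rate = observed error ratio / error budget, where budget = 1 - SLO target.
// Ratios are fractions, e.g. 0.9995 for a 99.95% SLO.
function burnRate(errorRatio: number, sloTarget: number): number {
  return errorRatio / (1 - sloTarget);
}

// Pages when the budget is burning faster than the threshold (14x by default).
function shouldPage(errorRatio: number, sloTarget: number, threshold: number = 14): boolean {
  return burnRate(errorRatio, sloTarget) > threshold;
}
```

With a 99.95% SLO the budget is 0.05% of requests, so a 1% error ratio is a burn rate of about 20 and would page, while 0.5% is about 10 and would not.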

12.3 Synthetic monitoring

  • Probe the full payment path (init → adapter sandbox → callback → ledger update) every 5 minutes from outside.
  • Probe each adapter's health() every minute; degrade routing automatically on failure.

12.4 Blameless postmortems

For every customer-impacting incident: written within 5 business days, blameless, ships with at least one action item per root cause. Reviewed in engineering all-hands monthly.


13. Regulation & compliance

We are a fintech in Ethiopia handling money on behalf of others. This is a regulated environment.

13.1 NBE (National Bank of Ethiopia)

  • Hold the relevant licence for the service category (Payment System Operator, Payment Instrument Issuer, etc. — confirm with legal).
  • File the directive-required reports: transaction volumes, KYC metrics, suspicious activity reports.
  • Data residency: customer data and transactional records remain inside Ethiopia unless an explicit exception is granted. Plan infra accordingly (in-country DC or NBE-approved cloud).
  • Implement and test BCP (Business Continuity Plan) per NBE's expectations.

13.2 PCI DSS

  • Scope: if any rail involves a card (PAN), even indirectly, PCI applies. Strongly prefer routing all card interactions through a PCI-certified processor so we are out-of-scope or SAQ-A scope.
  • If in scope: tokenise PANs at the perimeter; never store PAN in our DBs; segment cardholder data environment (CDE) with separate VPC and stricter controls.

13.3 AML / CFT

  • Documented AML programme with a designated officer.
  • KYC tiers, transaction monitoring rules, SAR filing workflow.
  • Customer Risk Rating updated continuously; EDD for HIGH.
  • Sanctions screening: NBE list + UN consolidated + OFAC where applicable. Daily re-screen of the customer base.
  • Record retention: 7 years post-relationship.

13.4 Privacy

  • Lawful basis for processing — contractual necessity (payments) + legal obligation (KYC) + legitimate interest (fraud).
  • Subject access requests served within 30 days.
  • Privacy by design and default: KYC documents are retained for 7 years post-closure by default, then purged automatically.

13.5 Information security management

  • Target ISO/IEC 27001 certification within 18 months of launch.
  • Document and operate to an ISMS — risk register, asset register, access control policy, change management policy, incident response policy, vendor management policy.

14. Migration plan (strangler fig from monolith)

We do not rewrite. We extract.

14.1 Phase 0 — Foundation (4–6 weeks)

Goal: nothing visible to users yet, but the floor is solid.

  • Provision: VPC, multi-AZ Kubernetes (or Nomad), Postgres HA, Kafka/NATS, Redis, Vault, S3 (or Wasabi), KMS.
  • Set up CI/CD with: lint, test, SAST, SCA, container scan, signed images (cosign), promotion gates per environment.
  • Bring up observability stack: Prom / Grafana / Loki / Tempo / alerting.
  • Add idempotency-key middleware to the existing monolith on every state-changing endpoint.
  • Add an audit_log outbox + relay in the monolith.
  • Add outbox and inbox tables to the monolith and emit domain events for the events we know we'll need.

14.2 Phase 1 — Identity & KYC (4–6 weeks)

Goal: identity is owned by a new service; the monolith calls it.

  • Extract svc-identity with its own Postgres schema; replicate initial user data; switch monolith to call it for auth.
  • Add the KYC tier model, document store integration, Fayda hook, sanctions screening.
  • Migrate sessions / refresh tokens to the new service.
  • Cut over with feature-flag, dual-write during a parallel window, then point all auth traffic to svc-identity.

14.3 Phase 2 — Ledger + Payments (8–10 weeks)

Goal: a single internal money API exists.

  • Extract svc-ledger. The schema is already there; just stand it up and write the journal-posting service.
  • Build svc-payments with idempotency, intent → attempt model, webhook ingestion.
  • Build the adapter framework + two adapters first: Telebirr + CBE (the highest-volume rails).
  • Build svc-webhooks as the public-facing receiver.
  • Migrate any existing money flow in the monolith to call svc-payments. Most flows aren't yet implemented, which makes this easier than usual.

14.4 Phase 3 — Settlement & reconciliation (4 weeks)

  • Build svc-settlement with daily batch + diff reporting.
  • Operate it in shadow mode (no auto-settle) for 2 weeks; then enable auto-settle once breaks are < 0.05%.

14.5 Phase 4 — Lending, EWA, BNPL, Equb (parallelisable, ~3 months)

  • Extract each product into svc-lending and svc-embedded-finance. Each one is straightforward once Payments + Ledger exist.

14.6 Phase 5 — Decommission monolith (2 weeks)

  • The only thing left in the monolith should be the business / employee CRUD. Either keep it running as svc-business or fold its remaining responsibilities into svc-identity.
  • Delete the monolith repo's old paths; tag a final image; archive.

14.7 New bank/wallet onboarding cadence

After Phase 3 is shipped, the goal is one new bank/wallet adapter every 2 weeks with the cookbook in §7.3.


15. Risk register

| # | Risk | Likelihood | Impact | Mitigation |
| --- | --- | --- | --- | --- |
| 1 | Provider outage during peak | High | Medium | Multi-provider routing, fallback rail, in-product comms |
| 2 | Webhook signature secret leak | Low | High | HSM-backed, rotated quarterly, alert on misuse |
| 3 | Double-spend through retry | Medium | High | Idempotency keys; ledger reservation step |
| 4 | KYC data breach | Low | Critical | Tokenisation, per-tenant keys, no PII in logs, audit log |
| 5 | Reconciliation breaks pile up | Medium | High | Daily diff alerts, block channel on > N unresolved |
| 6 | DB migration brings down a service | Low | High | Expand-contract only, peer review, staging dress rehearsal |
| 7 | Saga compensation incomplete | Medium | High | Temporal-based durable orchestration; chaos test |
| 8 | Sanctions match missed | Low | Critical | Daily re-screen, fuzzy match, manual review queue |
| 9 | NBE regulatory change | Medium | Variable | Compliance officer monitors directives; config-driven limits |
| 10 | Insider misuse | Low | Critical | Two-person rule, JIT access, audit, rotation, anomaly alerts |
| 11 | DoS on webhook receiver | Medium | Medium | WAF, rate-limit, queue-back-pressure |
| 12 | Key data loss in DR scenario | Low | Critical | Multi-region async + tested restore + immutable audit S3 |

16. Open decisions

Items that need a call from leadership / architecture before detailed design can start.

  • Cloud provider — AWS / Azure / local DC? Drives a lot of procurement and the data-residency story.
  • Service mesh — Istio vs Linkerd vs just mTLS sidecars? Trade operational cost for traffic policy power.
  • Event bus — Kafka (heavier, more featureful) vs NATS JetStream (lighter, simpler). Recommendation: NATS for now, migrate to Kafka if/when stream-processing demand grows.
  • Workflow engine — Temporal (recommended) vs custom NestJS sagas. Temporal saves months of engineering.
  • HSM / KMS — cloud KMS (AWS KMS / Azure Key Vault) vs local HSM. Cloud KMS is enough for v1; revisit for licensed signing requirements.
  • National ID integration — Fayda API availability and SLA? Backup ID document flow for users without Fayda yet.
  • Hosting — fully in-country (latency, data residency, harder ops) vs nearest cloud region (faster to ship, possible NBE pushback). Need legal opinion.
  • Brand — finalise positioning so SEO / public copy stays consistent. (See seo/multi-rail-positioning branch on the landing repo.)

Appendix A — Suggested repo layout (future state)

Demoz-Pay/
├── apps/
│   ├── gateway/                 # API gateway + auth/JWT/RL
│   ├── svc-identity/            # NestJS service
│   ├── svc-business/
│   ├── svc-ledger/
│   ├── svc-payments/
│   ├── svc-bank-adapters/       # one app, many adapter modules
│   ├── svc-settlement/
│   ├── svc-payroll/
│   ├── svc-lending/
│   ├── svc-embedded-finance/
│   ├── svc-risk/
│   ├── svc-notify/
│   ├── svc-audit/
│   ├── svc-documents/
│   ├── svc-webhooks/
│   ├── svc-reports/
│   ├── svc-reference/
│   ├── admin/                   # Next.js frontends, unchanged
│   ├── business/
│   ├── client/
│   ├── fi/
│   ├── bnpl-partner/
│   └── docs/
├── libs/
│   ├── contracts/               # gRPC .proto + generated types
│   ├── events/                  # Domain event schemas (Avro/JSON)
│   ├── sdk/                     # Internal SDK to call services
│   ├── ui/                      # shared frontend components
│   ├── domain/                  # shared domain types & validators
│   └── observability/           # logger, tracer, metrics helpers
├── infra/
│   ├── terraform/
│   ├── kubernetes/
│   └── runbooks/
└── prisma/                      # per-service schemas live in svc-*

Appendix B — Adapter contract (gRPC)

syntax = "proto3";
package demoz.bankadapter.v1;

// Needed for google.protobuf.Empty in the Health RPC.
import "google/protobuf/empty.proto";

// Implemented once per bank / wallet / FI adapter.
service BankAdapter {
  rpc Send(SendRequest) returns (SendResponse);
  rpc GetStatus(GetStatusRequest) returns (AttemptStatus);
  rpc Refund(RefundRequest) returns (RefundResponse);
  rpc PullSettlement(PullSettlementRequest) returns (SettlementBatch);
  rpc Health(google.protobuf.Empty) returns (HealthResponse);
}

message SendRequest {
  string idempotency_key = 1;
  string amount_minor_units = 2; // e.g. "1500000" = ETB 15,000.00
  string currency = 3;           // "ETB"
  Source source = 4;
  Destination destination = 5;
  string reference = 6;
  map<string, string> metadata = 7;
}

message Destination {
  // Note: an unset kind reads as the zero value, BANK_ACCOUNT.
  enum Kind { BANK_ACCOUNT = 0; WALLET = 1; }
  Kind kind = 1;
  string identifier = 2; // acct # or phone (E.164)
  string account_name = 3;
  string bank_code = 4;
}
// (… rest elided for brevity, see libs/contracts when implemented …)
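
On the TypeScript side, the amount travels as a string of minor units, so it can be validated and formatted without floating point. The sketch below is an assumption-heavy illustration: the `Destination` and `SendRequest` interfaces are hand-written mirrors of the messages above (generated types would replace them), `Source` is left opaque because it is elided in the contract, and `formatMinorUnits` is a hypothetical helper that assumes two decimal places, as for ETB.

```typescript
type DestinationKind = "BANK_ACCOUNT" | "WALLET";

interface Destination {
  kind: DestinationKind;
  identifier: string; // account number or phone (E.164)
  accountName: string;
  bankCode: string;
}

interface SendRequest {
  idempotencyKey: string;
  amountMinorUnits: string; // e.g. "1500000" = ETB 15,000.00
  currency: string; // "ETB"
  source: unknown; // elided in the contract above
  destination: Destination;
  reference: string;
  metadata: Record<string, string>;
}

// Formats a minor-unit string as a major-unit decimal with two places,
// using string operations only so large amounts never lose precision.
function formatMinorUnits(amountMinorUnits: string): string {
  if (!/^\d+$/.test(amountMinorUnits)) {
    throw new Error(`invalid amount: ${amountMinorUnits}`);
  }
  const digits = amountMinorUnits.replace(/^0+(?=\d)/, "").padStart(3, "0");
  const whole = digits.slice(0, digits.length - 2);
  const fraction = digits.slice(digits.length - 2);
  const grouped = whole.replace(/\B(?=(\d{3})+(?!\d))/g, ",");
  return `${grouped}.${fraction}`;
}
```

For example, `formatMinorUnits("1500000")` returns `"15,000.00"`, matching the comment in the contract.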

Appendix C — Initial telemetry catalogue (must-have on day one)

| Metric | Type | Notes |
| --- | --- | --- |
| http_requests_total{service,route,status} | counter | per-endpoint |
| http_request_duration_seconds{service,route} | histogram | p50/p95/p99 |
| db_query_duration_seconds{service,query} | histogram | watch P95 |
| bus_consumer_lag{consumer,topic} | gauge | <5 s healthy |
| payments_initiated_total{provider} | counter | per-rail |
| payments_succeeded_total{provider} | counter | per-rail |
| payments_failed_total{provider,reason} | counter | per-rail, per-reason |
| webhook_signature_invalid_total{provider} | counter | alert if >0 |
| kyc_decision_total{tier,outcome} | counter | business metric |
| settlement_diff_total{provider,bucket} | counter | matched / missing / extra |
| ledger_unbalanced_post_attempts_total | counter | must be 0 |

17. Polyglot tech stack — Node + Spring Boot + Go

Microservices win when you can pick the right language for the right problem. NestJS got us to MVP; for the production platform we'll deliberately mix three languages, each chosen for what it's genuinely good at.

17.1 Why polyglot

| Language | Strength | Weakness in our context |
| --- | --- | --- |
| Node.js / NestJS | Web-facing services, BFF for our Next.js apps, JSON, fast iteration, shared types with frontend | Single-threaded; GC pauses under sustained heavy throughput; weaker for bank SOAP/ISO 8583 |
| Spring Boot (Kotlin or Java 21) | The Java ecosystem owns banking integration: CXF for SOAP, jPOS for ISO 8583, BouncyCastle for crypto, Drools for rules. Mature observability (Micrometer), threading, transactions | Heavier images, slower startup, more verbose code |
| Go | Lowest latency + highest throughput per CPU. Excellent gRPC support, goroutines for parallel fan-out, tiny static binaries, fast cold start (matters for K8s scale-out) | Smaller library ecosystem than JVM; not ideal for complex domain rules |

17.2 Suggested language per service

| Service | Language | Why |
| --- | --- | --- |
| svc-identity | NestJS | Already in our stack; mostly CRUD + JWT + integrations |
| svc-business | NestJS | CRUD-heavy; shares types with frontends |
| svc-payments | Go | Highest QPS path; needs predictable latency and tiny memory footprint; the fan-out + idempotency engine maps cleanly to goroutines and channels |
| svc-ledger | Go | Hot write path; deterministic; the journal-posting service must be the fastest service in the platform |
| svc-bank-adapters (hub) | Spring Boot (Kotlin) | Banks speak SOAP, ISO 8583, fixed-width files. jPOS, Apache CXF, Spring Integration, Spring Batch make these trivial in JVM and painful elsewhere |
| svc-settlement | Go | Batch processing of large files; concurrent diff jobs |
| svc-payroll | NestJS | Complex business logic; benefits from shared TS types with the business portal |
| svc-lending | Spring Boot | Underwriting rules engine (Drools), score cards, complex decision trees |
| svc-embedded-finance (EWA, BNPL, Equb) | NestJS | Product features with rich UI parity; co-evolves with the client app |
| svc-risk | Spring Boot | Real-time scoring with Drools; fits with svc-lending |
| svc-notify | NestJS | I/O bound (SMS/email/push); rich template ecosystem |
| svc-audit | Go | High-write append-only service; minimal logic; high throughput |
| svc-documents | NestJS | File handling, integrations (ClamAV, S3) |
| svc-webhooks | Go | Public-facing high-throughput endpoint; signature verification; needs to absorb traffic spikes |
| svc-reports | NestJS | Read-side projections; shares types with frontends |
| svc-reference | Go | Read-mostly with aggressive caching; tiny service |

This is a recommendation, not a hard rule. A team should not adopt a language just because the table says so. When we extract a service, the team building it picks the language with the engineering lead.

17.3 How three languages stay one platform

Different language, same contract — what makes polyglot survive is uniform contracts and uniform operations.

Contracts as the source of truth

  • gRPC + Protocol Buffers for service-to-service. The .proto files live in libs/contracts/ and are codegen'd into:
    • TypeScript for Node services (ts-proto)
    • Kotlin/Java for Spring Boot (protoc-gen-grpc-java)
    • Go (protoc-gen-go-grpc)
  • JSON Schema + AsyncAPI for events on the bus. Codegen'd into each language's event types.
  • OpenAPI 3.1 for external (gateway-facing) APIs. Codegen'd into client SDKs.

The proto / schema files are reviewed like code — they're the real interface. Implementation can change freely; the contract cannot, except by versioning (v1, v2).

Operations as the great equaliser

Everything below is uniform across languages:

  • Containers: every service ships as a Docker image; same base hardening rules; same vulnerability scan.
  • Kubernetes: same Helm chart template; same liveness/readiness/startup probes; same SIGTERM handling.
  • Observability: all three languages have first-class OpenTelemetry SDKs for traces + metrics + logs. Service mesh injects sidecars; no language-specific monitoring.
  • Secrets: Vault Agent sidecar mounts secrets to a tmpfs the app reads at startup. Same for all three languages.
  • CI/CD: per-language pipeline shape (build, test, scan, sign, promote) but the promotion gates are the same: SAST passes, vulns under threshold, container scan clean, image signed (cosign), staging soak time elapsed.
  • Auth: same OIDC for human users; same SPIFFE/SPIRE workload identity for service-to-service mTLS — issued by the same internal CA regardless of language.
  • Logging format: same JSON schema with mandatory fields (trace_id, span_id, service, env, level, msg, attrs). Each language uses its native logger configured to emit this shape.
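
The shared log shape from the last bullet might be produced in a Node service along these lines. The `formatLog` helper is hypothetical, and the fixed key order is a convenience chosen for the example rather than something the text requires; Kotlin and Go services would emit the same JSON from their own loggers.

```typescript
type LogLevel = "debug" | "info" | "warn" | "error";

interface LogContext {
  trace_id: string;
  span_id: string;
  service: string;
  env: string;
}

// Builds one JSON log line containing the mandatory fields.
function formatLog(
  ctx: LogContext,
  level: LogLevel,
  msg: string,
  attrs: Record<string, unknown> = {},
): string {
  return JSON.stringify({
    trace_id: ctx.trace_id,
    span_id: ctx.span_id,
    service: ctx.service,
    env: ctx.env,
    level,
    msg,
    attrs,
  });
}
```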

Language-specific guardrails

Each language gets its own lint / format / test conventions, owned by a small enabling team or a designated maintainer:

| Language | Build | Lint | Test | Coverage gate |
| --- | --- | --- | --- | --- |
| TypeScript | pnpm workspaces, nx | ESLint + Prettier | Jest + Playwright | 80% lines on svc-* |
| Kotlin | Gradle (Kotlin DSL) | ktlint + detekt | JUnit5 + Testcontainers | 80% lines |
| Go | Go modules per-service | golangci-lint (strict) | go test + Testcontainers-go | 80% lines |

Local development

Devs run only the services they're working on, plus a docker-compose.local.yml that provides Postgres, Kafka/NATS, Redis, mock-providers. Other services run as stubbed remote via mock servers (Prism for OpenAPI, protomock for gRPC). Nobody runs the full platform locally.

17.4 When NOT to mix languages

Be deliberate:

  • Don't pick a language to learn it. Production fintech is not a playground. The team writing a service must already be senior in its language.
  • Don't fragment the team. If you have one engineer who can write Spring Boot today, that's a bus factor of one. Either hire/upskill before committing the service to JVM, or use a language the team already knows well.
  • Don't optimise prematurely. A NestJS service can do plenty of throughput. Only move svc-payments and svc-ledger to Go when you have actual profiling data showing Node is the bottleneck.
  • Don't pick exotic stacks (Rust, Elixir, Zig) — beautiful, but the talent market in Addis is thin and the integration libraries for banking aren't there yet. Java / Node / Go cover us cleanly.

17.5 Hiring & team shape

| Role | Approx team size for Phase 1+2 |
| --- | --- |
| Platform / infra engineer | 1 senior |
| Backend engineers (Node) | 2 |
| Backend engineers (Go) | 2 senior (for svc-payments, svc-ledger, svc-webhooks, svc-audit) |
| Backend engineer (JVM) | 1–2 senior (for svc-bank-adapters, svc-lending, svc-risk) |
| Security engineer | 1 (can be fractional initially) |
| QA / SDET | 1 |
| SRE / on-call | 1 (rotating with backend engineers initially) |

Cross-training plan: every backend engineer should be able to read all three languages within 6 months. Writing them is optional.


18. Security parameters — what the bank questionnaire will ask

When a bank, MFI, or wallet operator gives us their security questionnaire (and they all do), this is the answer set. We maintain a separate, bank-handover-ready version of this in SECURITY_CONTROLS.md — the file is structured Q&A so it can be copy-pasted into vendor questionnaires with minimal editing.

The headline controls we'll be asked about, and our position:

| Domain | Control | Position |
| --- | --- | --- |
| Encryption at rest | AES-256-GCM, KMS-managed keys, per-tenant DEKs, automatic rotation | Built-in |
| Encryption in transit | TLS 1.3 only, modern cipher suites only, HSTS + certificate pinning where applicable (HPKP itself is deprecated in browsers) | Enforced |
| Authentication (user) | Phone + Argon2id + mandatory MFA (TOTP / WebAuthn) | Implemented |
| Authentication (service-to-service) | mTLS via SPIFFE/SPIRE, short-lived (24 h) certs | Enforced |
| Authorization | RBAC + OPA policy engine; least privilege | Enforced |
| Webhook signing | HMAC-SHA256 or detached JWS; constant-time verify; nonce + timestamp | Mandatory |
| Idempotency | UUIDv4 keys, 24 h TTL, response cached | Built-in |
| Audit log | WORM (Postgres + S3 Object Lock), 7-year retention, Merkle-rooted daily, SIEM-streamed | Implemented |
| Secrets | HashiCorp Vault + dynamic per-service secrets, no env files in production | Mandatory |
| Vulnerability mgmt | SAST + SCA + container scan in CI; Critical 48 h, High 7 d SLA | Process |
| Penetration testing | External independent test pre-launch and annually; segment-rescoped on major change | Scheduled |
| Bug bounty / responsible disclosure | security.txt + private programme via HackerOne or local equiv | Planned |
| Network | WAF + DDoS at edge, private VPC, no service public-by-default, egress allow-list | Architecture |
| Data residency | Customer + transaction data resident in Ethiopia (cloud region or in-country DC) | Policy |
| Backup & recovery | 5-min Postgres PITR, 30-d hot, 7-y cold, monthly restore drill | Tested |
| BCP / DR | Multi-AZ active-active, DR region async, quarterly failover drill, RPO 1 min / RTO 5 min on critical | Tested |
| Incident response | Documented playbook, 15-min ack SLA for SEV1, post-mortem within 5 business days | Process |
| Change management | All prod changes via PR + 2 reviewers + signed image promotion + canary | Enforced |
| Vendor / third-party risk | Vendor security review (signed DPA, SOC 2 if available), annual re-review | Process |
| Compliance roadmap | ISO 27001 within 18 months; SOC 2 Type II at scale; PCI DSS scope minimised through certified processors | Roadmap |
| Right to audit | Per contract, with reasonable notice | Available |

For the detailed control-by-control answers banks expect (the ones that fit in their spreadsheet), see SECURITY_CONTROLS.md.
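
Because the webhook-signing row is the one reviewers most often ask to see in code, here is a minimal sketch of the HMAC-SHA256 variant with constant-time comparison, a timestamp window and nonce replay protection. The signed string layout (`timestamp.nonce.body`), the 5-minute skew and the in-memory nonce set are assumptions for illustration; each provider defines its own scheme, and a real service would keep nonces in a shared store with expiry.

```typescript
import { createHmac, timingSafeEqual } from "node:crypto";

const MAX_SKEW_MS = 5 * 60 * 1000; // assumption: 5-minute tolerance

// Hypothetical signing layout: HMAC-SHA256 over "timestamp.nonce.body".
function sign(secret: string, timestamp: string, nonce: string, body: string): string {
  return createHmac("sha256", secret)
    .update(`${timestamp}.${nonce}.${body}`)
    .digest("hex");
}

class WebhookVerifier {
  private readonly seenNonces = new Set<string>();

  constructor(
    private readonly secret: string,
    private readonly now: () => number = () => Date.now(),
  ) {}

  verify(timestamp: string, nonce: string, body: string, signatureHex: string): boolean {
    // Reject stale or future-dated requests first.
    const sentAt = Number(timestamp);
    if (!Number.isFinite(sentAt) || Math.abs(this.now() - sentAt) > MAX_SKEW_MS) {
      return false;
    }
    const expected = Buffer.from(sign(this.secret, timestamp, nonce, body), "hex");
    const received = Buffer.from(signatureHex, "hex");
    // timingSafeEqual throws on length mismatch, so compare lengths first.
    if (received.length !== expected.length) return false;
    if (!timingSafeEqual(received, expected)) return false;
    // Only remember nonces from authentic requests, then reject replays.
    if (this.seenNonces.has(nonce)) return false;
    this.seenNonces.add(nonce);
    return true;
  }
}
```

Checking the nonce only after the signature passes keeps unauthenticated traffic from filling the nonce store.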


Sign-off

This plan is a draft. Before we commit engineering time, the following sign-offs are needed:

  • Engineering lead — architecture, sequencing
  • Security lead — threat model, controls
  • Compliance / legal — NBE alignment, AML programme, data residency
  • Product — phase ordering vs commercial commitments
  • Finance — infra and tooling budget

Once signed off, this document becomes the canonical reference and every architectural deviation requires a one-page RFC referencing the section it changes.