
04 — How the backend works

This document explains the runtime architecture: what happens between the moment an HTTP request lands and the moment a money operation commits.

For the full operator-level picture — every container, every port, every API surface, every database, plus three end-to-end traces (EWA disburse, daily reconciliation, what happens at pnpm dev:api boot) — see docs/architecture/SYSTEM_TOPOLOGY.md. That doc is the single most concrete reference; this one is the conceptual primer.

The architecture in one picture

┌────────────────────────────────────────────────────────────────────┐
│                 Web apps + curl / partner systems                  │
└──────────────────────────────┬─────────────────────────────────────┘
                               │ HTTPS
                               ▼
              ┌──────────────────────────────────┐
              │     NestJS modular monolith      │
              │     apps/api (TypeScript)        │
              │                                  │
              │   ┌─────────────────────────┐    │
              │   │ /api/auth/*             │    │
              │   │ better-auth handler     │    │
              │   │ (Express-mounted        │    │
              │   │  BEFORE Nest)           │    │
              │   └─────────────────────────┘    │
              │                                  │
              │   ┌─────────────────────────┐    │
              │   │ /api/* (Nest routes)    │    │
              │   │ • SessionMiddleware     │    │
              │   │ • TenantContextMW       │    │
              │   │ • AuthGuard (default-   │    │
              │   │   deny; @Public()       │    │
              │   │   opt-out)              │    │
              │   │ • Controller            │    │
              │   │ • UseCase               │    │
              │   │ • PrismaTxnRunner       │    │
              │   │     ↓ writes to:        │    │
              │   │       - domain table    │    │
              │   │       - audit_entry     │    │
              │   │       - outbox_event    │    │
              │   └─────────────────────────┘    │
              └──────┬────────────────────┬──────┘
                     │                    │
                     │ gRPC               │ Postgres
                     │ (PostTransaction,  │ (Prisma)
                     │  GetBalance, …)    │
                     ▼                    ▼
    ┌────────────────────────┐   ┌────────────────────────┐
    │ services/ledger        │   │ Postgres (single DB    │
    │ Go gRPC                │   │ for the monolith)      │
    │ Money source of truth  │   │ RLS forced on the 5    │
    │ Own Postgres DB        │   │ financial tables       │
    │ Append-only journal    │   │                        │
    └────────────────────────┘   └────────────────────────┘
                                             │ Background drain
                                             ▼
                                 ┌────────────────────────┐
                                 │ OutboxPublisherService │
                                 │ (worker in apps/api)   │
                                 │ Uses BYPASSRLS role    │
                                 │ via OUTBOX_DATABASE_URL│
                                 └────────────────────────┘
                                             │ Kafka producer
                                             ▼
                                 ┌────────────────────────┐
                                 │ Redpanda / Kafka       │
                                 │ Topic per bounded ctx  │
                                 └────────────────────────┘
                                             │ Consumers
                                             ▼
                   ┌─────────────────────────────────────────────────┐
                   │ services/notifications        [Stub]            │
                   │ services/integration-gateway  [Stub]            │
                   │ Future: reconciliation engine, BI sink          │
                   └─────────────────────────────────────────────────┘

Status of each box, brutally honest:

| Box | Status |
| --- | --- |
| NestJS monolith (apps/api) | Live: boots cleanly, all routes registered, health/metrics work, default-deny auth works |
| /api/auth/* better-auth handler | Partial: email/password + organization plugin Live (sign-up E2E proven); phone OTP Partial (LoggingSmsSender is dev-only); 2FA Planned |
| Postgres (monolith DB) | Live: 9 migrations apply cleanly; RLS forced + verify-guarded on 5 tables |
| services/ledger | Partial: schema + DB invariants Live; Go server Compile-only on most hosts |
| services/integration-gateway | Stub |
| services/notifications | Stub |
| OutboxPublisherService | Live (gated by env; disabled by default) |
| Redpanda / Kafka | Live for local dev (docker-compose has it); not exercised by any consumer yet |

The two-language ceiling (ADR-010)

DemozPay uses only TypeScript and Go. Each has its strengths and the choice for each piece is deliberate:

| TypeScript / NestJS owns | Go owns |
| --- | --- |
| HTTP API surface | Money posting (the ledger) |
| Auth (better-auth) | Reconciliation engine (planned) |
| Domain orchestration | Settlement engine (planned) |
| Business workflows | High-throughput financial operations |
| Tenant context plumbing | Anything where GC pauses or runtime surprises are unacceptable |
| Prisma + frontend BFF | Long-lived background workers |

The rule isn't "Go for fast, TS for slow." It's:

  • Money correctness paths → Go. Predictable runtime, no GC surprises, easy to audit.
  • Domain orchestration → TS. Fast iteration, type-safe domain modeling, large ecosystem.

When you find yourself wanting to introduce a third language (Java, Python, Rust, Kotlin), stop and write an ADR. ADR-010 explicitly rejects this.

The request flow for a financial route

This is what happens when a request lands on POST /api/ewa/requests:

1. HTTPS request arrives. Express receives it.
2. Express's path matcher: does it match /api/auth/* ?
     YES → forward to better-auth's toNodeHandler. RETURN.
     NO  → pass to NestJS router.
3. NestJS router matches /api/ewa/requests to EwaController.create.
4. NestJS middleware chain (in declared order):
   a. SessionMiddleware (apps/api/src/identity/auth/session.middleware.ts)
      - Reads the better-auth.session_token cookie
      - Calls auth.api.getSession({ headers })
      - Sets req.user = {
          id, email, emailVerified,
          businessId = session.session.activeOrganizationId,
          role
        }
      - If no session: leaves req.user undefined; passes through.
   b. TenantContextMiddleware (apps/api/src/identity/tenant/...)
      - Resolves tenantId from, in order:
          req.user.businessId
          req.user.tenantId
          req.headers['x-tenant-id']
      - If found: runs the rest of the request inside
          runWithTenant({tenantId}, next)
        which establishes an AsyncLocalStorage frame.
      - If not found: passes through; downstream getTenantId()
        will return undefined.
5. NestJS guard chain:
   a. AuthGuard (registered as APP_GUARD)
      - If @Public() decorator on the controller/handler: pass.
      - Otherwise: req.user.id must be set; if it is not, throw 401.
6. NestJS controller invocation:
     EwaController.create({ body, headers })
     body validated via Zod / class-validator (DTOs)
     idempotencyKey extracted from header
7. UseCase call:
     RequestEwaUseCase.execute({
       tenantId: getTenantId(),
       employeeId, payPeriodId, amount, idempotencyKey
     })
8. UseCase orchestration (all inside one PrismaTransactionRunner):
   - SELECT set_config('app.tenant_id', $tenantId, true) [RLS]
   - Read accrued earnings via AccruedEarningsPort
   - Compute fee via EligibilityPolicy
   - Check IdempotencyStore: have we seen this key before?
       YES → return cached result; commit no-op
       NO  → continue
   - Insert ewa_request row
   - Call LedgerGrpcClient.postTransaction(...) over gRPC
     (this hits services/ledger; if it isn't running, 500)
   - Append audit_entry row
   - Append outbox_event row
   - Insert idempotency_record row with the result
   - Commit. (Separately, the deferred trigger ledger_assert_balanced
     runs at COMMIT inside the ledger's own txn, not this one.)
9. NestJS response handling:
   - HttpMetricsInterceptor records latency + count
   - Pino logs with trace_id/span_id from OpenTelemetry
   - Response sent

Every step is testable. Steps 1–6 are exercised by the live sign-up demo (you can curl your way through them yourself per 02-running-locally.md). Steps 7–9 require the Go ledger to be running.
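Steps 4b and 5 can be sketched without NestJS. The following is a minimal illustration, assuming runWithTenant and getTenantId are thin wrappers over Node's AsyncLocalStorage; the RequestLike shape and the resolveTenantId and isAllowed helpers are hypothetical names invented for this sketch, not the real middleware or guard.

```typescript
import { AsyncLocalStorage } from "node:async_hooks";

// Hypothetical shapes: only the field names come from the steps above.
interface TenantContext {
  tenantId: string;
}

interface RequestLike {
  user?: { id?: string; businessId?: string; tenantId?: string };
  headers: Record<string, string | undefined>;
}

const tenantStorage = new AsyncLocalStorage<TenantContext>();

// Runs `fn` inside an AsyncLocalStorage frame that carries the tenant.
function runWithTenant<T>(ctx: TenantContext, fn: () => T): T {
  return tenantStorage.run(ctx, fn);
}

// Returns the tenant of the current frame, or undefined outside any frame.
function getTenantId(): string | undefined {
  return tenantStorage.getStore()?.tenantId;
}

// Step 4b: first match wins, in the documented order.
function resolveTenantId(req: RequestLike): string | undefined {
  return (
    req.user?.businessId ??
    req.user?.tenantId ??
    req.headers["x-tenant-id"]
  );
}

// Step 5: default-deny. Only an explicit public flag or a user id passes.
function isAllowed(req: RequestLike, isPublic: boolean): boolean {
  return isPublic || Boolean(req.user?.id);
}

// Usage: code deeper in the call stack sees the tenant without it being
// passed as an argument.
const demoReq: RequestLike = {
  user: { id: "u1", businessId: "biz-1" },
  headers: {},
};
const demoTenantId = resolveTenantId(demoReq);
if (demoTenantId !== undefined && isAllowed(demoReq, false)) {
  runWithTenant({ tenantId: demoTenantId }, () => {
    console.log("tenant inside frame:", getTenantId());
  });
}
console.log("tenant outside frame:", getTenantId());
```

Outside a frame getTenantId() yields undefined, which is the pass-through case described in step 4b.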

The transactional spine

The single most important piece of code in the platform:

apps/api/src/_infra/shared-infra/prisma-transaction-runner.ts
async runInTransaction<T>(
  work: (tx: Prisma.TransactionClient) => Promise<T>
): Promise<T> {
  const tenantId = getTenantId(); // from AsyncLocalStorage
  return this.prisma.$transaction(async (tx) => {
    // ① Set app.tenant_id for RLS (transaction-local: resets at COMMIT or ROLLBACK)
    await tx.$executeRaw`SELECT set_config('app.tenant_id', ${tenantId}, true)`;

    // ② Hand the typed tx to the use case
    return work(tx);
  });
}

What this gives you for free, inside work(tx):

  1. Tenant isolation — RLS sees the right tenant on every query.
  2. Atomicity — domain write + audit + outbox all commit or all roll back together (ADR-008).
  3. Idempotency — the idempotency store insert and the domain write are in the same tx; a retry cannot half-commit.

Rule: every money-touching write goes through the runner. If you find yourself calling prisma.something.create() directly in a domain use case, you're bypassing the runner — and you've also bypassed RLS, audit, and the outbox. Don't.
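To make the atomicity guarantee concrete, here is a minimal in-memory sketch. It does not use Prisma: FakeDb, FakeTx, and requestEwa are hypothetical stand-ins, and the only thing they model is that writes made through the same transaction handle become visible together or not at all.

```typescript
// In-memory stand-in for the database. This is NOT Prisma; every name
// below is hypothetical and exists only to show commit-or-rollback.
type Row = Record<string, unknown>;

class FakeTx {
  readonly pending: { table: string; row: Row }[] = [];

  insert(table: string, row: Row): void {
    this.pending.push({ table, row });
  }
}

class FakeDb {
  private readonly tables = new Map<string, Row[]>();

  // Buffers writes in a FakeTx and applies them only if `work` resolves.
  // If `work` throws, the buffer is discarded, which models a rollback.
  async runInTransaction<T>(work: (tx: FakeTx) => Promise<T>): Promise<T> {
    const tx = new FakeTx();
    const result = await work(tx);
    for (const { table, row } of tx.pending) {
      const rows = this.tables.get(table) ?? [];
      rows.push(row);
      this.tables.set(table, rows);
    }
    return result;
  }

  count(table: string): number {
    return this.tables.get(table)?.length ?? 0;
  }
}

// A use case that writes the domain row, the audit row, and the outbox row
// through the same transaction handle.
async function requestEwa(
  db: FakeDb,
  requestId: string,
  failMidway: boolean,
): Promise<void> {
  await db.runInTransaction(async (tx) => {
    tx.insert("ewa_request", { id: requestId });
    if (failMidway) {
      throw new Error("simulated failure after the domain write");
    }
    tx.insert("audit_entry", { action: "ewa.requested", subjectId: requestId });
    tx.insert("outbox_event", { type: "ewa.requested", aggregateId: requestId });
  });
}

async function demo(): Promise<void> {
  const db = new FakeDb();
  const counts = () =>
    ["ewa_request", "audit_entry", "outbox_event"].map((t) => db.count(t));

  await requestEwa(db, "req-1", true).catch(() => undefined);
  console.log("after failed attempt:", counts().join(" "));

  await requestEwa(db, "req-2", false);
  console.log("after successful attempt:", counts().join(" "));
}

demo().catch(console.error);
```

The failed attempt leaves all three tables empty even though the domain insert ran before the error; the successful attempt adds exactly one row to each.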

The ledger (services/ledger)

The ledger is a separate Go service with its own Postgres database. Two reasons (ADR-006):

  1. Blast radius isolation. A bug in the EWA domain can't corrupt the ledger. A bug in the ledger code can't corrupt EWA data.
  2. Different durability story. The ledger is append-only at the database level (triggers reject UPDATE/DELETE). It can use a different backup strategy, different replicas, different access patterns.

The ledger schema (services/ledger/migrations/0001_init.up.sql)

ledger_account Chart of accounts, per tenant
(id, tenant_id, code, name, type, currency)
type ∈ {ASSET, LIABILITY, EQUITY, REVENUE, EXPENSE}

ledger_transaction Journal headers
(id, tenant_id, idempotency_key, request_fingerprint,
description, value_date, posted_at, reverses_transaction_id,
metadata)
UNIQUE (tenant_id, idempotency_key)

ledger_entry Journal lines (debits and credits)
(id BIGINT, transaction_id, tenant_id, account_id,
direction (DEBIT|CREDIT), amount_santim NUMERIC(20,0),
currency, created_at)

ledger_account_balance Derived view, never a column
(computes signed balance from entries using account-type sign rules)
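
The derived balance can be illustrated in a few lines. This is a sketch only: it assumes the view follows the conventional double-entry sign rules (ASSET and EXPENSE grow on DEBIT, the other three types grow on CREDIT), which this document does not spell out, and signedBalance is a hypothetical helper rather than the view's definition.

```typescript
type AccountType = "ASSET" | "LIABILITY" | "EQUITY" | "REVENUE" | "EXPENSE";
type Direction = "DEBIT" | "CREDIT";

interface Entry {
  direction: Direction;
  amountSantim: bigint; // integer minor units, mirroring NUMERIC(20,0)
}

// Assumption: conventional sign rules. Debit-normal types increase on DEBIT;
// every other type increases on CREDIT.
const DEBIT_NORMAL: ReadonlySet<AccountType> = new Set<AccountType>([
  "ASSET",
  "EXPENSE",
]);

// Folds journal lines into a signed balance; nothing is stored.
function signedBalance(type: AccountType, entries: readonly Entry[]): bigint {
  const debitNormal = DEBIT_NORMAL.has(type);
  let balance = 0n;
  for (const entry of entries) {
    const increases = (entry.direction === "DEBIT") === debitNormal;
    balance += increases ? entry.amountSantim : -entry.amountSantim;
  }
  return balance;
}

// Usage: the same two lines read differently depending on the account type.
const lines: Entry[] = [
  { direction: "DEBIT", amountSantim: 1000n },
  { direction: "CREDIT", amountSantim: 300n },
];
console.log("as ASSET:", signedBalance("ASSET", lines).toString());
console.log("as LIABILITY:", signedBalance("LIABILITY", lines).toString());
```

bigint is used here because a NUMERIC(20,0) amount can exceed the range a JavaScript number represents exactly.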

DB-level invariants — enforced regardless of service code:

  1. Balanced: ledger_assert_balanced deferred constraint trigger raises at COMMIT if debits ≠ credits per currency.
  2. Append-only: ledger_block_mutation trigger raises on any UPDATE/DELETE.
  3. Idempotent: UNIQUE(tenant_id, idempotency_key) rejects duplicate posts.
  4. Single reversal: partial UNIQUE index on (tenant_id, reverses_transaction_id) rejects double-reversal.
  5. Tenant isolated: RLS forced on all 3 ledger tables.

These are all runtime-proven via psql probes on this host. You can re-run them with the verification harness at services/ledger/test/verify.sh.
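
For readers who want the first invariant in code form, the check behind "balanced" can be sketched as follows. This is an application-side illustration of the rule, not the ledger_assert_balanced trigger itself; the Leg shape, the function names, and the ETB and USD currency codes are assumptions made for the example.

```typescript
interface Leg {
  direction: "DEBIT" | "CREDIT";
  amountSantim: bigint;
  currency: string;
}

// Returns the currencies whose debits and credits do not net to zero.
function unbalancedCurrencies(legs: readonly Leg[]): string[] {
  const net = new Map<string, bigint>();
  for (const leg of legs) {
    const delta =
      leg.direction === "DEBIT" ? leg.amountSantim : -leg.amountSantim;
    net.set(leg.currency, (net.get(leg.currency) ?? 0n) + delta);
  }
  return [...net.entries()]
    .filter(([, value]) => value !== 0n)
    .map(([currency]) => currency);
}

// Throws if any currency in the transaction is out of balance.
function assertBalanced(legs: readonly Leg[]): void {
  const bad = unbalancedCurrencies(legs);
  if (bad.length > 0) {
    throw new Error(`unbalanced transaction for: ${bad.join(", ")}`);
  }
}

// Usage: a two-leg transfer balances; dropping one leg does not.
const transfer: Leg[] = [
  { direction: "DEBIT", amountSantim: 50000n, currency: "ETB" },
  { direction: "CREDIT", amountSantim: 50000n, currency: "ETB" },
];
assertBalanced(transfer);
console.log("unbalanced after dropping a leg:", unbalancedCurrencies(transfer.slice(0, 1)));
```

Balancing is evaluated per currency, so a debit in one currency cannot be offset by a credit in another.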

The ledger's RPCs (packages/contracts/grpc/ledger.proto)

| RPC | What it does | Status |
| --- | --- | --- |
| PostTransaction | Atomic multi-leg insert; idempotent via key+fingerprint | code written |
| GetBalance | Read derived balance (current or as-of) | code written |
| Reverse | Compensating-entry txn with double-reversal lockout | code written |
| GetEntries | Paginated journal scan with opaque cursor | code written |
| ReconcileAccount | Independent Go-side sum vs view; returns drift | code written |

All five are implemented in services/ledger/internal/server/. None have been run against a live Go server on this host (no Go toolchain). Schema-level invariants ARE proven live.
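
As an illustration of what "opaque cursor" means for GetEntries, the sketch below pages through ascending entry ids and hands the client a base64 token instead of a raw offset. The cursor payload (the last id returned), the encoding, and the function names are assumptions for this example; the actual Go implementation may encode its cursor differently.

```typescript
// Hypothetical cursor payload: the id of the last entry already returned.
interface CursorPayload {
  lastEntryId: string;
}

function encodeCursor(payload: CursorPayload): string {
  return Buffer.from(JSON.stringify(payload), "utf8").toString("base64");
}

function decodeCursor(cursor: string): CursorPayload {
  const parsed: unknown = JSON.parse(
    Buffer.from(cursor, "base64").toString("utf8"),
  );
  const candidate = parsed as { lastEntryId?: unknown } | null;
  if (candidate === null || typeof candidate.lastEntryId !== "string") {
    throw new Error("invalid cursor");
  }
  return { lastEntryId: candidate.lastEntryId };
}

interface Page {
  entryIds: bigint[];
  nextCursor: string | undefined; // undefined means no further pages
}

// Assumes `allIds` is sorted ascending, as a BIGINT primary key scan would be.
function getEntriesPage(
  allIds: readonly bigint[],
  pageSize: number,
  cursor?: string,
): Page {
  const after =
    cursor === undefined ? undefined : BigInt(decodeCursor(cursor).lastEntryId);
  const rest =
    after === undefined ? [...allIds] : allIds.filter((id) => id > after);
  const entryIds = rest.slice(0, pageSize);
  const last = entryIds[entryIds.length - 1];
  const hasMore = rest.length > pageSize;
  return {
    entryIds,
    nextCursor:
      hasMore && last !== undefined
        ? encodeCursor({ lastEntryId: last.toString() })
        : undefined,
  };
}

// Usage: walk all pages until the server stops returning a cursor.
const ids = [1n, 2n, 3n, 4n, 5n];
let page = getEntriesPage(ids, 2);
while (true) {
  console.log(page.entryIds.map(String).join(","));
  if (page.nextCursor === undefined) break;
  page = getEntriesPage(ids, 2, page.nextCursor);
}
```

Because the client only echoes the token back, the server stays free to change what the cursor contains without breaking callers.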

The outbox pattern (ADR-008)

The transactional outbox is how DemozPay does cross-service messaging without losing events. The pattern:

1. Inside the runner's transaction, AS PART of the same commit:
   - Insert the domain row (e.g. ewa_request)
   - Insert an outbox_event row with the event payload
   - Insert an audit_entry row

   If the txn rolls back, ALL three roll back. If it commits, ALL
   three commit. No half-states.

2. A SEPARATE process (OutboxPublisherService in apps/api):
   - Polls outbox_event WHERE publishedAt IS NULL
   - Claims a batch with FOR UPDATE SKIP LOCKED (multi-instance safe)
   - Publishes each row to Kafka
   - Marks publishedAt = now()
   - Commits the publish + the mark

   If the publisher crashes between publish and mark, the row stays
   unpublished and gets retried on the next tick, so delivery is
   at-least-once. Consumers must be idempotent (the event id is stable).

3. Consumers:
   - services/notifications: send SMS / email / push
   - services/integration-gateway: trigger bank/wallet calls
   - Future: reconciliation, BI

Critical: the publisher must use a separate DB role with BYPASSRLS. With tenant RLS active, the API role cannot see other tenants' outbox rows. The role provisioning is at infra/sql/00_create_outbox_publisher_role.sql. See ADR-013.

If OUTBOX_DATABASE_URL is unset, the publisher falls back to the API role and emits a loud WARN at boot. Read those warnings.
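
The crash window between publish and mark is easiest to see in a simulation. The sketch below is an in-memory model with hypothetical names (publisherTick, IdempotentConsumer); it does not touch Kafka or Postgres and does not model FOR UPDATE SKIP LOCKED. It only shows why a crash produces a redelivery and why a consumer that dedupes on the event id absorbs it.

```typescript
interface OutboxEvent {
  id: string;
  payload: string;
  publishedAt: Date | null;
}

// Hypothetical consumer that dedupes on the stable event id.
class IdempotentConsumer {
  private readonly seen = new Set<string>();
  deliveries = 0; // every message received, including redeliveries
  sideEffects = 0; // work actually performed

  handle(event: OutboxEvent): void {
    this.deliveries += 1;
    if (this.seen.has(event.id)) return;
    this.seen.add(event.id);
    this.sideEffects += 1;
  }
}

// One publisher tick: select unpublished rows, publish each, then mark it.
// `crashBeforeMark` simulates the process dying between publish and mark.
function publisherTick(
  outbox: OutboxEvent[],
  consumer: IdempotentConsumer,
  batchSize: number,
  crashBeforeMark = false,
): number {
  const batch = outbox
    .filter((event) => event.publishedAt === null)
    .slice(0, batchSize);
  for (const event of batch) {
    consumer.handle(event); // stands in for the Kafka publish
    if (crashBeforeMark) {
      throw new Error("simulated crash before mark");
    }
    event.publishedAt = new Date();
  }
  return batch.length;
}

// Usage: first tick crashes after publishing, second tick retries the row.
const outbox: OutboxEvent[] = [
  { id: "evt-1", payload: "{}", publishedAt: null },
];
const consumer = new IdempotentConsumer();

try {
  publisherTick(outbox, consumer, 10, true);
} catch {
  // The publisher restarts; the row is still unpublished.
}
publisherTick(outbox, consumer, 10);

console.log("deliveries:", consumer.deliveries, "side effects:", consumer.sideEffects);
```

The event is delivered twice but acted on once, which is the behaviour the pattern asks consumers to provide.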

Auth (better-auth)

DemozPay uses better-auth for authentication. Why:

  • Self-hosted (no SaaS lock-in, important for Ethiopia compliance)
  • Phone-OTP plugin (primary auth path for our market)
  • Organization plugin (maps cleanly to tenantId == businessId)
  • Prisma adapter
  • 2FA plugin (TwoFactor table exists; plugin not yet wired — Planned)

The wiring:

apps/api/src/identity/auth/
├── better-auth.factory.ts Constructs the better-auth instance
├── auth.module.ts NestJS DI wiring
├── auth.tokens.ts DI tokens
├── session.middleware.ts Populates req.user from the session cookie
├── auth.guard.ts Default-deny APP_GUARD; respects @Public()
├── public.decorator.ts The opt-out for health/metrics/root
└── sms-sender.ts SmsSender interface + LoggingSmsSender

The handler is mounted at /api/auth/* on the underlying Express app, BEFORE NestJS's router takes effect:

// apps/api/src/main.ts
const expressApp = app.getHttpAdapter().getInstance();
expressApp.all('/api/auth/*splat', toNodeHandler(auth));

This is why you won't find an "auth controller" in NestJS terms. better-auth owns the entire /api/auth/* subtree.

The Organization.id == Business.id invariant

Session.activeOrganizationId
== Organization.id
== Business.id
== app.tenant_id (the GUC value RLS uses)

This chain of equalities holds all the way down. It was set up in commit #3 of the better-auth integration:

  • BusinessService.create() runs Business + Organization creation in a single $transaction with the same id.
  • A backfill migration (20260526050000_bootstrap_organizations) filled in matching Organizations for existing Businesses, with a verify guard.
  • Organization.id has a FK to Business.id, so the invariant is enforced at the DB level.

The benefit: zero transformation layer. The tenant is whatever the user's active org is, and the active org IS the Business.

Tenant isolation (ADR-013)

Read ADR-013 for the full canonical doc. Quick version:

  • Every financial-tier table has a tenantId column and an RLS policy: "tenantId" = current_setting('app.tenant_id', true).
  • FORCE ROW LEVEL SECURITY — even the table owner is bound.
  • current_setting('app.tenant_id', true) returns NULL when unset; any row's tenantId compared to NULL returns NULL → no rows match → fail-closed.
  • The runner sets the GUC inside every txn via parameterized SQL.
  • The 5 financial tables (ewa_request, loan, outbox_event, idempotency_record, audit_entry) are under RLS.
  • The 8 identity tables (User, Session, Account, Verification, TwoFactor, Organization, Member, Invitation) are INTENTIONALLY excluded. They need cross-tenant reads by design.
  • Legacy financial tables (Business, Employee, Payroll, Wallet, etc.) are NOT yet under RLS. They're scoped only by application-level WHERE. This is the largest standing isolation risk — flagged in ADR-013.
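
The fail-closed behaviour in the third bullet follows from SQL's three-valued logic, which the sketch below models in plain TypeScript. It is a simulation for intuition only: sqlEquals and visibleRows are hypothetical helpers, and the real enforcement happens inside Postgres.

```typescript
// SQL three-valued logic: comparing anything with NULL yields NULL (unknown).
type SqlBool = true | false | null;

function sqlEquals(a: string | null, b: string | null): SqlBool {
  if (a === null || b === null) return null;
  return a === b;
}

// A row passes a row-level security policy only when the policy expression
// evaluates to exactly TRUE; FALSE and NULL both hide the row.
function visibleRows<T extends { tenantId: string }>(
  rows: readonly T[],
  appTenantId: string | null,
): T[] {
  return rows.filter((row) => sqlEquals(row.tenantId, appTenantId) === true);
}

// Usage: with the setting present only that tenant's rows appear; with it
// unset (NULL) nothing does.
const ewaRequests = [
  { id: "r1", tenantId: "tenant-a" },
  { id: "r2", tenantId: "tenant-b" },
];
console.log("as tenant-a:", visibleRows(ewaRequests, "tenant-a").map((r) => r.id));
console.log("unset:", visibleRows(ewaRequests, null).map((r) => r.id));
```

A code path that forgets to set the tenant therefore reads an empty table instead of every tenant's data.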

Idempotency (ADR-007)

Two layers of idempotency, both Live at the ledger:

  1. API gateway layer: Idempotency-Key header on money-moving POSTs. Stored in the idempotency_record table with a (tenantId, scope, key) composite PK. A duplicate request hits INSERT ... ON CONFLICT DO NOTHING and returns the cached result from the original. Concurrent duplicates fail loudly.
  2. Ledger layer: UNIQUE (tenant_id, idempotency_key) on ledger_transaction, plus a request_fingerprint column for the second arm of the contract:
    • Same key + same fingerprint → return cached transaction
    • Same key + DIFFERENT fingerprint → FailedPrecondition

The two layers are independent, so a retry between API and ledger is also safe.
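
The key-plus-fingerprint contract of the ledger layer can be sketched as a small in-memory model. Everything here is illustrative: InMemoryLedger and the PostResult shape are hypothetical, and hashing a canonical JSON form with SHA-256 is an assumed way to compute request_fingerprint, not necessarily what the Go service does.

```typescript
import { createHash } from "node:crypto";

type PostResult =
  | { kind: "created"; transactionId: string }
  | { kind: "replayed"; transactionId: string }
  | { kind: "failed_precondition"; reason: string };

interface StoredTxn {
  transactionId: string;
  fingerprint: string;
}

// Assumption: the fingerprint is a SHA-256 over a key-sorted JSON form, so
// field order in the request does not change it.
function fingerprint(request: Record<string, string>): string {
  const canonical = JSON.stringify(
    Object.keys(request)
      .sort()
      .map((key) => [key, request[key]]),
  );
  return createHash("sha256").update(canonical).digest("hex");
}

class InMemoryLedger {
  private readonly byKey = new Map<string, StoredTxn>();
  private nextId = 1;

  post(
    tenantId: string,
    idempotencyKey: string,
    request: Record<string, string>,
  ): PostResult {
    // Models UNIQUE (tenant_id, idempotency_key).
    const mapKey = JSON.stringify([tenantId, idempotencyKey]);
    const fp = fingerprint(request);
    const existing = this.byKey.get(mapKey);

    if (existing === undefined) {
      const transactionId = `txn-${this.nextId++}`;
      this.byKey.set(mapKey, { transactionId, fingerprint: fp });
      return { kind: "created", transactionId };
    }
    if (existing.fingerprint === fp) {
      return { kind: "replayed", transactionId: existing.transactionId };
    }
    return {
      kind: "failed_precondition",
      reason: "idempotency key reused with a different request",
    };
  }
}

// Usage: a faithful retry replays; a changed request under the same key fails.
const ledger = new InMemoryLedger();
console.log(ledger.post("tenant-a", "key-1", { amount: "50000", currency: "ETB" }).kind);
console.log(ledger.post("tenant-a", "key-1", { currency: "ETB", amount: "50000" }).kind);
console.log(ledger.post("tenant-a", "key-1", { amount: "60000", currency: "ETB" }).kind);
```

The fingerprint arm guards against a client that reuses a key for a different payload, a case that a uniqueness constraint on the key alone cannot tell apart from a genuine retry.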

Observability

| Concern | Status | Where |
| --- | --- | --- |
| Structured logging | Live | apps/api/src/_infra/observability/pino-logger.ts (pino + trace correlation) |
| Distributed tracing | Live (gated) | apps/api/src/_infra/observability/tracing.ts: OTel SDK, exports if OTEL_EXPORTER_OTLP_ENDPOINT is set. No OTLP backend wired in dev. |
| Metrics | Live | apps/api/src/_infra/observability/metrics/: prom-client, custom + Node defaults, /api/metrics endpoint |
| Health probes | Live | apps/api/src/_infra/health/: /healthz (liveness), /readyz (deps), startup checks at boot |

Custom metrics worth knowing:

  • demozpay_http_requests_total{method,route,status_code}
  • demozpay_http_request_duration_seconds{...} (histogram with fintech-tuned buckets)
  • demozpay_ledger_grpc_duration_seconds{rpc,status}
  • demozpay_outbox_unpublished_total
  • demozpay_outbox_oldest_unpublished_age_seconds ← the SLO signal
  • demozpay_dependency_up{dependency} (1=up, 0=down, -1=skipped)

Cardinality discipline (enforced by code review): NO userId, tenantId, idempotency_key, raw URL labels. Route TEMPLATES only, exact status codes only.

Continue reading

Next: 05-the-frontend-apps.md for a short tour of the web apps, or skip to 06-status-matrix.md for the honest matrix.