ADR-013: Tenant isolation via Postgres RLS, fail-closed by default
- Status: Accepted
- Date: 2026-05-28
- Deciders: Engineering lead, Security lead
- Supersedes: Tenant-isolation guidance previously distributed across
apps/api/prisma/RLS.md(now a short reference doc that points here),CLAUDE.md,PROJECT_STRUCTURE.md, andSECURITY_CONTROLS.md.
Context
DemozPay is multi-tenant where the tenant is the employer business. Every employer's data — payroll, EWA advances, loans, audit trail, outbox events, idempotency records — must be invisible to every other employer, and to any unauthenticated caller. The blast radius of forgetting to scope a single query is a regulatory incident: one employer's salary advances visible to another, or worse, an unauthenticated query returning the entire ledger.
Application-layer scoping ("every query has a WHERE businessId = ?")
is fragile. It depends on every developer remembering, every time, on
every route, for every domain. A single missed clause in a future
domain, or in a future tool that bypasses the API, exposes everyone.
This is how most multi-tenant fintechs eventually leak.
We need an isolation model that:
- Is fail-closed: missing tenant context returns no rows, never "all rows".
- Is enforced at the database, not in application code, so that any future caller — ORM, raw SQL tool, reconciliation job, support query — inherits the guarantee.
- Is impossible to deploy partially: the moment a tenant table exists in production, its tenant isolation must already exist.
- Has a single, documented exception path for legitimate cross-tenant workloads (outbox publisher, future reconciliation, future support tooling).
- Maps cleanly to our session model so application code never has to
pass
tenantIdexplicitly.
Decision
Tenant isolation is enforced by Postgres Row-Level Security (RLS)
with FORCE ROW LEVEL SECURITY, keyed on a per-transaction GUC
(app.tenant_id), set by PrismaTransactionRunner before any query
runs. The GUC is sourced from req.user.businessId, which is set by
SessionMiddleware from the active organization on the better-auth
session. With no tenant set, the policy predicate evaluates to NULL
and no rows match — fail-closed.
The exception is a tightly-scoped BYPASSRLS role used only by the
outbox publisher. It connects through a separate OUTBOX_DATABASE_URL
and is granted only SELECT, UPDATE on outbox_event — never DDL,
never any other table.
The RLS policy is applied as a numbered Prisma migration
(20260526030000_apply_tenant_rls) that ends with a verify guard
DO block which raises if any expected table is missing RLS, FORCE,
or the policy. A deploy that produces tables without isolation will
fail loudly at migration time.
The current set of tenant tables (financial-data tier) is:
ewa_requestloanoutbox_eventidempotency_recordaudit_entry
Identity-tier tables are INTENTIONALLY excluded from RLS. See "Tables intentionally bypassing RLS" below.
Alternatives considered
- Application-layer scoping only (every query carries
WHERE businessId = ?). Rejected. One missed clause exposes everyone; the cost of being wrong is regulatory, not technical. Code review and lint rules reduce risk but cannot eliminate it because raw SQL and future tooling can always sidestep. - Schema-per-tenant (one Postgres schema per business). Rejected. Doesn't scale beyond a few hundred tenants; cross-tenant reconciliation queries become operationally painful; ALTER on hundreds of schemas is slow.
- Database-per-tenant. Rejected for the same reasons plus connection-pool explosion and migration coordination cost. Reserve this option for premium / enterprise tenants if it ever becomes a product requirement.
- Tenant ID embedded in JWT, enforced by NestJS guard. Rejected. Application-layer enforcement only — same fragility as the first alternative. Useful AS PART of the design (we do this for routing) but not sufficient on its own.
The fail-closed contract
The policy on every tenant table is:
CREATE POLICY tenant_isolation ON <table>
USING ("tenantId" = current_setting('app.tenant_id', true))
WITH CHECK ("tenantId" = current_setting('app.tenant_id', true));
USINGgates reads.WITH CHECKgates writes (INSERT / UPDATE).current_setting('app.tenant_id', true)— thetruesecond arg makes a missing GUC return NULL instead of erroring. NULL is the point: any row'stenantIdcompared to NULL returns NULL → no row matches. No tenant context = no rows visible.FORCE ROW LEVEL SECURITYis set so the policy applies even to the table owner. Without FORCE, the role that Prisma uses (which IS the table owner during dev) would silently bypass.
This is enforced as DB-level invariant, not as application discipline.
How tenant context propagates
┌─────────────────────────────────────────────────────────────────┐
│ HTTP request arrives with better-auth session cookie │
└──────────────────────────┬──────────────────────────────────────┘
▼
┌─────────────────────────────────────────────────────────────────┐
│ SessionMiddleware │
│ apps/api/src/identity/auth/session.middleware.ts │
│ │
│ • auth.api.getSession({ headers }) │
│ • If a session exists: │
│ req.user = { │
│ id: session.user.id, │
│ email: session.user.email, │
│ businessId: session.session.activeOrganizationId, │
│ role: session.user.role, │
│ } │
│ • Non-enforcing — absence of a session leaves req.user empty. │
└──────────────────────────┬──────────────────────────────────────┘
▼
┌─────────────────────────────────────────────────────────────────┐
│ TenantContextMiddleware │
│ apps/api/src/identity/tenant/tenant-context.middleware.ts │
│ │
│ Reads tenantId from, in order: │
│ req.user.businessId (normal flow) │
│ req.user.tenantId (forward-compat) │
│ req.headers['x-tenant-id'] (service-to-service) │
│ Then runs the request inside runWithTenant(tenantId, next) │
│ which establishes an AsyncLocalStorage frame. │
│ │
│ No tenant resolved → request passes through with no context; │
│ any handler that then calls getTenantId() fails closed. │
└──────────────────────────┬──────────────────────────────────────┘
▼
┌─────────────────────────────────────────────────────────────────┐
│ AuthGuard (default-deny APP_GUARD) │
│ apps/api/src/identity/auth/auth.guard.ts │
│ │
│ Unless the route is @Public(), returns 401 if req.user.id │
│ is missing. Health / metrics / root opt out. │
└──────────────────────────┬──────────────────────────────────────┘
▼
┌─────────────────────────────────────────────────────────────────┐
│ Domain controller → use case → PrismaTransactionRunner │
│ apps/api/src/_infra/shared-infra/prisma-transaction-runner.ts │
│ │
│ At the start of every transaction, before any query runs: │
│ SELECT set_config('app.tenant_id', ${tenantId}, true) │
│ │
│ • `${tenantId}` comes from getTenantId() — the GUC value is │
│ pinned to the same value AsyncLocalStorage carries. │
│ • `true` (3rd arg) = LOCAL — auto-resets at COMMIT/ROLLBACK. │
│ • Parameterised — no SQL injection. │
│ • The same `tx` handle is threaded to the domain repo, the │
│ outbox writer, and the audit emitter so a state change + its │
│ event + its audit row commit atomically (ADR-008). │
└─────────────────────────────────────────────────────────────────┘
Service-to-service / Go ledger
The Go ledger service runs its own WithTenantTx (in
services/ledger/internal/pg/pool.go) which does the same
SELECT set_config('app.tenant_id', $1, true) before any query inside
a transaction. tenantId arrives as a proto field on the RPC; there
is no GUC inheritance between the API monolith and the ledger
service. Each side enforces its own tenant scope.
activeOrganizationId — the canonical mapping
The tenant of a request is the active organization of the better-auth session. We deliberately keep:
Organization.id == Business.id (FK enforced at the DB)
so that:
Session.activeOrganizationId == req.user.businessId == app.tenant_idat every layer — no transformation, no lookup.- The Business bootstrap (commit #3 of better-auth integration) creates
the matching Organization row in the same
$transaction. - A user switching organizations via better-auth's
/api/auth/organization/set-activeinstantly changes the tenant scope of every subsequent request.
Tables intentionally bypassing RLS
Not every table should be tenant-scoped. The auth/identity tier (better-auth tables) is INTENTIONALLY excluded:
| Table | Why excluded from RLS |
|---|---|
User | Identity is cross-tenant by design. A user can belong to multiple Organizations (the org plugin's Member table records the relationships). Putting User under RLS would break login itself — at login time there is no session yet, so no tenant. |
Session | Looked up by token at the start of every authenticated request, before any tenant context exists. The session is what establishes the tenant; it cannot itself be tenant-scoped. |
Account | One row per credential. Joined from User. Same reasoning. |
Verification | Short-lived OTP / email-verify / pwd-reset codes. The identifier (email or phone) is addressable before a User even exists; cannot be tenant-scoped. |
TwoFactor | One per User. Identity-tier. |
Organization | Has the same id as Business but lives in the identity tier so the org plugin can list a user's orgs at session time (before any tenant is active). |
Member | The (User, Organization, role) mapping. Reading this is how we resolve "which orgs is this user allowed into?" — by definition the read must span the user's tenants. |
Invitation | Addressed by email before the invitee exists as a User. |
Business and the legacy Employee/Payroll/etc. tables | Not yet under RLS. These predate the financial-data RLS rollout and are scoped today by application-level WHERE businessId = ?. They will be brought under RLS as part of their domain extraction (deferred per restructure-2026-05.md Phases 5–7). Until then, treat them as application-scoped and review queries carefully. |
The financial-data tier (ewa_request, loan, outbox_event,
idempotency_record, audit_entry) IS under RLS today.
Adding a new tenant table? Add it to both tenant_tables arrays
in 20260526030000_apply_tenant_rls/migration.sql — the policy
install AND the verify block. The verify will raise loudly if the
policy is listed but not in place.
BYPASSRLS — the one explicit exception
The default Prisma role used by the API must NOT have BYPASSRLS.
With FORCE ROW LEVEL SECURITY set, this is the only thing keeping
the API honest.
Some workloads legitimately need cross-tenant access:
- The outbox publisher drains
outbox_eventfor all tenants — per definition not scoped to one. - Future reconciliation jobs read across tenants for finance reports.
- Future support tooling may need read-only cross-tenant access.
These workloads must each use a separate role with BYPASSRLS
and narrowly-scoped grants:
demozpay_outbox_publisher—LOGIN BYPASSRLS, granted onlySELECT, UPDATEonoutbox_event. Provisioned byinfra/sql/00_create_outbox_publisher_role.sql(idempotent; verifies BYPASSRLS actually got set; operators run once per database). Used viaOUTBOX_DATABASE_URL.
The outbox publisher service prefers OUTBOX_DATABASE_URL when set.
With tenant RLS active and the publisher running under
DATABASE_URL's regular role, the publisher would silently drain
zero rows — RLS hides every row. The publisher logs a loud WARN at
boot in that case so the misconfiguration is visible (apps/api/src/ outbox/outbox-publisher.service.ts does this explicitly).
Rules for new BYPASSRLS roles:
- ONE role per workload. Never reuse the publisher's role for reconciliation or support — each gets its own.
- Minimum grants. No DDL. Only the specific tables and operations the workload needs.
- Separate secret from
DATABASE_URL. Treat the BYPASSRLS credential as more privileged than the regular DB credential. - No HTTP request handlers may ever use it. It's reserved for background jobs and operator tooling.
- Document in this ADR when a new BYPASSRLS role is added.
Operational risks and how we mitigate them
Risk: a developer forgets to thread the transaction runner
Symptom: a domain use case opens its own prisma.$transaction directly
without going through PrismaTransactionRunner. The GUC never gets
set; the query sees no rows; the developer thinks "nothing matches"
and adds a workaround.
Mitigation: domain code may only see Store ports
(shared/database's TransactionRunner<object>); the concrete
PrismaTransactionRunner is injected at the adapter boundary. Lint
rule blocks @prisma/client import in domain/application layers
(ADR-003).
Risk: a deploy applies the schema migration but not the policy
Cannot happen in this codebase: the policy migration
20260526030000_apply_tenant_rls is part of the same Prisma migration
sequence and its verify DO block raises if any expected table is
missing RLS / FORCE / policy. The deploy fails before serving any
traffic.
Risk: migration ordering — a new tenant table created but not added to RLS
Symptom: the table is created by an earlier migration. The RLS migration ran before, so its verify guard passes against the old list and the new table goes live with NO isolation. High risk — this is the classic incident.
Mitigation: every PR that creates a new financial-tier table MUST
include a follow-on migration that adds the table to both
tenant_tables arrays in 20260526030000_apply_tenant_rls (or
re-runs an equivalent guard with the expanded list). Code review
gate: any new tenantId column is a flag to require it. ADR-013
specifically calls this out as a process risk; lint coverage for
this is a Planned improvement.
Risk: local dev sets BYPASSRLS to make development "easier"
Symptom: a developer runs ALTER ROLE ... BYPASSRLS locally to
unblock a query. Code works in dev, fails silently in prod (where
the role doesn't have BYPASSRLS).
Mitigation: do not document this workaround anywhere. The right fix
is always to set the tenant via the runner. If you genuinely need
cross-tenant local exploration, use the same OUTBOX_DATABASE_URL
pattern with a temporary BYPASSRLS role you create and drop.
Risk: Prisma transaction caveats
Prisma's $transaction opens a single Postgres transaction. Each
LOCAL set_config(..., true) is scoped to that transaction; it
resets on COMMIT or ROLLBACK. However:
- Inside Prisma's
$transaction([batch])array form (parallel statements), the GUC IS available — they all run on the same underlying transaction. - Outside any transaction (raw
prisma.querynot under a runner), the GUC is NOT set. Such queries will return zero rows from RLS tables. This is intentional — it's the fail-closed property.
Risk: SQL injection into the GUC value
Cannot happen: set_config('app.tenant_id', ${tenantId}, true) is
called via Prisma's tagged-template tx.$executeRaw\...` which parameterises the value. The same applies on the Go side (tx.Exec(ctx, "SELECT set_config('app.tenant_id', $1, true)",
tenantID)`).
Risk: a background job runs without tenant context
Symptom: a cron-style job calls into a domain service. No request, no SessionMiddleware, no TenantContextMiddleware → no tenant. The job's queries return zero rows.
Mitigation: background jobs must explicitly call
runWithTenant({ tenantId }, () => ...) for the tenant they operate
on. Multi-tenant jobs (e.g. nightly reconciliation across all
employers) must loop and re-enter the tenant context for each one.
The publisher is the explicit exception via BYPASSRLS.
Consequences
Positive
- Tenant isolation is a database invariant, not an application discipline. Any future query path inherits it.
- Deploy-time safety: an ALTER that drops RLS or removes a policy can't ship — the next migration's verify guard raises.
- Single mental model:
app.tenant_id=req.user.businessId=Session.activeOrganizationId=Organization.id=Business.id. No transformation layer. - Audit-friendly: a regulator asking "show me how you guarantee Employer A cannot see Employer B's data" gets a single answer with testable evidence (migration + policy text + verify block + the runtime check we ran during the May 2026 verification).
Negative
- Local dev complexity: forgetting to set the GUC returns zero rows, which is initially confusing for new engineers. The runner hides this for normal code paths but raw queries can be surprising.
- Cross-tenant tooling cost: every reconciliation or admin tool needs its own BYPASSRLS role. Worth the cost; reduces accidental grant leakage.
- Legacy tables are inconsistent:
Business,Employee,Payroll,Walletetc. are NOT under RLS today. Application-side scoping is the only protection until their domain extractions ship.
Follow-ups
- Bring legacy financial tables under RLS as their domain
extractions land (
Wallet,Payroll,BNPLPurchase, etc.). - Add a CI check that any new column named
tenantIdcauses the PR to be flagged for "did you add it to the RLS verify list?". - Add
docs/runbooks/verify-rls.md— one-page operator runbook for "is tenant isolation working right now?" with the exact psql queries that prove it on a live database. (Planned.) - Consider extending RLS to the better-auth
Organization/Membertables once we have multi-org orchestration tooling that can run outside RLS via a dedicated role.
References
- Implementation:
apps/api/prisma/migrations/20260526030000_apply_tenant_rls/migration.sql - API-side:
apps/api/src/identity/auth/session.middleware.ts,apps/api/src/identity/tenant/tenant-context.middleware.ts,apps/api/src/_infra/shared-infra/prisma-transaction-runner.ts - Ledger-side:
services/ledger/internal/pg/pool.go(WithTenantTx) - BYPASSRLS role provisioning:
infra/sql/00_create_outbox_publisher_role.sql - Outbox publisher:
apps/api/src/_infra/outbox/outbox-publisher.service.ts - Implementation detail reference (slimmed):
apps/api/prisma/RLS.md— kept as a short pointer back to this ADR. - Related: ADR-001 (modular monolith), ADR-008 (audit + outbox in same txn), ADR-009 (no DELETE), ADR-011 (cross-domain events only), ADR-012 (ledger accounting model).