ADR-013: Tenant isolation via Postgres RLS, fail-closed by default

Status: Accepted
Date: 2026-05-28
Deciders: Engineering lead, Security lead
Supersedes: Tenant-isolation guidance previously distributed across apps/api/prisma/RLS.md (now a short reference doc that points here), CLAUDE.md, PROJECT_STRUCTURE.md, and SECURITY_CONTROLS.md.

Context

DemozPay is multi-tenant where the tenant is the employer business. Every employer's data — payroll, EWA advances, loans, audit trail, outbox events, idempotency records — must be invisible to every other employer, and to any unauthenticated caller. The blast radius of forgetting to scope a single query is a regulatory incident: one employer's salary advances visible to another, or worse, an unauthenticated query returning the entire ledger.

Application-layer scoping ("every query has a WHERE businessId = ?") is fragile. It depends on every developer remembering, every time, on every route, for every domain. A single missed clause in a future domain, or in a future tool that bypasses the API, exposes everyone. This is how most multi-tenant fintechs eventually leak.

We need an isolation model that:

Is fail-closed: missing tenant context returns no rows, never "all rows".
Is enforced at the database, not in application code, so that any future caller — ORM, raw SQL tool, reconciliation job, support query — inherits the guarantee.
Is impossible to deploy partially: the moment a tenant table exists in production, its tenant isolation must already exist.
Has a single, documented exception path for legitimate cross-tenant workloads (outbox publisher, future reconciliation, future support tooling).
Maps cleanly to our session model so application code never has to pass tenantId explicitly.

Decision

Tenant isolation is enforced by Postgres Row-Level Security (RLS) with FORCE ROW LEVEL SECURITY, keyed on a per-transaction GUC (app.tenant_id), set by PrismaTransactionRunner before any query runs. The GUC is sourced from req.user.businessId, which is set by SessionMiddleware from the active organization on the better-auth session. With no tenant set, the policy predicate evaluates to NULL and no rows match — fail-closed.

The exception is a tightly-scoped BYPASSRLS role used only by the outbox publisher. It connects through a separate OUTBOX_DATABASE_URL and is granted only SELECT, UPDATE on outbox_event — never DDL, never any other table.

The RLS policy is applied as a numbered Prisma migration (20260526030000_apply_tenant_rls) that ends with a verify guard DO block which raises if any expected table is missing RLS, FORCE, or the policy. A deploy that produces tables without isolation will fail loudly at migration time.

The current set of tenant tables (financial-data tier) is:

ewa_request
loan
outbox_event
idempotency_record
audit_entry

Identity-tier tables are INTENTIONALLY excluded from RLS. See "Tables intentionally bypassing RLS" below.

Alternatives considered

Application-layer scoping only (every query carries WHERE businessId = ?). Rejected. One missed clause exposes everyone; the cost of being wrong is regulatory, not technical. Code review and lint rules reduce risk but cannot eliminate it because raw SQL and future tooling can always sidestep.
Schema-per-tenant (one Postgres schema per business). Rejected. Doesn't scale beyond a few hundred tenants; cross-tenant reconciliation queries become operationally painful; ALTER on hundreds of schemas is slow.
Database-per-tenant. Rejected for the same reasons plus connection-pool explosion and migration coordination cost. Reserve this option for premium / enterprise tenants if it ever becomes a product requirement.
Tenant ID embedded in JWT, enforced by NestJS guard. Rejected. Application-layer enforcement only — same fragility as the first alternative. Useful AS PART of the design (we do this for routing) but not sufficient on its own.

The fail-closed contract

The policy on every tenant table is:

CREATE POLICY tenant_isolation ON <table>
  USING ("tenantId" = current_setting('app.tenant_id', true))
  WITH CHECK ("tenantId" = current_setting('app.tenant_id', true));

USING gates reads. WITH CHECK gates writes (INSERT / UPDATE).
current_setting('app.tenant_id', true) — the true second arg makes a missing GUC return NULL instead of erroring. NULL is the point: any row's tenantId compared to NULL returns NULL → no row matches. No tenant context = no rows visible.
FORCE ROW LEVEL SECURITY is set so the policy applies even to the table owner. Without FORCE, the role that Prisma uses (which IS the table owner during dev) would silently bypass.

This is enforced as DB-level invariant, not as application discipline.

How tenant context propagates

┌─────────────────────────────────────────────────────────────────┐
│  HTTP request arrives with better-auth session cookie           │
└──────────────────────────┬──────────────────────────────────────┘
                           ▼
┌─────────────────────────────────────────────────────────────────┐
│  SessionMiddleware                                              │
│  apps/api/src/identity/auth/session.middleware.ts                        │
│                                                                 │
│  • auth.api.getSession({ headers })                             │
│  • If a session exists:                                         │
│      req.user = {                                               │
│        id:        session.user.id,                              │
│        email:     session.user.email,                           │
│        businessId: session.session.activeOrganizationId,        │
│        role:      session.user.role,                            │
│      }                                                          │
│  • Non-enforcing — absence of a session leaves req.user empty.  │
└──────────────────────────┬──────────────────────────────────────┘
                           ▼
┌─────────────────────────────────────────────────────────────────┐
│  TenantContextMiddleware                                        │
│  apps/api/src/identity/tenant/tenant-context.middleware.ts               │
│                                                                 │
│  Reads tenantId from, in order:                                 │
│      req.user.businessId          (normal flow)                 │
│      req.user.tenantId            (forward-compat)              │
│      req.headers['x-tenant-id']   (service-to-service)          │
│  Then runs the request inside runWithTenant(tenantId, next)     │
│  which establishes an AsyncLocalStorage frame.                  │
│                                                                 │
│  No tenant resolved → request passes through with no context;   │
│  any handler that then calls getTenantId() fails closed.        │
└──────────────────────────┬──────────────────────────────────────┘
                           ▼
┌─────────────────────────────────────────────────────────────────┐
│  AuthGuard (default-deny APP_GUARD)                             │
│  apps/api/src/identity/auth/auth.guard.ts                                │
│                                                                 │
│  Unless the route is @Public(), returns 401 if req.user.id      │
│  is missing. Health / metrics / root opt out.                   │
└──────────────────────────┬──────────────────────────────────────┘
                           ▼
┌─────────────────────────────────────────────────────────────────┐
│  Domain controller → use case → PrismaTransactionRunner         │
│  apps/api/src/_infra/shared-infra/prisma-transaction-runner.ts         │
│                                                                 │
│  At the start of every transaction, before any query runs:      │
│      SELECT set_config('app.tenant_id', ${tenantId}, true)      │
│                                                                 │
│  • `${tenantId}` comes from getTenantId() — the GUC value is    │
│    pinned to the same value AsyncLocalStorage carries.          │
│  • `true` (3rd arg) = LOCAL — auto-resets at COMMIT/ROLLBACK.   │
│  • Parameterised — no SQL injection.                            │
│  • The same `tx` handle is threaded to the domain repo, the     │
│    outbox writer, and the audit emitter so a state change + its │
│    event + its audit row commit atomically (ADR-008).           │
└─────────────────────────────────────────────────────────────────┘

Service-to-service / Go ledger

The Go ledger service runs its own WithTenantTx (in services/ledger/internal/pg/pool.go) which does the same SELECT set_config('app.tenant_id', $1, true) before any query inside a transaction. tenantId arrives as a proto field on the RPC; there is no GUC inheritance between the API monolith and the ledger service. Each side enforces its own tenant scope.

activeOrganizationId — the canonical mapping

The tenant of a request is the active organization of the better-auth session. We deliberately keep:

Organization.id == Business.id   (FK enforced at the DB)

so that:

Session.activeOrganizationId == req.user.businessId == app.tenant_id at every layer — no transformation, no lookup.
The Business bootstrap (commit #3 of better-auth integration) creates the matching Organization row in the same $transaction.
A user switching organizations via better-auth's /api/auth/organization/set-active instantly changes the tenant scope of every subsequent request.

Tables intentionally bypassing RLS

Not every table should be tenant-scoped. The auth/identity tier (better-auth tables) is INTENTIONALLY excluded:

Table	Why excluded from RLS
`User`	Identity is cross-tenant by design. A user can belong to multiple Organizations (the org plugin's `Member` table records the relationships). Putting `User` under RLS would break login itself — at login time there is no session yet, so no tenant.
`Session`	Looked up by `token` at the start of every authenticated request, before any tenant context exists. The session is what establishes the tenant; it cannot itself be tenant-scoped.
`Account`	One row per credential. Joined from User. Same reasoning.
`Verification`	Short-lived OTP / email-verify / pwd-reset codes. The identifier (email or phone) is addressable before a User even exists; cannot be tenant-scoped.
`TwoFactor`	One per User. Identity-tier.
`Organization`	Has the same id as Business but lives in the identity tier so the org plugin can list a user's orgs at session time (before any tenant is active).
`Member`	The (User, Organization, role) mapping. Reading this is how we resolve "which orgs is this user allowed into?" — by definition the read must span the user's tenants.
`Invitation`	Addressed by email before the invitee exists as a User.
`Business` and the legacy `Employee`/`Payroll`/etc. tables	Not yet under RLS. These predate the financial-data RLS rollout and are scoped today by application-level `WHERE businessId = ?`. They will be brought under RLS as part of their domain extraction (deferred per `restructure-2026-05.md` Phases 5–7). Until then, treat them as application-scoped and review queries carefully.

The financial-data tier (ewa_request, loan, outbox_event, idempotency_record, audit_entry) IS under RLS today.

Adding a new tenant table? Add it to both tenant_tables arrays in 20260526030000_apply_tenant_rls/migration.sql — the policy install AND the verify block. The verify will raise loudly if the policy is listed but not in place.

BYPASSRLS — the one explicit exception

The default Prisma role used by the API must NOT have BYPASSRLS. With FORCE ROW LEVEL SECURITY set, this is the only thing keeping the API honest.

Some workloads legitimately need cross-tenant access:

The outbox publisher drains outbox_event for all tenants — per definition not scoped to one.
Future reconciliation jobs read across tenants for finance reports.
Future support tooling may need read-only cross-tenant access.

These workloads must each use a separate role with BYPASSRLS and narrowly-scoped grants:

demozpay_outbox_publisher — LOGIN BYPASSRLS, granted only SELECT, UPDATE on outbox_event. Provisioned by infra/sql/00_create_outbox_publisher_role.sql (idempotent; verifies BYPASSRLS actually got set; operators run once per database). Used via OUTBOX_DATABASE_URL.

The outbox publisher service prefers OUTBOX_DATABASE_URL when set. With tenant RLS active and the publisher running under DATABASE_URL's regular role, the publisher would silently drain zero rows — RLS hides every row. The publisher logs a loud WARN at boot in that case so the misconfiguration is visible (apps/api/src/ outbox/outbox-publisher.service.ts does this explicitly).

Rules for new BYPASSRLS roles:

ONE role per workload. Never reuse the publisher's role for reconciliation or support — each gets its own.
Minimum grants. No DDL. Only the specific tables and operations the workload needs.
Separate secret from DATABASE_URL. Treat the BYPASSRLS credential as more privileged than the regular DB credential.
No HTTP request handlers may ever use it. It's reserved for background jobs and operator tooling.
Document in this ADR when a new BYPASSRLS role is added.

Operational risks and how we mitigate them

Risk: a developer forgets to thread the transaction runner

Symptom: a domain use case opens its own prisma.$transaction directly without going through PrismaTransactionRunner. The GUC never gets set; the query sees no rows; the developer thinks "nothing matches" and adds a workaround.

Mitigation: domain code may only see Store ports (shared/database's TransactionRunner<object>); the concrete PrismaTransactionRunner is injected at the adapter boundary. Lint rule blocks @prisma/client import in domain/application layers (ADR-003).

Risk: a deploy applies the schema migration but not the policy

Cannot happen in this codebase: the policy migration 20260526030000_apply_tenant_rls is part of the same Prisma migration sequence and its verify DO block raises if any expected table is missing RLS / FORCE / policy. The deploy fails before serving any traffic.

Risk: migration ordering — a new tenant table created but not added to RLS

Symptom: the table is created by an earlier migration. The RLS migration ran before, so its verify guard passes against the old list and the new table goes live with NO isolation. High risk — this is the classic incident.

Mitigation: every PR that creates a new financial-tier table MUST include a follow-on migration that adds the table to both tenant_tables arrays in 20260526030000_apply_tenant_rls (or re-runs an equivalent guard with the expanded list). Code review gate: any new tenantId column is a flag to require it. ADR-013 specifically calls this out as a process risk; lint coverage for this is a Planned improvement.

Risk: local dev sets `BYPASSRLS` to make development "easier"

Symptom: a developer runs ALTER ROLE ... BYPASSRLS locally to unblock a query. Code works in dev, fails silently in prod (where the role doesn't have BYPASSRLS).

Mitigation: do not document this workaround anywhere. The right fix is always to set the tenant via the runner. If you genuinely need cross-tenant local exploration, use the same OUTBOX_DATABASE_URL pattern with a temporary BYPASSRLS role you create and drop.

Risk: Prisma transaction caveats

Prisma's $transaction opens a single Postgres transaction. Each LOCAL set_config(..., true) is scoped to that transaction; it resets on COMMIT or ROLLBACK. However:

Inside Prisma's $transaction([batch]) array form (parallel statements), the GUC IS available — they all run on the same underlying transaction.
Outside any transaction (raw prisma.query not under a runner), the GUC is NOT set. Such queries will return zero rows from RLS tables. This is intentional — it's the fail-closed property.

Risk: SQL injection into the GUC value

Cannot happen: set_config('app.tenant_id', ${tenantId}, true) is called via Prisma's tagged-template tx.$executeRaw\...` which parameterises the value. The same applies on the Go side (tx.Exec(ctx, "SELECT set_config('app.tenant_id', $1, true)", tenantID)`).

Risk: a background job runs without tenant context

Symptom: a cron-style job calls into a domain service. No request, no SessionMiddleware, no TenantContextMiddleware → no tenant. The job's queries return zero rows.

Mitigation: background jobs must explicitly call runWithTenant({ tenantId }, () => ...) for the tenant they operate on. Multi-tenant jobs (e.g. nightly reconciliation across all employers) must loop and re-enter the tenant context for each one. The publisher is the explicit exception via BYPASSRLS.

Consequences

Positive

Tenant isolation is a database invariant, not an application discipline. Any future query path inherits it.
Deploy-time safety: an ALTER that drops RLS or removes a policy can't ship — the next migration's verify guard raises.
Single mental model: app.tenant_id = req.user.businessId = Session.activeOrganizationId = Organization.id = Business.id. No transformation layer.
Audit-friendly: a regulator asking "show me how you guarantee Employer A cannot see Employer B's data" gets a single answer with testable evidence (migration + policy text + verify block + the runtime check we ran during the May 2026 verification).

Negative

Local dev complexity: forgetting to set the GUC returns zero rows, which is initially confusing for new engineers. The runner hides this for normal code paths but raw queries can be surprising.
Cross-tenant tooling cost: every reconciliation or admin tool needs its own BYPASSRLS role. Worth the cost; reduces accidental grant leakage.
Legacy tables are inconsistent: Business, Employee, Payroll, Wallet etc. are NOT under RLS today. Application-side scoping is the only protection until their domain extractions ship.

Follow-ups

Bring legacy financial tables under RLS as their domain extractions land (Wallet, Payroll, BNPLPurchase, etc.).
Add a CI check that any new column named tenantId causes the PR to be flagged for "did you add it to the RLS verify list?".
Add docs/runbooks/verify-rls.md — one-page operator runbook for "is tenant isolation working right now?" with the exact psql queries that prove it on a live database. (Planned.)
Consider extending RLS to the better-auth Organization/Member tables once we have multi-org orchestration tooling that can run outside RLS via a dedicated role.

References

Implementation: apps/api/prisma/migrations/20260526030000_apply_tenant_rls/migration.sql
API-side: apps/api/src/identity/auth/session.middleware.ts, apps/api/src/identity/tenant/tenant-context.middleware.ts, apps/api/src/_infra/shared-infra/prisma-transaction-runner.ts
Ledger-side: services/ledger/internal/pg/pool.go (WithTenantTx)
BYPASSRLS role provisioning: infra/sql/00_create_outbox_publisher_role.sql
Outbox publisher: apps/api/src/_infra/outbox/outbox-publisher.service.ts
Implementation detail reference (slimmed): apps/api/prisma/RLS.md — kept as a short pointer back to this ADR.
Related: ADR-001 (modular monolith), ADR-008 (audit + outbox in same txn), ADR-009 (no DELETE), ADR-011 (cross-domain events only), ADR-012 (ledger accounting model).

Context​

Decision​

Alternatives considered​

The fail-closed contract​

How tenant context propagates​

Service-to-service / Go ledger​

activeOrganizationId — the canonical mapping​

Tables intentionally bypassing RLS​

BYPASSRLS — the one explicit exception​

Operational risks and how we mitigate them​

Risk: a developer forgets to thread the transaction runner​

Risk: a deploy applies the schema migration but not the policy​

Risk: migration ordering — a new tenant table created but not added to RLS​

Risk: local dev sets BYPASSRLS to make development "easier"​

Risk: Prisma transaction caveats​

Risk: SQL injection into the GUC value​

Risk: a background job runs without tenant context​

Consequences​

Positive​

Negative​

Follow-ups​

References​