AUTH_RISK_MATRIX.md — DemozPay
Companion to AUTH_SYSTEM_REVIEW.md. Scoring: Likelihood (L) and Impact (I) each 1–5; Risk = L × I. Priority bands: 🔴 ≥15 · 🟠 9–14 · 🟡 4–8 · 🟢 ≤3. "Stage to fix" maps to the 30/90/180 roadmap in the review.
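The scoring rule above is small enough to encode directly, which makes it easy to recheck the Score and Band columns when L or I is revised. The following is a minimal sketch; the type and function names (`RiskEntry`, `riskScore`, `riskBand`) are illustrative and do not come from the DemozPay codebase.

```typescript
// Band names stand in for the emoji used in the tables:
// red = 🔴, orange = 🟠, yellow = 🟡, green = 🟢.
type Band = "red" | "orange" | "yellow" | "green";

interface RiskEntry {
  id: string;
  likelihood: number; // L, integer 1-5
  impact: number; // I, integer 1-5
}

// Risk = L × I, rejecting inputs outside the 1-5 scale.
function riskScore(entry: RiskEntry): number {
  for (const value of [entry.likelihood, entry.impact]) {
    if (!Number.isInteger(value) || value < 1 || value > 5) {
      throw new RangeError(`${entry.id}: L and I must be integers from 1 to 5`);
    }
  }
  return entry.likelihood * entry.impact;
}

// Priority bands: red >= 15, orange 9-14, yellow 4-8, green <= 3.
function riskBand(score: number): Band {
  if (score >= 15) return "red";
  if (score >= 9) return "orange";
  if (score >= 4) return "yellow";
  return "green";
}
```

For example, an entry with L = 4 and I = 5 scores 20 and falls in the red band, matching R1 in the register.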
A. Current-system risk register (what exists today)
| # | Risk | Class | L | I | Score | Band | Stage to fix |
|---|---|---|---|---|---|---|---|
| R1 | S2S gRPC createInsecure() — API↔ledger↔gateway unauthenticated + unencrypted (money tier) | Security / Compliance | 4 | 5 | 20 | 🔴 | 30d |
| R2 | Platform-admin: cross-tenant superuser, no MFA / step-up / separate realm | Security / Insider | 3 | 5 | 15 | 🔴 | 30–90d |
| R3 | No rate limiting / lockout on password + OTP (credential stuffing, OTP brute-force) | Security | 4 | 4 | 16 | 🔴 | 30d |
| R4 | Auth subtree bypasses Nest pipeline — no shared rate-limit/validation/trace on login/OTP/reset | Security / Operational | 4 | 4 | 16 | 🔴 | 30d |
| R5 | Phone OTP not production-ready (LoggingSmsSender drops codes in prod) — primary market path | Operational / Security | 4 | 3 | 12 | 🟠 | 30–90d |
| R6 | No immutable auth event log (logins, MFA, role grants, admin actions, impersonation) | Compliance / Audit | 3 | 4 | 12 | 🟠 | 90d |
| R7 | Webhook nonce signed but not enforced → replay within 5-min skew window | Security | 2 | 4 | 8 | 🟡 | 90d |
| R8 | MFA absent platform-wide (TwoFactor table exists, but the plugin is not wired) | Security | 3 | 4 | 12 | 🟠 | 90d |
| R9 | Per-request getSession() + per-RBAC 2 DB queries, no cache | Scaling / Operational | 3 | 3 | 9 | 🟠 | 90–180d |
| R10 | Per-route opt-in RBAC — undecorated money route = any authenticated user | Security | 2 | 4 | 8 | 🟡 | 90d |
| R11 | Dual role model (Member.role vs legacy Role+permissions) | Maintenance / Tech-debt | 3 | 2 | 6 | 🟡 | 90–180d |
| R12 | Middleware coverage = manual 4-controller allow-list — new controllers silently uncovered | Maintenance / Operational | 3 | 3 | 9 | 🟠 | 90d |
| R13 | Schema/session-shape lock-in to Better Auth (no isolation abstraction yet) | Vendor / Maintenance | 3 | 3 | 9 | 🟠 | 30–90d (abstraction) |
| R14 | Opaque DB sessions don't federate across regions | Scaling | 2 | 3 | 6 | 🟡 | 180d+ (decision gate) |
| R15 | Weak password policy (min-8 only, no breach/complexity check) | Security | 2 | 2 | 4 | 🟡 | 90d |
| R16 | Better Auth version churn (young lib, plugin API breaking changes) | Ecosystem | 3 | 2 | 6 | 🟡 | Ongoing (regression suite) |
| R17 | Legacy shared/auth JWT primitives still shipped | Tech-debt | 2 | 1 | 2 | 🟢 | 180d |
Top 4 to fix first (the only 🔴s): R1, R3, R4, R2. None of these fixes is "replace Better Auth": all four gaps would exist with any auth library.
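As an illustration of the control missing in R3, the sketch below counts failed attempts per key (for example an account identifier or phone number) inside a sliding window and locks the key out once a threshold is reached. It is a simplified in-memory version under stated assumptions: the class name, option names, and thresholds are hypothetical, and a real deployment would likely need a store shared across instances plus a hook in front of the auth subtree that currently sits outside the Nest pipeline (R4).

```typescript
interface LimiterOptions {
  maxAttempts: number; // failures allowed inside the window before lockout
  windowMs: number; // sliding window for counting failures
  lockoutMs: number; // how long a key stays locked once the limit is hit
}

interface AttemptState {
  failures: number[]; // timestamps (ms) of recent failures
  lockedUntil: number; // 0 when not locked
}

class AttemptLimiter {
  private readonly state = new Map<string, AttemptState>();

  // `now` is injectable so the behaviour can be tested with a fake clock.
  constructor(
    private readonly opts: LimiterOptions,
    private readonly now: () => number = Date.now,
  ) {}

  // Call before checking a password or OTP; reject the request when true.
  isBlocked(key: string): boolean {
    const entry = this.state.get(key);
    return entry !== undefined && entry.lockedUntil > this.now();
  }

  // Call after a failed password or OTP check.
  recordFailure(key: string): void {
    const t = this.now();
    const entry = this.state.get(key) ?? { failures: [], lockedUntil: 0 };
    // Drop failures that have aged out of the window, then add this one.
    entry.failures = entry.failures.filter((ts) => t - ts < this.opts.windowMs);
    entry.failures.push(t);
    if (entry.failures.length >= this.opts.maxAttempts) {
      entry.lockedUntil = t + this.opts.lockoutMs;
      entry.failures = [];
    }
    this.state.set(key, entry);
  }

  // Call after a successful login to clear the counter.
  recordSuccess(key: string): void {
    this.state.delete(key);
  }
}
```

Keying on the account or phone number slows targeted brute-force against one user; credential stuffing spreads attempts across many accounts, so a second limiter keyed on client IP or device is usually needed alongside it.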
B. Option-comparison risk (the decision itself)
Risk introduced by choosing each path, independent of current gaps.
| Dimension | A — Keep + harden | B — Fork | C — Custom | D — Enterprise IAM |
|---|---|---|---|---|
| Security-regression risk | 🟢 Low | 🟡 Med | 🔴 High | 🟡 Med (migration) |
| Delivery risk (will it ship) | 🟢 Low | 🟠 Med-High | 🔴 Very high | 🟠 Med-High |
| Long-term maintenance | 🟡 Med | 🔴 High (own a fork forever) | 🔴 Very high | 🟢 Low (vendor/community) |
| Compliance/auditor story | 🟡 Med | 🟡 Med | 🔴 Weak ("rolled our own") | 🟢 Strong (attested IAM) |
| Lock-in | 🟡 Med (mitigated by abstraction) | 🟠 High (your fork) | 🟢 None (you own it) | 🟠 Vendor-dependent (Auth0 high, Ory/Keycloak low) |
| Cost to reach parity | 🟢 ~0 | 🟠 Med | 🔴 6–9 mo | 🟠 2–4 mo migration |
| Regulator perception | 🟡 Neutral | 🟡 Neutral | 🔴 Negative | 🟢 Positive |
| Partner-bank perception | 🟡 Neutral→Positive after hardening | 🟡 Neutral | 🔴 Negative | 🟢 Positive |
Lowest-risk path: A now → D at a triggered scale-up gate. B and C carry risk no current requirement justifies.
C. Attack-surface map (auth-relevant)
| Surface | Entry point | Current control | Residual risk | Ref |
|---|---|---|---|---|
| Credential login | POST /api/auth/sign-in | password hash (Better Auth), CSRF via trustedOrigins | No rate limit / lockout (R3); outside Nest pipeline (R4) | better-auth.factory.ts:42 |
| OTP login | POST /api/auth/phone-number/* | OTP 5-min expiry | No rate limit (R3); prod sender drops codes (R5) | sms-sender.ts |
| Password reset | Better Auth reset flow | token expiry (lib) | No rate limit; not exercised/verified here | factory |
| Session use | cookie → every /api/* | opaque DB token, fail-closed guard | DB hit/req (R9) | session.middleware.ts |
| Tenant scoping | req.user.businessId → ALS → RLS | RLS-forced financial tables, headers rejected | residual = undecorated route (R10) | tenant-context.middleware.ts |
| Privilege escalation | @RequireOrgRole / platform-admin | Member/AdminProfile checks, single role source | admin path = unconditional bypass, no MFA (R2) | org-role.guard.ts:94 |
| Partner webhook | POST /api/integration/bank-callback/:partner | HMAC + skew + timing-safe | replay in window (R7) | bank-webhook.controller.ts |
| Service-to-service | gRPC → ledger / gateway | none | 🔴 unauthenticated, cleartext (R1) | *.grpc-client.ts |
| Admin operations | platform-admin endpoints | RBAC bypass | no MFA, no impersonation audit (R2, R6) | org-role.guard.ts |
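The replay gap on the partner webhook (R7) closes once each signed nonce is remembered for as long as its timestamp could still pass the skew check. A minimal sketch of that enforcement follows. It assumes the HMAC over the payload, including nonce and timestamp, has already been verified by the existing handler; the class and method names are hypothetical, and the in-memory map stands in for whatever shared store production would use.

```typescript
class NonceReplayGuard {
  // Maps a (partner, nonce) key to the time (ms) after which it can be forgotten.
  private readonly seen = new Map<string, number>();

  // `skewMs` is the accepted clock skew (5 minutes in the current system).
  constructor(
    private readonly skewMs: number,
    private readonly now: () => number = Date.now,
  ) {}

  // Returns true only for the first use of a nonce with a timestamp inside the skew window.
  accept(partner: string, nonce: string, timestampMs: number): boolean {
    const t = this.now();
    if (Math.abs(t - timestampMs) > this.skewMs) return false; // outside skew window
    this.evictExpired(t);
    // JSON encoding keeps the composite key unambiguous for any partner/nonce strings.
    const key = JSON.stringify([partner, nonce]);
    if (this.seen.has(key)) return false; // replay inside the window
    // Retain until the timestamp itself would fail the skew check.
    this.seen.set(key, timestampMs + this.skewMs);
    return true;
  }

  private evictExpired(t: number): void {
    this.seen.forEach((expiry, key) => {
      if (expiry < t) this.seen.delete(key);
    });
  }
}
```

Because a nonce is only dropped after its timestamp has aged out of the window, an evicted nonce cannot be replayed: the skew check rejects it first. Memory use is therefore bounded by the callback volume within one skew window.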
D. Migration risk matrix (if/when moving off Better Auth → see AUTH_MIGRATION_STRATEGY.md)
| Migration risk | L | I | Score | Mitigation |
|---|---|---|---|---|
| Session invalidation forcing mass re-login | 4 | 2 | 8 | Dual-read sessions during cutover; expire-don't-revoke |
| Password hash format incompatibility | 3 | 4 | 12 | Lazy rehash-on-login; keep Account.password readable |
| Org/member mapping drift | 2 | 4 | 8 | Keep Organization.id == Business.id invariant; reconciliation job |
| RBAC semantics change | 3 | 3 | 9 | Abstraction layer freezes role contract before migration |
| Tenant-context regression (RLS) | 2 | 5 | 10 | RLS stays at DB regardless of IDP — do not couple to migration |
| Audit-trail continuity break | 3 | 3 | 9 | Auth-event log lives outside IDP from day one |
| OIDC/federation misconfig | 3 | 4 | 12 | Adopt vetted OP; never hand-roll RP validation |
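The "lazy rehash-on-login" mitigation for hash-format incompatibility can be sketched as follows: verify against whichever scheme produced the stored hash, and on success return a replacement hash in the current scheme for the caller to persist. All names here (`PasswordHasher`, `StoredCredential`, `verifyAndUpgrade`) are hypothetical, and the interface is synchronous only to keep the control flow visible; real password hashers are typically asynchronous and supplied by a vetted library.

```typescript
interface PasswordHasher {
  id: string; // scheme identifier stored alongside the hash
  hash(password: string): string;
  verify(password: string, storedHash: string): boolean;
}

interface StoredCredential {
  scheme: string; // which hasher produced `hash`
  hash: string;
}

interface LoginResult {
  ok: boolean;
  upgraded?: StoredCredential; // present when the caller should persist a new hash
}

function verifyAndUpgrade(
  password: string,
  cred: StoredCredential,
  current: PasswordHasher,
  legacy: PasswordHasher[],
): LoginResult {
  // Already on the current scheme: plain verification, nothing to upgrade.
  if (cred.scheme === current.id) {
    return { ok: current.verify(password, cred.hash) };
  }
  // Unknown scheme or wrong password fails closed.
  const old = legacy.filter((h) => h.id === cred.scheme)[0];
  if (old === undefined || !old.verify(password, cred.hash)) {
    return { ok: false };
  }
  // Correct password under a legacy scheme: rehash while the plaintext is in hand.
  return { ok: true, upgraded: { scheme: current.id, hash: current.hash(password) } };
}
```

This only works while the legacy hashes remain readable, which is why the mitigation pairs it with keeping Account.password accessible. Accounts that never log in during the migration window keep their legacy hash and need a separate policy, such as a forced reset.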
Principle: the abstraction layer (§6 of the review) is what converts every cell above from "rewrite risk" to "config + reconciliation risk." Build it before you need it.
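One possible shape for that abstraction is a provider-neutral session type plus a port the rest of the application depends on, with all knowledge of the library's session shape confined to a single adapter function. The sketch below is an assumption-laden illustration, not the design from the review: `AuthSession`, `IdentityProvider`, `RawVendorSession`, and its field names are all hypothetical, and the mapping of organization ID to `businessId` relies on the Organization.id == Business.id invariant noted above.

```typescript
// Provider-neutral session contract consumed by guards and tenant middleware.
interface AuthSession {
  userId: string;
  businessId: string;
  role: string;
  expiresAt: Date;
}

// Port implemented once per identity provider (current library, future IAM).
interface IdentityProvider {
  resolveSession(token: string): Promise<AuthSession | null>;
}

// Hypothetical stand-in for whatever the current library returns.
interface RawVendorSession {
  userId: string;
  expiresAt: string; // ISO 8601 timestamp
  organizationId?: string | null;
  memberRole?: string | null;
}

// Adapter: the only place that knows the vendor shape. Fails closed on
// expired sessions, unparseable expiry, or missing tenant/role data.
function toAuthSession(raw: RawVendorSession, now: Date): AuthSession | null {
  const expiresAt = new Date(raw.expiresAt);
  if (isNaN(expiresAt.getTime()) || expiresAt.getTime() <= now.getTime()) {
    return null;
  }
  if (!raw.organizationId || !raw.memberRole) {
    return null;
  }
  return {
    userId: raw.userId,
    businessId: raw.organizationId, // relies on Organization.id == Business.id
    role: raw.memberRole,
    expiresAt,
  };
}
```

With this split, swapping the identity provider means writing a new adapter and `IdentityProvider` implementation while guards, RLS context, and audit logging keep consuming the same `AuthSession` contract.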