
ADR-029 — DemozAuth: a first-class, framework-agnostic auth platform

Status: Accepted. v1.0 hardening landed: Better Auth removed from the runtime; security controls wired + enforced; refresh-rotation, runtime-enforced plugin guards + fail-fast boot validation, and the identity-kind seam shipped. See "Phase 6" (v1.0 hardening) below for the current state.
Date: 2026-06-25
Renumbered: was drafted as ADR-018; renumbered to 029 to avoid collision with ADR-018 (progressive extraction).
Supersedes/relates: ADR-013 (tenant isolation), the employee phone-OTP auth, Better Auth integration.

Context

Authentication is currently split across two systems: Better Auth (org-admins: email/password, TOTP, organization plugin, sessions) and a bespoke employee phone-OTP flow that issues its own Session rows. A security review surfaced real issues (OTP coupled to an org before auth, no rate limiting on /api/employee-auth/*, a duplicate-identity race, no PIN, localStorage-only onboarding) and a strategic need: authentication should become a reusable platform across all current and future Demoz products (pay, lending, HR, merchant/partner portals, mobile), not a DemozPay-specific module.

We want to learn from Better Auth's plugin/hook/schema-extension model without forking or copying it, and reach a state where replacing the auth provider means changing one adapter.

Decision

  1. Build @demoz-pay/demoz-auth — a pure, framework-agnostic package (no Nest/Prisma/Express/HTTP/React). It owns: domain (Identity, Credential, Session, Membership, AuthEvent), the ports contract, the engine (compose adapters + plugins + config), the plugin system + hook bus, and SDK flow contracts. Concrete adapters and transport bindings live in apps/api/src/identity.
  2. Identity-first model: identity → memberships → workspace → permissions. Two session kinds — IDENTITY (no workspace) and WORKSPACE (identity + tenant + permissions). Auth never assumes an organization.
  3. Credential-agnostic core: PIN/Password/OTP/Passkey/TOTP share one shape; new factors require no core change. PIN hashing uses Argon2id (@node-rs/argon2) via the Hasher port.
  4. Better Auth becomes one adapter behind the ExternalSessionResolver port (BetterAuthBridgeAdapter). It remains the authority for email/password + TOTP used by platform admin, business/employer, FI and merchant users. Employees use phone → OTP → PIN, owned by DemozAuth.
  5. Plugins contribute endpoints/guards/schema/hooks/services as transport-agnostic descriptors; the core never imports a plugin.
  6. Migration is incremental, additive, reversible, feature-flagged. Phases: (0) ADR + scaffold → (1) ports/engine/registry/bridge [this ADR] → (2) move phone/OTP/PIN/identity+workspace sessions into plugins behind flags → (3) audit/rate-limit/risk/device plugins → (4) Better Auth limited to email/password+TOTP+OAuth via the bridge → (5) optional Better Auth removal once at parity (a business decision, not a rewrite).
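Decisions 2 and 4 can be sketched as a discriminated union plus a port. This is a minimal illustration, not the package's actual types: the field names, StaticSessionResolver and tenantOf are hypothetical; only IDENTITY, WORKSPACE and ExternalSessionResolver are named in this ADR.

```typescript
// Hypothetical field names; only IDENTITY, WORKSPACE and
// ExternalSessionResolver are named in the ADR.
type IdentitySession = { kind: "IDENTITY"; identityId: string };
type WorkspaceSession = {
  kind: "WORKSPACE";
  identityId: string;
  tenantId: string;
  permissions: readonly string[];
};
type AuthSession = IdentitySession | WorkspaceSession;

// The port: the core depends on this interface, never on a provider SDK.
interface ExternalSessionResolver {
  resolve(token: string): AuthSession | null;
}

// Stand-in adapter backed by a map. A real bridge adapter would ask the
// external provider instead; swapping providers means swapping this class.
class StaticSessionResolver implements ExternalSessionResolver {
  private sessions: Map<string, AuthSession>;
  constructor(sessions: Map<string, AuthSession>) {
    this.sessions = sessions;
  }
  resolve(token: string): AuthSession | null {
    return this.sessions.get(token) ?? null;
  }
}

// Auth never assumes an organization: callers narrow on `kind` first.
function tenantOf(session: AuthSession): string | null {
  return session.kind === "WORKSPACE" ? session.tenantId : null;
}

const resolver: ExternalSessionResolver = new StaticSessionResolver(
  new Map<string, AuthSession>([
    ["tok-identity", { kind: "IDENTITY", identityId: "id-1" }],
    [
      "tok-workspace",
      { kind: "WORKSPACE", identityId: "id-1", tenantId: "org-9", permissions: ["payroll.read"] },
    ],
  ]),
);
```

Because tenantId exists only on the WORKSPACE variant, code that forgets to check the session kind fails at compile time rather than at runtime.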

Persona → method (locked)

  • Admin / Business / FI / Merchant → email + password (+ TOTP) via Better Auth (unchanged).
  • Employee → phone → OTP → PIN (OTP only for first login / new device / PIN reset / risk).

Consequences

  • Positive: reusable company asset; provider lock-in removed (swap = one adapter); clean DDD boundaries; each capability independently testable; fintech security (Argon2id, hashed OTP, lockouts, risk hooks) designed in from day one.
  • Negative / risk: a second auth surface during migration (mitigated by the existing IdentityProvider seam and a single Session table with additive columns); schema-composition from plugin-declared models needs a build step (start simple, automate later); scope discipline required (OAuth/SSO/passkey are future plugins, not now).
  • Non-goals (now): rewriting org-admin auth, OAuth/SSO, passkeys.

Phase 1

Pure package scaffold: domain, ports, engine, plugin registry, hook bus, config, SDK contracts, in-memory testing adapters, unit tests; the ADR; and a non-wired BetterAuthBridgeAdapter. No runtime wiring, no migrations, no behavior change. Verified by tsc + eslint + package unit tests.

Phase 2a (this change) — core auth flows + RBAC

Still pure (no framework, no Prisma, no HTTP). Adds the actual authentication logic over the Phase-1 ports:

  • Domain: pin-policy.ts (validatePin — length, all-same, sequential and common-PIN rejection, throws WeakSecretError); rbac.ts (RbacRegistry — roles → permissions with inheritance, * superuser wildcard, can/canAll).
  • Use-cases (application/use-cases/): startPhoneAuth + verifyOtp (rate-limited, hashed OTP, max-attempts + single-use, mints an IDENTITY session, requiresPinSetup when no PIN credential); setPin + pinLogin (Argon2id-hashed PIN, exponential lockout, optional risk-engine OTP step-up); listWorkspaces + selectWorkspace (membership check → WORKSPACE session); issueSession/resolveActiveSession (opaque token, only the digest is stored — Digest port, distinct from the Hasher port; expiry/revocation/idle checks); emitEvent (publishes to the EventPublisher and the HookBus atomically).
  • Ports: added Digest (deterministic SHA-256-class hash for high-entropy tokens) alongside Hasher (salted Argon2id for low-entropy secrets). SmsSender and Digest added to AuthAdapters.
  • Errors: StepUpRequiredError, UnauthenticatedError, WorkspaceAccessDeniedError.
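A minimal sketch of the validatePin checks listed above, under stated assumptions: the length bounds, the deny-list contents and the `reason` field are hypothetical, since this ADR names only the four rejection classes and WeakSecretError.

```typescript
class WeakSecretError extends Error {
  reason: string;
  constructor(reason: string) {
    super(`weak PIN: ${reason}`);
    this.name = "WeakSecretError";
    this.reason = reason;
  }
}

// Hypothetical deny-list; a real one would be longer.
const COMMON_PINS = new Set<string>(["1212", "1004", "2580", "6969"]);

// True when every digit is exactly one above (or one below) its predecessor.
function isSequential(pin: string): boolean {
  const digits = pin.split("").map(Number);
  const steps = digits.slice(1).map((d, i) => d - Number(digits[i]));
  return steps.every((s) => s === 1) || steps.every((s) => s === -1);
}

// Throws WeakSecretError on the first failed rule; returns nothing on success.
// minLength/maxLength defaults are assumptions, not the package's values.
function validatePin(pin: string, minLength = 4, maxLength = 8): void {
  if (!/^\d+$/.test(pin)) throw new WeakSecretError("digits only");
  if (pin.length < minLength || pin.length > maxLength) throw new WeakSecretError("length");
  if (new Set(pin.split("")).size === 1) throw new WeakSecretError("all-same");
  if (isSequential(pin)) throw new WeakSecretError("sequential");
  if (COMMON_PINS.has(pin)) throw new WeakSecretError("common");
}
```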

No runtime wiring, no migrations, no behavior change. Verified by tsc (lib + spec), eslint, 42 package unit tests (PIN policy, RBAC, and the full phone→OTP→PIN→workspace flow incl. lockout/expiry/step-up), apps/api typecheck, and nx build demoz-auth.
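The RbacRegistry behaviour from Phase 2a (roles → permissions with inheritance, * wildcard, can/canAll) could look roughly like this. The `define` method and the RoleDef shape are assumptions made for illustration; the real registry's API is not shown in this ADR.

```typescript
// Hypothetical API shape; the ADR names only RbacRegistry, can and canAll.
type RoleDef = { permissions: string[]; inherits?: string[] };

class RbacRegistry {
  private roles = new Map<string, RoleDef>();

  define(role: string, def: RoleDef): this {
    this.roles.set(role, def);
    return this;
  }

  // Collect permissions from the role and everything it inherits.
  // The `seen` set guards against inheritance cycles.
  private resolve(role: string, seen: Set<string> = new Set<string>()): Set<string> {
    const out = new Set<string>();
    if (seen.has(role)) return out;
    seen.add(role);
    const def = this.roles.get(role);
    if (def === undefined) return out;
    def.permissions.forEach((p) => out.add(p));
    for (const parent of def.inherits ?? []) {
      this.resolve(parent, seen).forEach((p) => out.add(p));
    }
    return out;
  }

  // A role grants a permission directly, via inheritance, or via "*".
  can(roles: string[], permission: string): boolean {
    return roles.some((r) => {
      const granted = this.resolve(r);
      return granted.has("*") || granted.has(permission);
    });
  }

  canAll(roles: string[], permissions: string[]): boolean {
    return permissions.every((p) => this.can(roles, p));
  }
}

const rbac = new RbacRegistry()
  .define("viewer", { permissions: ["payroll.read"] })
  .define("manager", { permissions: ["payroll.approve"], inherits: ["viewer"] })
  .define("owner", { permissions: ["*"] });
```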

Phase 2b — flows wrapped as plugins

Each flow is now a transport-agnostic plugin (src/plugins/builtin/): phone-auth, pin, session, workspace. Plugins gained a handlers map (handlerId → AuthHandler) and an AuthRequest/AuthResult envelope, so a transport realizes routes without importing use-cases. PluginRegistry.resolvedEndpoints() binds each EndpointSpec to its handler and throws at boot if a handlerId is unbound. AuthResult.issueToken/clearToken keep cookie mechanics in the transport and token semantics in the package. Verified by a transport-level integration spec driving the full flow through resolvedEndpoints() (now 46 package tests).
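A rough sketch of the binding step described above: each EndpointSpec is paired with the handler its handlerId names, and an unbound id fails at boot. The type shapes are simplified assumptions (handlers are synchronous here, and AuthRequest/AuthResult carry only a few illustrative fields).

```typescript
// Simplified, hypothetical envelopes; the real ones carry more fields.
type AuthRequest = { body: unknown; token?: string };
type AuthResult = { status: number; data?: unknown; issueToken?: string; clearToken?: boolean };
type AuthHandler = (req: AuthRequest) => AuthResult;
type EndpointSpec = { method: "GET" | "POST"; path: string; handlerId: string };
type Plugin = { id: string; endpoints: EndpointSpec[]; handlers: Record<string, AuthHandler> };
type ResolvedEndpoint = EndpointSpec & { pluginId: string; handler: AuthHandler };

class PluginRegistry {
  private plugins: Plugin[] = [];

  register(plugin: Plugin): this {
    this.plugins.push(plugin);
    return this;
  }

  // Bind every spec to its handler; an unbound handlerId is a boot error.
  resolvedEndpoints(): ResolvedEndpoint[] {
    const out: ResolvedEndpoint[] = [];
    for (const plugin of this.plugins) {
      for (const spec of plugin.endpoints) {
        const handler: AuthHandler | undefined = plugin.handlers[spec.handlerId];
        if (handler === undefined) {
          throw new Error(`plugin "${plugin.id}": handlerId "${spec.handlerId}" is unbound`);
        }
        out.push({ ...spec, pluginId: plugin.id, handler });
      }
    }
    return out;
  }
}

// The transport only sees descriptors; it never imports a use-case.
const sessionPlugin: Plugin = {
  id: "session",
  endpoints: [{ method: "POST", path: "/session/logout", handlerId: "logout" }],
  handlers: { logout: () => ({ status: 200, clearToken: true }) },
};
```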

Phase 3a — stateless concrete adapters (this change)

Concrete, framework-bound implementations of the capability ports, in apps/api/src/identity/demoz-auth/adapters/ (no DB, no routes, not yet wired, so entirely additive):

  • Argon2idHasher (@node-rs/argon2, Argon2id, m=19 MiB/t=2/p=1) — for PIN/password.
  • Sha256Digest (node:crypto) — for session tokens.
  • NodeRandomSource (crypto.randomBytes/randomInt, bias-free) — tokens + codes.
  • SystemClock. Verified by running them at the library boundary (argon2 round-trip incl. salt/garbage handling, deterministic digest, URL-safe unique tokens, uniform codes) + apps/api typecheck.
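For illustration, the digest and random-source adapters can be approximated with node:crypto alone. The function names and default sizes below are hypothetical stand-ins for Sha256Digest and NodeRandomSource, not the adapters' actual interface.

```typescript
import { createHash, randomBytes, randomInt } from "node:crypto";

// Deterministic digest for high-entropy tokens: same input, same output,
// so a presented token can be looked up by its digest.
function digestToken(token: string): string {
  return createHash("sha256").update(token, "utf8").digest("hex");
}

// URL-safe opaque token. 32 random bytes is an assumed default size.
function newSessionToken(bytes = 32): string {
  return randomBytes(bytes).toString("base64url");
}

// Uniform numeric code. crypto.randomInt avoids modulo bias;
// the upper bound is exclusive, so 6 digits covers 000000..999999.
function newOtpCode(digits = 6): string {
  return randomInt(0, 10 ** digits).toString().padStart(digits, "0");
}
```

The digest is unsalted on purpose: session tokens already carry enough entropy, whereas PINs go through the salted Argon2id Hasher.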

Phase 3b — Nest transport (this change)

A single catch-all DemozAuthController (apps/api/src/identity/demoz-auth/) mounts the engine's resolvedEndpoints() at /api/demoz-auth/*. It owns no auth logic: it extracts the session token (Authorization Bearer → IDENTITY cookie → WORKSPACE cookie), builds an AuthRequest, runs the handler, and realizes issueToken/clearToken as httpOnly cookies; error-http.ts maps the typed error code → HTTP status (400/401/403/404/423/429). The engine is built from the Phase-3a capability adapters + process-memory placeholder stores (dev/in-memory-stores.ts) so the flow is exercisable before the DB lands.
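The two transport responsibilities above (token extraction order and error code → status mapping) can be sketched without any framework. The request shape, the error-code strings and the 500 fallback are assumptions; the cookie names are the ones this ADR uses for DemozAuth sessions.

```typescript
// Hypothetical, framework-free view of an incoming request.
type IncomingRequest = {
  headers: Record<string, string | undefined>;
  cookies: Record<string, string | undefined>;
};

// Precedence: Authorization Bearer, then IDENTITY cookie, then WORKSPACE cookie.
function extractSessionToken(req: IncomingRequest): string | null {
  const auth = req.headers["authorization"];
  if (auth !== undefined && auth.startsWith("Bearer ")) {
    const token = auth.slice("Bearer ".length).trim();
    if (token.length > 0) return token;
  }
  return req.cookies["demoz.identity_session"] || req.cookies["demoz.workspace_session"] || null;
}

// Assumed code names; only the status set 400/401/403/404/423/429 is from the ADR.
const STATUS_BY_CODE: Record<string, number> = {
  WEAK_SECRET: 400,
  UNAUTHENTICATED: 401,
  WORKSPACE_ACCESS_DENIED: 403,
  NOT_FOUND: 404,
  LOCKED: 423,
  RATE_LIMITED: 429,
};

// Unknown codes fall back to 500 (an assumption made for this sketch).
function httpStatusFor(code: string): number {
  return STATUS_BY_CODE[code] ?? 500;
}
```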

Feature-flagged, default OFF. DEMOZ_AUTH_TRANSPORT_ENABLED (config + ENV) gates DemozAuthModule.forRoot(): when false it registers no controller and no providers, so the AppModule import is inert and the live employee / Better-Auth flows are untouched. Verified by booting apps/api: flag OFF → app starts, no DemozAuth routes, no DI errors; flag ON → controller mounts, and the full flow + error paths pass over real HTTP (start → verify → set-PIN → PIN-login → session → workspaces → logout; weak-PIN 400, wrong-OTP 401, lockout 423 after 5 attempts, unknown route 404). Persistence is the only thing still placeholder.

Phase 3c — Prisma persistence over the EXISTING Better Auth tables (this change)

Decision change: DemozAuth reuses the current Better Auth tables (extend with nullable columns only — no Demoz* tables) and will grow to cover all personas, so Better Auth can be removed with existing credentials + sessions continuing to work. Full audit + mapping: BETTER_AUTH_INTEGRATION.md.

  • Additive migration 20260625120000_demoz_auth_existing_table_extensions: Account.failedAttempts/lockedUntil, Session.kind/tokenHash(unique)/lastSeenAt, Verification.attempts/consumedAt. All nullable, no backfill, no drops — Better Auth ignores them.
  • Prisma adapters (apps/api/src/identity/demoz-auth/adapters/): Identity→User, Credential→Account (providerId 'credential'=password, 'pin'=Argon2id), Session→Session (digest in tokenHash; token=digest to satisfy unique-not-null; revoke=delete), Otp→Verification (demoz:otp:{id} namespace), Membership→Member (+MemberRole.roleName[]). Engine factory swaps to these five bindings + real SMS (SMS_SENDER). Argon2id Hasher for PIN/OTP; ScryptPasswordHasher (byte-compatible with Better Auth's scrypt, cross-verified) is ready for the email/password persona.
  • Credential continuity proven: Better Auth uses scrypt ({saltHex}:{keyHex}, N=16384/r=16/p=1/dkLen=64); our hasher verifies BA hashes and BA verifies ours.
  • Verified end-to-end against real Postgres (flag ON, :8099): start→verify→set-PIN→PIN-login→session→workspaces→logout, with rows confirmed in User/Account/Session/Verification (digest-at-rest, single-use OTP) and PIN lockout persisting failedAttempts=5/lockedUntil. Flag OFF default → inert. Frontends NOT yet cut over (paused here).
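A sketch of a scrypt hasher using the {saltHex}:{keyHex} format and the parameters quoted above (N=16384, r=16, p=1, dkLen=64). Two details are assumptions and would need checking against Better Auth before claiming byte compatibility: that the hex string itself (not the decoded bytes) is the salt input, and that the password is NFKC-normalized first. The function names are hypothetical.

```typescript
import { randomBytes, scryptSync, timingSafeEqual } from "node:crypto";

const N = 16384;
const r = 16;
const p = 1;
const dkLen = 64;
// scrypt needs about 128*N*r bytes (32 MiB here), which sits at Node's
// default maxmem limit, so the limit is raised explicitly.
const params = { N, r, p, maxmem: 128 * N * r * 2 };

// Assumption: salt is passed as the hex text, password is NFKC-normalized.
function deriveKey(password: string, saltHex: string): Buffer {
  return scryptSync(password.normalize("NFKC"), saltHex, dkLen, params);
}

function hashPassword(password: string): string {
  const saltHex = randomBytes(16).toString("hex");
  return `${saltHex}:${deriveKey(password, saltHex).toString("hex")}`;
}

// Constant-time comparison; malformed stored values simply fail to verify.
function verifyPassword(stored: string, password: string): boolean {
  const [saltHex, keyHex] = stored.split(":");
  if (!saltHex || !keyHex) return false;
  const expected = Buffer.from(keyHex, "hex");
  const actual = deriveKey(password, saltHex);
  return expected.length === actual.length && timingSafeEqual(expected, actual);
}
```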

Phase 3d — audit-trail integration (this change)

AuditEventPublisher replaces the structured-logging placeholder and maps DemozAuth's events onto the platform's hash-chained AuthEventSink (authEventEntry): session.identity.issued → LOGIN_SUCCESS, otp.failed/pin.failed → LOGIN_FAILURE, pin.locked → LOGIN_FAILURE/BLOCKED, session.revoked → LOGOUT; the rest (identity.created, otp.requested/verified, pin.set/verified, workspace.selected, stepup.*) are structured-logged. Rows carry actorUserId+tenantId+metadata.demozEvent. Best-effort: an audit-write failure is logged, never thrown (the state change already committed; failing would lock a legitimately-authenticated user out). Bound to a PrismaAuthEventSink provided locally (only needs @Global PrismaService; no AuthModule import). Verified: a live flow produced LOGIN_SUCCESS → LOGIN_FAILURE → LOGIN_SUCCESS → LOGOUT rows with the hash chain intact (each prevHash = prior hash). Limitation: ip/userAgent need per-request engine context (Phase 4).
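The two properties stated above (each prevHash equals the prior row's hash, and an audit-write failure is logged rather than thrown) can be illustrated as follows. The row shape and the exact hash input are assumptions; the platform's AuthEventSink may hash different fields.

```typescript
import { createHash } from "node:crypto";

// Hypothetical row shape; real rows also carry tenantId and metadata.
type AuditRow = { action: string; actorUserId: string; prevHash: string | null; hash: string };

// Assumed hash input: the previous hash plus this row's fields.
function rowHash(prevHash: string | null, action: string, actorUserId: string): string {
  return createHash("sha256").update(JSON.stringify([prevHash, action, actorUserId])).digest("hex");
}

function appendRow(chain: AuditRow[], action: string, actorUserId: string): void {
  const last = chain[chain.length - 1];
  const prevHash = last ? last.hash : null;
  chain.push({ action, actorUserId, prevHash, hash: rowHash(prevHash, action, actorUserId) });
}

// A chain is intact when every row links to its predecessor and
// every stored hash matches a recomputation.
function verifyChain(chain: AuditRow[]): boolean {
  let prev: string | null = null;
  for (const row of chain) {
    if (row.prevHash !== prev) return false;
    if (row.hash !== rowHash(row.prevHash, row.action, row.actorUserId)) return false;
    prev = row.hash;
  }
  return true;
}

// Best-effort publish: the auth state change has already committed,
// so a failed audit write is reported and swallowed.
function publishBestEffort(write: () => void, log: (message: string) => void): void {
  try {
    write();
  } catch (err) {
    log(`audit write failed: ${String(err)}`);
  }
}
```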

Phase 4a — email/password + workspace login (backend, this change)

DemozAuth can now authenticate the admin/business/FI/merchant personas:

  • password.ts use-case (passwordLogin / setPassword) + email-password plugin (POST /password/login public, POST /password session-gated).
  • A new optional passwordHasher adapter slot (scrypt for PASSWORD, distinct from the Argon2id hasher for PIN/OTP); bound to ScryptPasswordHasher in apps/api. PIN and password share ONE lockout core (secret-login.ts, loginWithSecret) so they can't drift.
  • Audit map extended: password.failed→LOGIN_FAILURE, password.locked→BLOCKED.
  • Verified live: seeded a user whose password hash was produced by Better Auth's own scrypt util, then logged in through POST /api/demoz-auth/password/login — session issued, LOGIN_SUCCESS audited; wrong/unknown/missing → 401/400; LOGIN_FAILURE/password_invalid audited. Existing Better Auth credentials work unchanged. 50 package tests green. No frontend change.
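One way the shared lockout core could behave, written as a pure state transition so PIN and password reuse it unchanged. The signature, the threshold of 5 (matching the lockout observed in Phase 3b), the 60-second base lock and the doubling rule are all assumptions; the real loginWithSecret verifies the secret through a hasher rather than receiving a boolean.

```typescript
type LockState = { failedAttempts: number; lockedUntil: number | null };
type LoginOutcome =
  | { ok: true; state: LockState }
  | { ok: false; code: "INVALID" | "LOCKED"; state: LockState };

const MAX_ATTEMPTS = 5; // assumed threshold
const BASE_LOCK_MS = 60000; // assumed base lock duration

// `secretMatches` stands in for the hasher's verify result. While a lock is
// active the outcome is LOCKED regardless of the secret, so a locked
// account cannot be used as a guessing oracle.
function loginWithSecret(state: LockState, secretMatches: boolean, now: number): LoginOutcome {
  if (state.lockedUntil !== null && now < state.lockedUntil) {
    return { ok: false, code: "LOCKED", state };
  }
  if (secretMatches) {
    return { ok: true, state: { failedAttempts: 0, lockedUntil: null } };
  }
  const failedAttempts = state.failedAttempts + 1;
  if (failedAttempts >= MAX_ATTEMPTS) {
    // Exponential backoff: each failure past the threshold doubles the lock.
    const lockedUntil = now + BASE_LOCK_MS * 2 ** (failedAttempts - MAX_ATTEMPTS);
    return { ok: false, code: "LOCKED", state: { failedAttempts, lockedUntil } };
  }
  return { ok: false, code: "INVALID", state: { failedAttempts, lockedUntil: null } };
}
```

The returned state maps directly onto the failedAttempts/lockedUntil columns added in Phase 3c.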

Phase 4b — session bridge + employer-web pilot (this change)

The keystone: protected /api/* routes authenticate via SessionMiddlewareIdentityProvider. better-auth.identity-provider.ts now also resolves DemozAuth sessions by reading the demoz.workspace_session/demoz.identity_session cookie, SHA-256-digesting it, and looking it up by Session.tokenHash (the same table). Additive + self-gating (only when the flag is on AND a demoz cookie is present). So a DemozAuth login works for the entire existing protected API surface across all portals. Verified live: the real seeded Habesha owner (a Better-Auth credential) logged in via /password/login → /workspaces/select → GET /api/members/me, which returned the member + permissions (tenant scoping resolved through the bridge).

employer-web pilot, flag-gated by NEXT_PUBLIC_DEMOZ_AUTH (default OFF → Better Auth, untouched):

  • Shared browser client createDemozAuthClient + isDemozAuthEnabled in @demoz-pay/frontend-auth (reusable by every portal).
  • Login page, SessionProvider (split into BA + Demoz variants), logout, and middleware all branch on the flag. Core path: login → list/select workspace → session poll → logout.
  • Also tagged frontend-auth scope:shared (it was untagged — fixed a pre-existing module-boundary lint error for all its importers).
  • nx build employer-web green; default-OFF path unchanged.

Known gap (flag ON): secondary flows still on Better Auth — 2FA, password reset, change-password, profile update, session list/revoke — would 401 under a DemozAuth session. The pilot flag validates the core login; those features must move to DemozAuth (or BA must accept DemozAuth sessions) before the flag can default ON for employer or before BA is removed.

Phase 4c — secondary flows (in progress)

Closing the Better-Auth-only gaps so the employer pilot is self-sufficient. Done (backend, verified live):

  • changePassword use-case + POST /password/change (verifies current → sets new).
  • Session management: listSessions / revokeSession / revokeOtherSessions + GET /sessions, POST /sessions/revoke, POST /sessions/revoke-others.
  • Audit: password.changed → PASSWORD_RESET_COMPLETED. 54 package tests green; verified on the real owner against :8000 (list/revoke-others/change-password).

Also done this round:

  • Per-request ip/userAgent threaded transport → login use-cases → issueSession (AuthRequest.ip/userAgent set server-side from X-Forwarded-For/socket + UA header; RequestMeta carried by verifyOtp/pinLogin/passwordLogin). DemozAuth sessions now store device info; verified live (/sessions shows the forwarded IP + UA on the current row).
  • employer-web Settings wired (flag-gated): UserProfileCard change-password + session list/revoke/revoke-others now call demozAuthClient when the flag is on (BA otherwise). nx build employer-web green.

Password reset (this round) — done + verified live:

  • New TokenStore port (single-use, token-keyed, purpose-scoped; reusable for magic-link/email-verify later) + EmailSender.sendPasswordReset. requestPasswordReset (anti-enumeration, best-effort email) + resetPassword (verify token → set password → revoke ALL sessions → single-use). Endpoints POST /password/forgot + /password/reset (public). Prisma TokenStore over the Verification table; RealEmailSender → EMAIL_SENDER; reset URL = ${EMPLOYER_WEB_URL}/reset-password (config). employer-web forgot/reset pages flag-gated. Verified live: forgot (existing/unknown both ok) → emailed token → reset → new password logs in → token single-use (reuse 401). 56 package tests green.
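The three TokenStore properties named above (single-use, token-keyed, purpose-scoped) can be shown with an in-memory stand-in. The method names, the purpose strings and the choice to key entries by the token's digest are assumptions for this sketch; the real store sits over the Verification table.

```typescript
import { createHash } from "node:crypto";

type Purpose = "password-reset" | "magic-link"; // assumed purpose names
type TokenEntry = { purpose: Purpose; subjectId: string; expiresAt: number; consumedAt: number | null };

class InMemoryTokenStore {
  private entries = new Map<string, TokenEntry>();

  // Key by digest so the raw token is never held at rest (an assumption here).
  private keyFor(token: string): string {
    return createHash("sha256").update(token).digest("hex");
  }

  put(token: string, purpose: Purpose, subjectId: string, expiresAt: number): void {
    this.entries.set(this.keyFor(token), { purpose, subjectId, expiresAt, consumedAt: null });
  }

  // Returns the subject once. Wrong purpose, expiry, or reuse all yield null,
  // so callers cannot tell which check failed.
  consume(token: string, purpose: Purpose, now: number): string | null {
    const entry = this.entries.get(this.keyFor(token));
    if (entry === undefined) return null;
    if (entry.purpose !== purpose || entry.consumedAt !== null || now >= entry.expiresAt) return null;
    entry.consumedAt = now;
    return entry.subjectId;
  }
}
```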

2FA / TOTP (this round) — done + verified live:

  • Ports TotpService + TwoFactorStore (plaintext to the core; encryption is an adapter concern). Use-cases: startTotpEnrollment / confirmTotpEnrollment (returns one-time backup codes) / disableTotp; login gate — passwordLogin returns { requiresTwoFactor, twoFactorChallenge } (no session) when an active factor exists; verifyTwoFactorLogin accepts a TOTP or a one-time backup code, then mints the session. Config twoFactor + DEMOZ_AUTH_SECRET.
  • Adapters: NodeTotpService (RFC 6238, SHA-1/6-digit/30s, ±1 window), PrismaTwoFactorStore over the TwoFactor table with AES-256-GCM secret + backup-code encryption (key scrypt-derived from DEMOZ_AUTH_SECRET ?? BETTER_AUTH_SECRET) and User.twoFactorEnabled kept in sync. two-factor plugin (enroll / enroll/confirm / disable / login). Audit: mfa.enroll/disable, MFA_CHALLENGE_SUCCESS/FAILURE.
  • employer-web wired (flag-gated): login → 2fa-verify challenge; 2fa-enroll (QR + backup codes); disable in Settings. Verified live end-to-end with real TOTP codes: enroll → confirm → gated login → wrong-code 401 → TOTP ok → backup-code login → disable → ungated. 59 package tests green.
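For reference, the TOTP parameters above (RFC 6238, SHA-1, 6 digits, 30 s step, ±1 window) come down to the following. This is a standalone sketch with hypothetical function names, not the NodeTotpService source; it works on raw secret bytes and leaves base32 decoding out.

```typescript
import { createHmac, timingSafeEqual } from "node:crypto";

// HOTP (RFC 4226): HMAC-SHA1 over the 8-byte big-endian counter,
// then dynamic truncation to `digits` decimal digits.
function hotp(secret: Buffer, counter: number, digits = 6): string {
  const msg = Buffer.alloc(8);
  msg.writeUInt32BE(Math.floor(counter / 0x100000000), 0);
  msg.writeUInt32BE(counter >>> 0, 4);
  const mac = createHmac("sha1", secret).update(msg).digest();
  const offset = mac.readUInt8(mac.length - 1) & 0x0f;
  const binary = mac.readUInt32BE(offset) & 0x7fffffff;
  return (binary % 10 ** digits).toString().padStart(digits, "0");
}

// TOTP (RFC 6238): HOTP where the counter is the number of elapsed steps.
function totp(secret: Buffer, unixSeconds: number, step = 30): string {
  return hotp(secret, Math.floor(unixSeconds / step));
}

// Accept the current step and `window` steps either side to absorb clock skew.
function verifyTotp(secret: Buffer, code: string, unixSeconds: number, window = 1, step = 30): boolean {
  const counter = Math.floor(unixSeconds / step);
  const given = Buffer.from(code);
  let ok = false;
  for (let w = -window; w <= window; w++) {
    if (counter + w < 0) continue;
    const expected = Buffer.from(hotp(secret, counter + w));
    if (expected.length === given.length && timingSafeEqual(expected, given)) ok = true;
  }
  return ok;
}
```

With the RFC's ASCII test secret "12345678901234567890", totp at t=59 yields the last six digits of the published 8-digit vector.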

Profile update (this round) — done: PATCH /api/me/account (name/avatar) on the existing MeController — cross-persona, resolves the User from the session (BA, employee, or DemozAuth via the bridge), so it's not in the auth package. employer-web profile save flag-gated to it. Verified live under a DemozAuth session.

Phase 4c is complete.

Phase 4d — all portals piloted (this change)

admin-web, fi-web, merchant-web each got the employer treatment (flag-gated on NEXT_PUBLIC_DEMOZ_AUTH): lib/demoz-auth-client.ts, login (+2FA challenge), split SessionProvider (BA + Demoz variants with the right persona check — FI_PARTNER/MERCHANT, and organization===null for platform admin), middleware cookies, forgot/reset/2fa-verify/2fa-enroll pages, dropdown logout, and Settings (change-password/sessions/2FA/profile). fi-web + merchant-web verified live (login → workspace → /members/me returns the correct kind). All four portals build. Caveat: admin-web enforces 2FA and existing Better-Auth TOTP secrets aren't readable by DemozAuth (different cipher) — PrismaTwoFactorStore.find treats them as unenrolled rather than throwing, so admin-web must stay on Better Auth until a TOTP-secret migration ships (or admins re-enroll). fi/merchant/employer (2FA optional) are unaffected.

The remaining work (per product direction, these must exist in DemozAuth BEFORE BA is removed): OAuth/social sign-in, magic-link, and JWT issuance as DemozAuth plugins (introduces DemozAuth's own secret/baseURL config), a TOTP-secret migration for admin, then flip each portal's flag on and delete the BA factory + Express mount. Tables + credentials remain.

Phase 5 — Better Auth removal at parity

Removal requires feature parity, not just login parity. Per product direction (2026-06-25), DemozAuth must FIRST gain the capabilities BA provides that we keep using — OAuth/social sign-in, magic-link, and JWT issuance (plus 2FA + password reset from 4c). These become DemozAuth plugins (the engine's plugin model is built for exactly this). Only when DemozAuth covers every BA feature in use AND all four portals are cut over is BA deleted (one factory + the Express mount). Adding OAuth/magic-link/JWT is when DemozAuth gains its own secret/baseURL config (signed tokens + callback URLs) — not needed before.

Phase 6 — v1.0 hardening (landed)

DemozAuth is now the sole runtime auth path; Better Auth is gone from the running code (the seed provisions users via auth.api.createUser). The User/Account/Session/Member/Verification tables are owned by DemozAuth — their Better-Auth lineage is migration history, hidden behind the ports. What shipped in this pass:

  • Security wiring (Phase 1): the RateLimiter (Redis + in-memory fallback) and RiskEngine ports are bound; idle-timeout + revocation are enforced on every request (the IdentityProvider resolves through resolveActiveSession, not a bare lookup); CSRF/Origin enforcement on cookie-authenticated state-changing requests; credential metadata and session deviceId now persist (migration 20260627140000).
  • Session completeness (Phase 2): long-lived httpOnly refresh cookie at login; POST /session/refresh rotates access + refresh single-use with reuse-detection (a replayed old refresh OR the pre-rotation access token is rejected); POST /session/logout-all; flag-gated new-device step-up.
  • Plugin system framework-grade (Phase 3): plugin guards are runtime-enforced (the transport runs them before the handler; the identity-session guard resolves the session once and stashes it on the request); plugins emit their own events via ctx.emit; engine.init() fail-fast validates the graph — duplicate plugin id, duplicate METHOD path, missing handler, unresolved/duplicate guard all throw at boot with a clear message. Guide: packages/demoz-auth/docs/PLUGIN_AUTHORING.md.
  • Identity-kind seam (Phase 4): User.kind (migration 20260627150000) — USER (human, default) | SERVICE_ACCOUNT | API_CLIENT. Non-human kinds may be created with neither phone nor email (the store synthesizes a namespaced placeholder for the required email column). Seam only — issuance flows (API keys, scopes) are the federation roadmap (Phase 11).
  • Docs (Phase 5): packages/demoz-auth/docs/SECURITY.md, docs/PLUGIN_AUTHORING.md, refreshed README. Future OSS extraction → @demoz-auth/core + @demoz-auth/adapter-* (rename + split; no code move needed). Not done yet.
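The refresh behaviour described under session completeness (single-use rotation of access + refresh, with both the replayed refresh token and the pre-rotation access token rejected) can be modelled in memory. Class and method names are hypothetical. This sketch rejects a replay simply because the old digest no longer exists; a production store may instead keep a tombstone so it can tell a replay from an unknown token.

```typescript
import { createHash, randomBytes } from "node:crypto";

type TokenPair = { access: string; refresh: string };

const digestOf = (token: string): string => createHash("sha256").update(token).digest("hex");
const newToken = (): string => randomBytes(32).toString("base64url");

class RotatingSessionStore {
  private accessDigests = new Set<string>();
  // refresh digest -> digest of the access token issued alongside it
  private refreshToAccess = new Map<string, string>();

  issue(): TokenPair {
    const pair: TokenPair = { access: newToken(), refresh: newToken() };
    this.accessDigests.add(digestOf(pair.access));
    this.refreshToAccess.set(digestOf(pair.refresh), digestOf(pair.access));
    return pair;
  }

  // Rotate: invalidate the presented refresh token and its paired access
  // token, then issue a fresh pair. A second use of the same refresh token
  // finds nothing and is rejected.
  refresh(refreshToken: string): TokenPair | null {
    const key = digestOf(refreshToken);
    const oldAccess = this.refreshToAccess.get(key);
    if (oldAccess === undefined) return null;
    this.refreshToAccess.delete(key);
    this.accessDigests.delete(oldAccess);
    return this.issue();
  }

  isAccessValid(accessToken: string): boolean {
    return this.accessDigests.has(digestOf(accessToken));
  }
}
```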

Admin-web caveat unchanged: Better-Auth TOTP secrets aren't readable by DemozAuth (different cipher), so admin 2FA enrolment carries over only by re-enrolment or a TOTP-secret migration.

Phase 7 — Federation (Phases 6–11 of the roadmap, landed)

All built as plugins/ports on the v1.0 core; the protocol crypto that can't live in a pure package sits behind a verifier port (an app adapter), exactly like the existing OAuth providers.

  • Service accounts + API keys (11): ApiKeyStore + ApiKey table (migration 20260627160000); dzk_<id>.<secret> keys, only the secret's digest stored; an api-key guard (Authorization: Bearer dzk_…) is the programmatic-auth path. /api-keys (issue/list/revoke) + /api-keys/whoami. Live-verified end to end.
  • OAuth provider registry (7): a generic oidcProvider({...}) makes any OIDC IdP config-not-code; GitHub + Microsoft added beside Google. The transport already dispatches /oauth/<id>/* by id — zero per-provider transport edits. Live-verified (github/microsoft authorize redirects).
  • Passkeys / WebAuthn (6): WebAuthnVerifier + PasskeyStore + WebAuthnChallengeStore ports; the crypto is an app adapter (e.g. @simplewebauthn). Register/login ceremonies + challenge replay defense + counter-regression check. Routes mounted; 501 NOT_CONFIGURED until a verifier is bound (live-verified gate). Round-trip tested with a stub verifier.
  • OIDC issuance (8): discovery doc + id_token (over the JWT signer) + UserInfo. Live-verified (/.well-known/openid-configuration, id_token with sub/aud/nonce/iss, userinfo). A full third-party IdP (authorization-code flow, client registry, RS256+JWKS) is the next step — bind an RS256 signer; claims/discovery stay the same.
  • SAML SSO (9): SamlVerifier port seam (XML-DSIG is the app adapter); AuthnRequest → ACS → provision/link identity (honoring disableSignUp) → session. Routes mounted; 501 NOT_CONFIGURED until a verifier is bound (live-verified). Stub-tested.
  • SCIM 2.0 (10): /scim/v2/Users List/Create/Get/Replace/Patch/Delete over the IdentityStore, api-key-guarded, SCIM error schema. Required + delivered a transport enhancement: :param path routing (literal routes still shadow param routes) and correct 204 No-Content responses. Live-verified full CRUD lifecycle.
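A sketch of the API-key scheme from the service-accounts item above: keys look like dzk_<id>.<secret>, and only the secret's digest is kept. The record shape, the id and secret encodings, and the function names are assumptions; the real guard runs over the ApiKeyStore port.

```typescript
import { createHash, randomBytes, timingSafeEqual } from "node:crypto";

// Hypothetical record; the real ApiKey table has more columns.
type ApiKeyRecord = { id: string; secretDigest: string; revoked: boolean };

const sha256Hex = (value: string): string => createHash("sha256").update(value).digest("hex");

// Returns the full key once; afterwards only the digest exists.
function issueApiKey(store: Map<string, ApiKeyRecord>): string {
  const id = randomBytes(8).toString("hex");
  const secret = randomBytes(32).toString("base64url");
  store.set(id, { id, secretDigest: sha256Hex(secret), revoked: false });
  return `dzk_${id}.${secret}`;
}

// Parse "Authorization: Bearer dzk_<id>.<secret>", look the record up by id,
// and compare digests in constant time.
function verifyApiKey(store: Map<string, ApiKeyRecord>, authorization: string | undefined): ApiKeyRecord | null {
  const match = /^Bearer dzk_([A-Za-z0-9_-]+)\.([A-Za-z0-9_-]+)$/.exec(authorization ?? "");
  if (match === null) return null;
  const id = match[1];
  const secret = match[2];
  if (!id || !secret) return null;
  const record = store.get(id);
  if (record === undefined || record.revoked) return null;
  const expected = Buffer.from(record.secretDigest, "hex");
  const actual = Buffer.from(sha256Hex(secret), "hex");
  if (expected.length !== actual.length || !timingSafeEqual(expected, actual)) return null;
  return record;
}
```

Splitting the key into a lookup id and a secret lets the store find the row without scanning, while a leaked table still reveals no usable secret.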

Package test suite: 114 passing. The package stays product-agnostic (zero demozpay/payroll/employee in src/).