ADR-029 — DemozAuth: a first-class, framework-agnostic auth platform
Status: Accepted. v1.0 hardening landed: Better Auth removed from the runtime; security controls wired + enforced; refresh-rotation, runtime-enforced plugin guards + fail-fast boot validation, and the identity-kind seam shipped. See "Phase 6" (v1.0 hardening) at the end for the current state.
Date: 2026-06-25
Renumbered: was drafted as ADR-018; renumbered to 029 to avoid collision with ADR-018 (progressive extraction).
Supersedes/relates: ADR-013 (tenant isolation), the employee phone-OTP auth, Better Auth integration.
Context
Authentication is currently split across two systems: Better Auth (org-admins:
email/password, TOTP, organization plugin, sessions) and a bespoke employee
phone-OTP flow that issues its own Session rows. A security review surfaced
real issues (OTP coupled to an org before auth, no rate limiting on
/api/employee-auth/*, a duplicate-identity race, no PIN, localStorage-only
onboarding) and a strategic need: authentication should become a reusable
platform across all current and future Demoz products (pay, lending, HR,
merchant/partner portals, mobile), not a DemozPay-specific module.
We want to learn from Better Auth's plugin/hook/schema-extension model without forking or copying it, and reach a state where replacing the auth provider means changing one adapter.
Decision
- Build @demoz-pay/demoz-auth: a pure, framework-agnostic package (no Nest/Prisma/Express/HTTP/React). It owns: domain (Identity, Credential, Session, Membership, AuthEvent), the ports contract, the engine (compose adapters + plugins + config), the plugin system + hook bus, and SDK flow contracts. Concrete adapters and transport bindings live in apps/api/src/identity.
- Identity-first model: identity → memberships → workspace → permissions. Two session kinds: IDENTITY (no workspace) and WORKSPACE (identity + tenant + permissions). Auth never assumes an organization.
- Credential-agnostic core: PIN/Password/OTP/Passkey/TOTP share one shape; new factors require no core change. PIN hashing uses Argon2id (@node-rs/argon2) via the Hasher port.
- Better Auth becomes one adapter behind the ExternalSessionResolver port (BetterAuthBridgeAdapter). It remains the authority for email/password + TOTP used by platform admin, business/employer, FI and merchant users. Employees use phone → OTP → PIN, owned by DemozAuth.
- Plugins contribute endpoints/guards/schema/hooks/services as transport-agnostic descriptors; the core never imports a plugin.
- Migration is incremental, additive, reversible, feature-flagged. Phases: (0) ADR + scaffold → (1) ports/engine/registry/bridge [this ADR] → (2) move phone/OTP/PIN/identity+workspace sessions into plugins behind flags → (3) audit/rate-limit/risk/device plugins → (4) Better Auth limited to email/password+TOTP+OAuth via the bridge → (5) optional Better Auth removal once at parity (a business decision, not a rewrite).
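The two session kinds in the decision above lend themselves to a discriminated union. The following is a minimal sketch only: the field names (identityId, tenantId, permissions) and the helper functions are assumptions for illustration, not the package's actual domain types.

```typescript
// Hypothetical shapes: the ADR names the two session kinds but not their fields.
type IdentitySession = {
  kind: "IDENTITY";
  identityId: string;
};

type WorkspaceSession = {
  kind: "WORKSPACE";
  identityId: string;
  tenantId: string;
  permissions: readonly string[];
};

type AuthSession = IdentitySession | WorkspaceSession;

// Narrowing on `kind` forces callers to handle the no-workspace case explicitly,
// which is one way to keep "auth never assumes an organization" honest.
function tenantOf(session: AuthSession): string | null {
  return session.kind === "WORKSPACE" ? session.tenantId : null;
}

function hasPermission(session: AuthSession, permission: string): boolean {
  // Identity-only sessions carry no tenant and therefore no permissions.
  if (session.kind !== "WORKSPACE") return false;
  return session.permissions.includes(permission);
}
```

With this shape, code that needs a tenant cannot compile against an IDENTITY session without first checking the kind.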
Persona → method (locked)
- Admin / Business / FI / Merchant → email + password (+ TOTP) via Better Auth (unchanged).
- Employee → phone → OTP → PIN (OTP only for first login / new device / PIN reset / risk).
Consequences
- Positive: reusable company asset; provider lock-in removed (swap = one adapter); clean DDD boundaries; each capability independently testable; fintech security (Argon2id, hashed OTP, lockouts, risk hooks) designed in from day one.
- Negative / risk: a second auth surface during migration (mitigated by the existing IdentityProvider seam and a single Session table with additive columns); schema-composition from plugin-declared models needs a build step (start simple, automate later); scope discipline required (OAuth/SSO/passkey are future plugins, not now).
- Non-goals (now): rewriting org-admin auth, OAuth/SSO, passkeys.
Phase 1
Pure package scaffold: domain, ports, engine, plugin registry, hook bus, config,
SDK contracts, in-memory testing adapters, unit tests; the ADR; and a non-wired
BetterAuthBridgeAdapter. No runtime wiring, no migrations, no behavior
change. Verified by tsc + eslint + package unit tests.
Phase 2a (this change) — core auth flows + RBAC
Still pure (no framework, no Prisma, no HTTP). Adds the actual authentication logic over the Phase-1 ports:
- Domain: pin-policy.ts (validatePin: length, all-same, sequential and common-PIN rejection; throws WeakSecretError); rbac.ts (RbacRegistry: roles → permissions with inheritance, * superuser wildcard, can/canAll).
- Use-cases (application/use-cases/): startPhoneAuth + verifyOtp (rate-limited, hashed OTP, max-attempts + single-use, mints an IDENTITY session, requiresPinSetup when no PIN credential); setPin + pinLogin (Argon2id-hashed PIN, exponential lockout, optional risk-engine OTP step-up); listWorkspaces + selectWorkspace (membership check → WORKSPACE session); issueSession / resolveActiveSession (opaque token, only the digest is stored via the Digest port, distinct from the Hasher port; expiry/revocation/idle checks); emitEvent (publishes to the EventPublisher and the HookBus atomically).
- Ports: added Digest (deterministic SHA-256-class hash for high-entropy tokens) alongside Hasher (salted Argon2id for low-entropy secrets). SmsSender and Digest added to AuthAdapters.
- Errors: StepUpRequiredError, UnauthenticatedError, WorkspaceAccessDeniedError.
No runtime wiring, no migrations, no behavior change. Verified by tsc
(lib + spec), eslint, 42 package unit tests (PIN policy, RBAC, and the full
phone→OTP→PIN→workspace flow incl. lockout/expiry/step-up), apps/api typecheck,
and nx build demoz-auth.
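The PIN policy described for Phase 2a (length, all-same, sequential and common-PIN rejection) could look roughly like the sketch below. The length bounds, the deny-list contents and the reason strings are assumptions; only the validatePin and WeakSecretError names come from this ADR.

```typescript
class WeakSecretError extends Error {
  readonly reason: string;
  constructor(reason: string) {
    super(`weak PIN: ${reason}`);
    this.name = "WeakSecretError";
    this.reason = reason;
  }
}

// Hypothetical defaults: the ADR does not state the real bounds or deny-list.
const MIN_PIN_LENGTH = 4;
const MAX_PIN_LENGTH = 8;
const COMMON_PINS = new Set(["1212", "1004", "2580", "6969"]);

// True when every digit steps by +1 or every digit steps by -1 (1234, 4321).
function isSequentialPin(pin: string): boolean {
  const digits = Array.from(pin, (c) => Number(c));
  const steps = digits.slice(1).map((d, i) => d - (digits[i] ?? 0));
  return steps.every((s) => s === 1) || steps.every((s) => s === -1);
}

function validatePin(pin: string): void {
  if (!/^\d+$/.test(pin)) throw new WeakSecretError("digits-only");
  if (pin.length < MIN_PIN_LENGTH || pin.length > MAX_PIN_LENGTH) {
    throw new WeakSecretError("length");
  }
  if (new Set(Array.from(pin)).size === 1) throw new WeakSecretError("all-same");
  if (isSequentialPin(pin)) throw new WeakSecretError("sequential");
  if (COMMON_PINS.has(pin)) throw new WeakSecretError("common");
}
```

Throwing a typed error rather than returning a boolean lets a transport map the failure to a status code without knowing the policy details.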
Phase 2b — flows wrapped as plugins
Each flow is now a transport-agnostic plugin (src/plugins/builtin/):
phone-auth, pin, session, workspace. Plugins gained a handlers map
(handlerId → AuthHandler) and an AuthRequest/AuthResult envelope, so a
transport realizes routes without importing use-cases. PluginRegistry.resolvedEndpoints()
binds each EndpointSpec to its handler and throws at boot if a handlerId is
unbound. AuthResult.issueToken/clearToken keep cookie mechanics in the
transport and token semantics in the package. Verified by a transport-level
integration spec driving the full flow through resolvedEndpoints() (now 46
package tests).
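The boot-time binding that resolvedEndpoints() performs can be sketched as follows. The type shapes here are simplified assumptions (the real AuthRequest/AuthResult envelopes and EndpointSpec are not spelled out in this ADR), and resolveEndpoints is a hypothetical free function standing in for the registry method.

```typescript
// Simplified, assumed shapes for illustration only.
type AuthRequest = { body: unknown; sessionToken?: string };
type AuthResult = { data?: unknown; issueToken?: string; clearToken?: boolean };
type AuthHandler = (req: AuthRequest) => Promise<AuthResult>;

type EndpointSpec = { method: "GET" | "POST"; path: string; handlerId: string };
type AuthPlugin = {
  id: string;
  endpoints: EndpointSpec[];
  handlers: Record<string, AuthHandler | undefined>;
};
type ResolvedEndpoint = EndpointSpec & { pluginId: string; handler: AuthHandler };

// Binds every declared endpoint to its handler; an unbound handlerId is a
// configuration bug, so it fails at boot instead of on the first request.
function resolveEndpoints(plugins: AuthPlugin[]): ResolvedEndpoint[] {
  const resolved: ResolvedEndpoint[] = [];
  for (const plugin of plugins) {
    for (const spec of plugin.endpoints) {
      const handler = plugin.handlers[spec.handlerId];
      if (!handler) {
        throw new Error(
          `plugin "${plugin.id}": no handler bound for "${spec.handlerId}" (${spec.method} ${spec.path})`,
        );
      }
      resolved.push({ ...spec, pluginId: plugin.id, handler });
    }
  }
  return resolved;
}
```

A transport then only iterates the resolved list and mounts each method + path; it never imports a use-case.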
Phase 3a — stateless concrete adapters (this change)
Concrete, framework-bound implementations of the capability ports, in
apps/api/src/identity/demoz-auth/adapters/ — no DB, no routes, not yet
wired, so entirely additive:
- Argon2idHasher (@node-rs/argon2, Argon2id, m=19 MiB/t=2/p=1): for PIN/password.
- Sha256Digest (node:crypto): for session tokens.
- NodeRandomSource (crypto.randomBytes/randomInt, bias-free): tokens + codes.
- SystemClock.
Verified by running them at the library boundary (argon2 round-trip incl. salt/garbage handling, deterministic digest, URL-safe unique tokens, uniform codes) + apps/api typecheck.
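For the stateless digest and random-source adapters, a minimal sketch using only node:crypto might look like this. The function names and the 32-byte token size are assumptions; the ADR only states SHA-256, crypto.randomBytes and crypto.randomInt.

```typescript
import { createHash, randomBytes, randomInt } from "node:crypto";

// High-entropy opaque token; base64url keeps it safe for cookies and URLs.
// The 32-byte size is an assumption for this sketch.
function newSessionToken(): string {
  return randomBytes(32).toString("base64url");
}

// Deterministic digest: the same token always maps to the same stored value,
// so a session can be looked up by digest without storing the token itself.
function digestToken(token: string): string {
  return createHash("sha256").update(token, "utf8").digest("hex");
}

// randomInt draws uniformly from [0, 10^digits), avoiding modulo bias.
function newOtpCode(digits = 6): string {
  return randomInt(0, 10 ** digits).toString().padStart(digits, "0");
}
```

An unsalted digest is appropriate here only because the token is high-entropy; low-entropy secrets such as PINs go through the salted Hasher port instead.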
Phase 3b — Nest transport (this change)
A single catch-all DemozAuthController (apps/api/src/identity/demoz-auth/)
mounts the engine's resolvedEndpoints() at /api/demoz-auth/*. It owns no auth
logic: it extracts the session token (Authorization Bearer → IDENTITY cookie →
WORKSPACE cookie), builds an AuthRequest, runs the handler, and realizes
issueToken/clearToken as httpOnly cookies; error-http.ts maps the typed
error code → HTTP status (400/401/403/404/423/429). The engine is built from
the Phase-3a capability adapters + process-memory placeholder stores
(dev/in-memory-stores.ts) so the flow is exercisable before the DB lands.
Feature-flagged, default OFF. DEMOZ_AUTH_TRANSPORT_ENABLED (config + ENV)
gates DemozAuthModule.forRoot(): when false it registers no controller and no
providers, so the AppModule import is inert and the live employee / Better-Auth
flows are untouched. Verified by booting apps/api: flag OFF → app starts,
no DemozAuth routes, no DI errors; flag ON → controller mounts, and the full
flow + error paths pass over real HTTP (start → verify → set-PIN → PIN-login →
session → workspaces → logout; weak-PIN 400, wrong-OTP 401, lockout 423 after 5
attempts, unknown route 404). Persistence is the only thing still placeholder.
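The token-extraction order and the error-to-status mapping described for the transport can be sketched as below. The request shape and the error code strings are hypothetical; the cookie names are the ones this ADR mentions for the session bridge, and the status codes are the ones listed above.

```typescript
// Assumed minimal request shape; a real transport would adapt its framework's request.
type IncomingRequest = {
  headers: Record<string, string | undefined>;
  cookies: Record<string, string | undefined>;
};

const IDENTITY_COOKIE = "demoz.identity_session";
const WORKSPACE_COOKIE = "demoz.workspace_session";

// Precedence: Authorization Bearer, then the IDENTITY cookie, then the WORKSPACE cookie.
function extractSessionToken(req: IncomingRequest): string | undefined {
  const bearer = req.headers["authorization"]?.match(/^Bearer\s+(\S+)$/i)?.[1];
  return bearer ?? req.cookies[IDENTITY_COOKIE] ?? req.cookies[WORKSPACE_COOKIE];
}

// Hypothetical error codes: only the status values come from the ADR.
const STATUS_BY_CODE = new Map<string, number>([
  ["WEAK_SECRET", 400],
  ["UNAUTHENTICATED", 401],
  ["WORKSPACE_ACCESS_DENIED", 403],
  ["NOT_FOUND", 404],
  ["LOCKED", 423],
  ["RATE_LIMITED", 429],
]);

// Unknown codes fall back to 500 so an unmapped error is never reported as success.
function toHttpStatus(code: string): number {
  return STATUS_BY_CODE.get(code) ?? 500;
}
```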
Phase 3c — Prisma persistence over the EXISTING Better Auth tables (this change)
Decision change: DemozAuth reuses the current Better Auth tables (extend
with nullable columns only — no Demoz* tables) and will grow to cover all
personas, so Better Auth can be removed with existing credentials + sessions
continuing to work. Full audit + mapping: BETTER_AUTH_INTEGRATION.md.
- Additive migration 20260625120000_demoz_auth_existing_table_extensions: Account.failedAttempts/lockedUntil, Session.kind/tokenHash (unique)/lastSeenAt, Verification.attempts/consumedAt. All nullable, no backfill, no drops; Better Auth ignores them.
- Prisma adapters (apps/api/src/identity/demoz-auth/adapters/): Identity→User, Credential→Account (providerId 'credential'=password, 'pin'=Argon2id), Session→Session (digest in tokenHash; token=digest to satisfy unique-not-null; revoke=delete), Otp→Verification (demoz:otp:{id} namespace), Membership→Member (+ MemberRole.roleName[]). Engine factory swaps to these five bindings + real SMS (SMS_SENDER). Argon2idHasher for PIN/OTP; ScryptPasswordHasher (byte-compatible with Better Auth's scrypt, cross-verified) is ready for the email/password persona.
- Credential continuity proven: Better Auth uses scrypt ({saltHex}:{keyHex}, N=16384/r=16/p=1/dkLen=64); our hasher verifies BA hashes and BA verifies ours.
- Verified end-to-end against real Postgres (flag ON, :8099): start→verify→set-PIN→PIN-login→session→workspaces→logout, with rows confirmed in User/Account/Session/Verification (digest-at-rest, single-use OTP) and PIN lockout persisting failedAttempts=5/lockedUntil. Flag OFF default → inert. Frontends NOT yet cut over (paused here).
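A scrypt hasher using the parameters and the {saltHex}:{keyHex} layout quoted above might be sketched as follows. Two details are assumptions not confirmed by this ADR: that the salt is fed to scrypt as its hex string rather than as decoded bytes, and that the password is NFKC-normalized first. Byte compatibility with Better Auth depends on matching both, so treat this as a starting point to cross-verify rather than as the actual ScryptPasswordHasher.

```typescript
import { randomBytes, scryptSync, timingSafeEqual } from "node:crypto";

// Parameters as quoted in the ADR.
const N = 16384;
const r = 16;
const p = 1;
const dkLen = 64;
// scrypt needs about 128 * N * r bytes (32 MiB here), which sits at Node's
// default maxmem limit, so the limit is raised explicitly.
const scryptOptions = { N, r, p, maxmem: 128 * N * r * 2 };

function deriveKey(password: string, saltHex: string): Buffer {
  // ASSUMPTION: salt passed as the hex string itself; NFKC normalization applied.
  return scryptSync(password.normalize("NFKC"), saltHex, dkLen, scryptOptions);
}

function hashPassword(password: string): string {
  const saltHex = randomBytes(16).toString("hex");
  return `${saltHex}:${deriveKey(password, saltHex).toString("hex")}`;
}

function verifyPassword(stored: string, password: string): boolean {
  const [saltHex, keyHex] = stored.split(":");
  if (!saltHex || !keyHex) return false; // garbage input never verifies
  const expected = Buffer.from(keyHex, "hex");
  if (expected.length !== dkLen) return false;
  return timingSafeEqual(deriveKey(password, saltHex), expected);
}
```

The length check before timingSafeEqual matters because that function throws when the two buffers differ in size.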
Phase 3d — audit-trail integration (this change)
AuditEventPublisher replaces the structured-logging placeholder and maps
DemozAuth's events onto the platform's hash-chained AuthEventSink
(authEventEntry): session.identity.issued→LOGIN_SUCCESS,
otp.failed/pin.failed→LOGIN_FAILURE, pin.locked→LOGIN_FAILURE/BLOCKED,
session.revoked→LOGOUT; the rest (identity.created, otp.requested/verified,
pin.set/verified, workspace.selected, stepup.*) are structured-logged. Rows carry
actorUserId+tenantId+metadata.demozEvent. Best-effort: an audit-write
failure is logged, never thrown (the state change already committed; failing would
lock a legitimately-authenticated user out). Bound to a PrismaAuthEventSink
provided locally (only needs @Global PrismaService — no AuthModule import).
Verified: a live flow produced LOGIN_SUCCESS → LOGIN_FAILURE → LOGIN_SUCCESS → LOGOUT rows with the hash chain intact (each prevHash = prior hash).
Limitation: ip/userAgent need per-request engine context (Phase 4).
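The best-effort behaviour described for Phase 3d (map some events to audit rows, log the rest, never throw on an audit-write failure) can be illustrated with the sketch below. The interface and row shapes are assumptions, and only the unambiguous mappings from the text are included.

```typescript
type AuditAction = "LOGIN_SUCCESS" | "LOGIN_FAILURE" | "LOGOUT";
type AuthEvent = { type: string; identityId?: string; tenantId?: string };
type AuditRow = {
  action: AuditAction;
  actorUserId: string | null;
  tenantId: string | null;
  metadata: { demozEvent: string };
};

// Assumed sink shape; the real AuthEventSink also maintains the hash chain.
interface AuthEventSink {
  write(row: AuditRow): Promise<void>;
}

// Subset of the mapping listed above; unmapped events are only logged.
const ACTION_BY_EVENT = new Map<string, AuditAction>([
  ["session.identity.issued", "LOGIN_SUCCESS"],
  ["otp.failed", "LOGIN_FAILURE"],
  ["pin.failed", "LOGIN_FAILURE"],
  ["session.revoked", "LOGOUT"],
]);

async function publishBestEffort(
  sink: AuthEventSink,
  event: AuthEvent,
  log: (message: string) => void,
): Promise<"audited" | "logged" | "failed"> {
  const action = ACTION_BY_EVENT.get(event.type);
  if (!action) {
    log(`auth event ${event.type}`);
    return "logged";
  }
  try {
    await sink.write({
      action,
      actorUserId: event.identityId ?? null,
      tenantId: event.tenantId ?? null,
      metadata: { demozEvent: event.type },
    });
    return "audited";
  } catch (err) {
    // The state change has already committed, so the failure is recorded, not rethrown.
    log(`audit write failed for ${event.type}: ${String(err)}`);
    return "failed";
  }
}
```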
Phase 4a — email/password + workspace login (backend, this change)
DemozAuth can now authenticate the admin/business/FI/merchant personas:
- password.ts use-case (passwordLogin/setPassword) + email-password plugin (POST /password/login public, POST /password session-gated).
- A new optional passwordHasher adapter slot (scrypt for PASSWORD, distinct from the Argon2id hasher for PIN/OTP); bound to ScryptPasswordHasher in apps/api. PIN and password share ONE lockout core (secret-login.ts, loginWithSecret) so they can't drift.
- Audit map extended: password.failed→LOGIN_FAILURE, password.locked→BLOCKED.
- Verified live: seeded a user whose password hash was produced by Better Auth's own scrypt util, then logged in through POST /api/demoz-auth/password/login: session issued, LOGIN_SUCCESS audited; wrong/unknown/missing → 401/400; LOGIN_FAILURE/password_invalid audited. Existing Better Auth credentials work unchanged. 50 package tests green. No frontend change.
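A shared lockout core for PIN and password could be a pure state transition along these lines. The threshold of 5 matches the lockout observed in the Phase 3b verification; the base and maximum lock durations, the doubling rule and the exact outcome codes are assumptions for this sketch, as is the choice to report the threshold-reaching attempt as INVALID and only later attempts as LOCKED.

```typescript
type CredentialState = { failedAttempts: number; lockedUntil: number | null };
type LoginOutcome = { ok: true } | { ok: false; code: "INVALID" | "LOCKED" };

const MAX_ATTEMPTS = 5;
const BASE_LOCK_MS = 60_000; // hypothetical
const MAX_LOCK_MS = 3_600_000; // hypothetical

// Doubles the lock for each failure beyond the threshold, capped at MAX_LOCK_MS.
function lockDurationMs(failedAttempts: number): number {
  const overshoot = Math.max(0, failedAttempts - MAX_ATTEMPTS);
  return Math.min(BASE_LOCK_MS * 2 ** overshoot, MAX_LOCK_MS);
}

// Pure transition: the caller verifies the secret with the right hasher
// (Argon2id for PIN, scrypt for password) and persists `next`.
function loginWithSecret(
  state: CredentialState,
  secretMatches: boolean,
  now: number,
): { outcome: LoginOutcome; next: CredentialState } {
  if (state.lockedUntil !== null && now < state.lockedUntil) {
    return { outcome: { ok: false, code: "LOCKED" }, next: state };
  }
  if (secretMatches) {
    return { outcome: { ok: true }, next: { failedAttempts: 0, lockedUntil: null } };
  }
  const failedAttempts = state.failedAttempts + 1;
  const lockedUntil =
    failedAttempts >= MAX_ATTEMPTS ? now + lockDurationMs(failedAttempts) : null;
  return { outcome: { ok: false, code: "INVALID" }, next: { failedAttempts, lockedUntil } };
}
```

Keeping the transition independent of the hashing algorithm is what lets the two factors share it without drifting.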
Phase 4b — session bridge + employer-web pilot (this change)
The keystone: protected /api/* routes authenticate via SessionMiddleware
→ IdentityProvider. better-auth.identity-provider.ts now also resolves
DemozAuth sessions — reading the demoz.workspace_session/demoz.identity_session
cookie, SHA-256-digesting it, and looking it up by Session.tokenHash (the same
table). Additive + self-gating (only when the flag is on AND a demoz cookie is
present). So a DemozAuth login works for the entire existing protected API
surface across all portals — verified live: the real seeded Habesha owner
(a Better-Auth credential) logged in via /password/login → /workspaces/select
→ GET /api/members/me returned the member + permissions (tenant scoping resolved
through the bridge).
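The self-gating lookup that the bridge performs can be sketched as a function that returns null whenever DemozAuth should not handle the request, leaving the existing resolution path untouched. The row shape, the cookie precedence and the synchronous lookup callback are simplifications assumed for illustration; a real implementation would query the Session table asynchronously.

```typescript
import { createHash } from "node:crypto";

type SessionRow = { tokenHash: string; userId: string; kind: "IDENTITY" | "WORKSPACE" };
type Cookies = Record<string, string | undefined>;

// Returns null when the flag is off, when no demoz cookie is present, or when
// the digest matches no row; callers then fall through to the existing provider.
function resolveDemozSession(
  cookies: Cookies,
  flagEnabled: boolean,
  findByTokenHash: (tokenHash: string) => SessionRow | undefined,
): SessionRow | null {
  if (!flagEnabled) return null;
  const token = cookies["demoz.workspace_session"] ?? cookies["demoz.identity_session"];
  if (!token) return null;
  // Only the SHA-256 digest is stored, so the raw cookie value is digested before lookup.
  const tokenHash = createHash("sha256").update(token, "utf8").digest("hex");
  return findByTokenHash(tokenHash) ?? null;
}
```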
employer-web pilot, flag-gated by NEXT_PUBLIC_DEMOZ_AUTH (default OFF →
Better Auth, untouched):
- Shared browser client createDemozAuthClient + isDemozAuthEnabled in @demoz-pay/frontend-auth (reusable by every portal).
- Login page, SessionProvider (split into BA + Demoz variants), logout, and middleware all branch on the flag. Core path: login → list/select workspace → session poll → logout.
- Also tagged frontend-auth scope:shared (it was untagged; this fixed a pre-existing module-boundary lint error for all its importers). nx build employer-web green; default-OFF path unchanged.
Known gap (flag ON): secondary flows still on Better Auth — 2FA, password reset, change-password, profile update, session list/revoke — would 401 under a DemozAuth session. The pilot flag validates the core login; those features must move to DemozAuth (or BA must accept DemozAuth sessions) before the flag can default ON for employer or before BA is removed.
Phase 4c — secondary flows (in progress)
Closing the Better-Auth-only gaps so the employer pilot is self-sufficient. Done (backend, verified live):
- changePassword use-case + POST /password/change (verifies current → sets new).
- Session management: listSessions/revokeSession/revokeOtherSessions + GET /sessions, POST /sessions/revoke, POST /sessions/revoke-others.
- Audit: password.changed→PASSWORD_RESET_COMPLETED. 54 package tests green; verified on the real owner against :8000 (list/revoke-others/change-password).
Also done this round:
- Per-request ip/userAgent threaded transport → login use-cases → issueSession (AuthRequest.ip/userAgent set server-side from X-Forwarded-For/socket + UA header; RequestMeta carried by verifyOtp/pinLogin/passwordLogin). DemozAuth sessions now store device info, verified live (/sessions shows the forwarded IP + UA on the current row).
- employer-web Settings wired (flag-gated): UserProfileCard change-password + session list/revoke/revoke-others now call demozAuthClient when the flag is on (BA otherwise). nx build employer-web green.
Password reset (this round) — done + verified live:
- New TokenStore port (single-use, token-keyed, purpose-scoped; reusable for magic-link/email-verify later) + EmailSender.sendPasswordReset. requestPasswordReset (anti-enumeration, best-effort email) + resetPassword (verify token → set password → revoke ALL sessions → single-use). Endpoints POST /password/forgot + /password/reset (public). PrismaTokenStore over the Verification table; RealEmailSender → EMAIL_SENDER; reset URL = ${EMPLOYER_WEB_URL}/reset-password (config). employer-web forgot/reset pages flag-gated. Verified live: forgot (existing/unknown both ok) → emailed token → reset → new password logs in → token single-use (reuse 401). 56 package tests green.
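The single-use, purpose-scoped token behaviour and the anti-enumeration response can be sketched with an in-memory store. Everything here is illustrative: the class, the 15-minute TTL, the "password-reset" purpose string and the decision to store only a digest of the token are assumptions, not the PrismaTokenStore implementation.

```typescript
import { createHash, randomBytes } from "node:crypto";

type TokenRecord = { purpose: string; subject: string; expiresAt: number; consumed: boolean };

const sha256Hex = (value: string): string =>
  createHash("sha256").update(value, "utf8").digest("hex");

class InMemoryTokenStore {
  private readonly records = new Map<string, TokenRecord>();

  issue(purpose: string, subject: string, ttlMs: number, now: number): string {
    const token = randomBytes(32).toString("base64url");
    this.records.set(sha256Hex(token), { purpose, subject, expiresAt: now + ttlMs, consumed: false });
    return token;
  }

  // Returns the subject exactly once; wrong purpose, expiry or reuse all yield null.
  consume(purpose: string, token: string, now: number): string | null {
    const record = this.records.get(sha256Hex(token));
    if (!record || record.purpose !== purpose || record.consumed || now >= record.expiresAt) {
      return null;
    }
    record.consumed = true;
    return record.subject;
  }
}

// Anti-enumeration: the result is identical whether or not the email exists,
// and a failing email send is swallowed (best-effort).
function requestPasswordReset(
  store: InMemoryTokenStore,
  knownEmails: ReadonlySet<string>,
  email: string,
  now: number,
  send: (email: string, token: string) => void,
): { ok: true } {
  if (knownEmails.has(email)) {
    try {
      send(email, store.issue("password-reset", email, 15 * 60_000, now));
    } catch {
      // best-effort: delivery problems must not reveal whether the account exists
    }
  }
  return { ok: true };
}
```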
2FA / TOTP (this round) — done + verified live:
- Ports TotpService + TwoFactorStore (plaintext to the core; encryption is an adapter concern). Use-cases: startTotpEnrollment / confirmTotpEnrollment (returns one-time backup codes) / disableTotp; login gate: passwordLogin returns { requiresTwoFactor, twoFactorChallenge } (no session) when an active factor exists; verifyTwoFactorLogin accepts a TOTP or a one-time backup code, then mints the session. Config twoFactor + DEMOZ_AUTH_SECRET.
- Adapters: NodeTotpService (RFC 6238, SHA-1/6-digit/30s, ±1 window), PrismaTwoFactorStore over the TwoFactor table with AES-256-GCM secret + backup-code encryption (key scrypt-derived from DEMOZ_AUTH_SECRET ?? BETTER_AUTH_SECRET) and User.twoFactorEnabled kept in sync. two-factor plugin (enroll / enroll/confirm / disable / login). Audit: mfa.enroll/disable, MFA_CHALLENGE_SUCCESS/FAILURE.
- employer-web wired (flag-gated): login → 2fa-verify challenge; 2fa-enroll (QR + backup codes); disable in Settings. Verified live end-to-end with real TOTP codes: enroll → confirm → gated login → wrong-code 401 → TOTP ok → backup-code login → disable → ungated. 59 package tests green.
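The TOTP parameters named above (RFC 6238, HMAC-SHA-1, 6 digits, 30-second step, ±1 window) can be expressed directly with node:crypto. This is a sketch of the standard algorithm, not the NodeTotpService source; the function names and the raw-Buffer secret (rather than a base32 string) are assumptions.

```typescript
import { createHmac, timingSafeEqual } from "node:crypto";

const STEP_SECONDS = 30;
const DIGITS = 6;

// HOTP (RFC 4226): HMAC-SHA-1 over the 8-byte big-endian counter, then dynamic truncation.
function hotp(secret: Buffer, counter: number): string {
  const msg = Buffer.alloc(8);
  msg.writeUInt32BE(Math.floor(counter / 2 ** 32), 0);
  msg.writeUInt32BE(counter >>> 0, 4);
  const mac = createHmac("sha1", secret).update(msg).digest();
  const offset = mac.readUInt8(mac.length - 1) & 0x0f;
  const binary = mac.readUInt32BE(offset) & 0x7fffffff;
  return (binary % 10 ** DIGITS).toString().padStart(DIGITS, "0");
}

// TOTP (RFC 6238): HOTP with the counter derived from the current time step.
function totp(secret: Buffer, nowMs: number): string {
  return hotp(secret, Math.floor(nowMs / 1000 / STEP_SECONDS));
}

// Accepts the current step and `window` steps on either side to absorb clock skew.
function verifyTotp(secret: Buffer, code: string, nowMs: number, window = 1): boolean {
  if (!/^\d{6}$/.test(code)) return false;
  const current = Math.floor(nowMs / 1000 / STEP_SECONDS);
  const presented = Buffer.from(code, "utf8");
  let matched = false;
  for (let delta = -window; delta <= window; delta++) {
    const step = current + delta;
    if (step < 0) continue;
    const candidate = Buffer.from(hotp(secret, step), "utf8");
    // Compare every candidate so timing does not reveal which step matched.
    if (timingSafeEqual(candidate, presented)) matched = true;
  }
  return matched;
}
```

A production verifier would also remember the last accepted step per user so a code cannot be replayed inside its own window; that state is omitted here.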
Profile update (this round) — done: PATCH /api/me/account (name/avatar) on the
existing MeController — cross-persona, resolves the User from the session (BA,
employee, or DemozAuth via the bridge), so it's not in the auth package. employer-web
profile save flag-gated to it. Verified live under a DemozAuth session.
Phase 4c is complete.
Phase 4d — all portals piloted (this change)
admin-web, fi-web, merchant-web each got the employer treatment (flag-gated on
NEXT_PUBLIC_DEMOZ_AUTH): lib/demoz-auth-client.ts, login (+2FA challenge),
split SessionProvider (BA + Demoz variants with the right persona check —
FI_PARTNER/MERCHANT, and organization===null for platform admin), middleware
cookies, forgot/reset/2fa-verify/2fa-enroll pages, dropdown logout, and Settings
(change-password/sessions/2FA/profile). fi-web + merchant-web verified live
(login → workspace → /members/me returns the correct kind). All four portals build.
Caveat: admin-web enforces 2FA and existing Better-Auth TOTP secrets aren't
readable by DemozAuth (different cipher) — PrismaTwoFactorStore.find treats
them as unenrolled rather than throwing, so admin-web must stay on Better Auth
until a TOTP-secret migration ships (or admins re-enroll). fi/merchant/employer
(2FA optional) are unaffected.
Phase 5 — OAuth + magic-link + JWT, then remove Better Auth
The remaining work (per product direction, these must exist in DemozAuth BEFORE
BA is removed): OAuth/social sign-in, magic-link, and JWT issuance as DemozAuth
plugins (introduces DemozAuth's own secret/baseURL config), a TOTP-secret
migration for admin, then flip each portal's flag on and delete the BA factory +
Express mount. Tables + credentials remain.
Phase 5 — Better Auth removal at parity
Removal requires feature parity, not just login parity. Per product
direction (2026-06-25), DemozAuth must FIRST gain the capabilities BA provides
that we keep using — OAuth/social sign-in, magic-link, and JWT issuance (plus
2FA + password reset from 4c). These become DemozAuth plugins (the engine's
plugin model is built for exactly this). Only when DemozAuth covers every BA
feature in use AND all four portals are cut over is BA deleted (one factory + the
Express mount). Adding OAuth/magic-link/JWT is when DemozAuth gains its own
secret/baseURL config (signed tokens + callback URLs) — not needed before.
Phase 6 — v1.0 hardening (landed)
DemozAuth is now the sole runtime auth path; Better Auth is gone from the running
code (the seed provisions users via auth.api.createUser). The
User/Account/Session/Member/Verification tables are owned by DemozAuth
— their Better-Auth lineage is migration history, hidden behind the ports.
What shipped in this pass:
- Security wiring (Phase 1): the RateLimiter (Redis + in-memory fallback) and RiskEngine ports are bound; idle-timeout + revocation are enforced on every request (the IdentityProvider resolves through resolveActiveSession, not a bare lookup); CSRF/Origin enforcement on cookie-authenticated state-changing requests; credential metadata and session deviceId now persist (migration 20260627140000).
- Session completeness (Phase 2): long-lived httpOnly refresh cookie at login; POST /session/refresh rotates access + refresh single-use with reuse-detection (a replayed old refresh OR the pre-rotation access token is rejected); POST /session/logout-all; flag-gated new-device step-up.
- Plugin system framework-grade (Phase 3): plugin guards are runtime-enforced (the transport runs them before the handler; the identity-session guard resolves the session once and stashes it on the request); plugins emit their own events via ctx.emit; engine.init() fail-fast validates the graph: duplicate plugin id, duplicate METHOD path, missing handler, unresolved/duplicate guard all throw at boot with a clear message. Guide: packages/demoz-auth/docs/PLUGIN_AUTHORING.md.
- Identity-kind seam (Phase 4): User.kind (migration 20260627150000): USER (human, default) | SERVICE_ACCOUNT | API_CLIENT. Non-human kinds may be created with neither phone nor email (the store synthesizes a namespaced placeholder for the required email column). Seam only: issuance flows (API keys, scopes) are the federation roadmap (Phase 11).
- Docs (Phase 5): packages/demoz-auth/docs/SECURITY.md, docs/PLUGIN_AUTHORING.md, refreshed README. Future OSS extraction → @demoz-auth/core + @demoz-auth/adapter-* (rename + split; no code move needed). Not done yet.
Admin-web caveat unchanged: Better-Auth TOTP secrets aren't readable by DemozAuth (different cipher), so admin 2FA enrolment carries over only by re-enrolment or a TOTP-secret migration.
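Single-use refresh rotation with reuse detection, as listed under session completeness, can be sketched as below. The in-memory store, the notion of a token "family" and the choice to revoke the whole family on a replay are assumptions borrowed from common practice, not confirmed details of DemozAuth; the access-token side of the rotation is omitted.

```typescript
import { createHash, randomBytes } from "node:crypto";

type RefreshRecord = { familyId: string; used: boolean };
type RotateResult =
  | { ok: true; refreshToken: string }
  | { ok: false; reason: "unknown" | "reused" | "revoked" };

const digestOf = (token: string): string =>
  createHash("sha256").update(token, "utf8").digest("hex");

class RefreshRotationSketch {
  private readonly records = new Map<string, RefreshRecord>();
  private readonly revokedFamilies = new Set<string>();

  // A family groups all refresh tokens descended from one login.
  issue(familyId: string): string {
    const token = randomBytes(32).toString("base64url");
    this.records.set(digestOf(token), { familyId, used: false });
    return token;
  }

  rotate(presented: string): RotateResult {
    const record = this.records.get(digestOf(presented));
    if (!record) return { ok: false, reason: "unknown" };
    if (this.revokedFamilies.has(record.familyId)) return { ok: false, reason: "revoked" };
    if (record.used) {
      // A replayed token suggests theft, so every descendant is invalidated (assumed policy).
      this.revokedFamilies.add(record.familyId);
      return { ok: false, reason: "reused" };
    }
    record.used = true;
    return { ok: true, refreshToken: this.issue(record.familyId) };
  }
}
```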
Phase 7 — Federation (Phases 6–11 of the roadmap, landed)
All built as plugins/ports on the v1.0 core; the protocol crypto that can't live in a pure package sits behind a verifier port (an app adapter), exactly like the existing OAuth providers.
- Service accounts + API keys (11): ApiKeyStore + ApiKey table (migration 20260627160000); dzk_<id>.<secret> keys, only the secret's digest stored; an api-key guard (Authorization: Bearer dzk_…) is the programmatic-auth path. /api-keys (issue/list/revoke) + /api-keys/whoami. Live-verified end to end.
- OAuth provider registry (7): a generic oidcProvider({...}) makes any OIDC IdP config-not-code; GitHub + Microsoft added beside Google. The transport already dispatches /oauth/<id>/* by id, so there are zero per-provider transport edits. Live-verified (github/microsoft authorize redirects).
- Passkeys / WebAuthn (6): WebAuthnVerifier + PasskeyStore + WebAuthnChallengeStore ports; the crypto is an app adapter (e.g. @simplewebauthn). Register/login ceremonies + challenge replay defense + counter-regression check. Routes mounted; 501 NOT_CONFIGURED until a verifier is bound (live-verified gate). Round-trip tested with a stub verifier.
- OIDC issuance (8): discovery doc + id_token (over the JWT signer) + UserInfo. Live-verified (/.well-known/openid-configuration, id_token with sub/aud/nonce/iss, userinfo). A full third-party IdP (authorization-code flow, client registry, RS256+JWKS) is the next step: bind an RS256 signer; claims/discovery stay the same.
- SAML SSO (9): SamlVerifier port seam (XML-DSIG is the app adapter); AuthnRequest → ACS → provision/link identity (honoring disableSignUp) → session. Routes mounted; 501 NOT_CONFIGURED until a verifier is bound (live-verified). Stub-tested.
- SCIM 2.0 (10): /scim/v2/Users List/Create/Get/Replace/Patch/Delete over the IdentityStore, api-key-guarded, SCIM error schema. Required + delivered a transport enhancement: :param path routing (literal routes still shadow param routes) and correct 204 No-Content responses. Live-verified full CRUD lifecycle.
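The dzk_<id>.<secret> key format with digest-only storage can be sketched as follows. The id and secret lengths, the character sets accepted by the parser and the lookup callback are assumptions; only the overall format, the Bearer header and the "store only the secret's digest" rule come from the text above.

```typescript
import { createHash, randomBytes, timingSafeEqual } from "node:crypto";

type IssuedApiKey = { key: string; id: string; secretDigest: string };

// The full key is shown to the caller once; only id + digest would be persisted.
function issueApiKey(): IssuedApiKey {
  const id = randomBytes(8).toString("hex"); // assumed id format
  const secret = randomBytes(32).toString("base64url"); // assumed secret size
  const secretDigest = createHash("sha256").update(secret, "utf8").digest("hex");
  return { key: `dzk_${id}.${secret}`, id, secretDigest };
}

function parseApiKey(header: string | undefined): { id: string; secret: string } | null {
  const m = header?.match(/^Bearer\s+dzk_([A-Za-z0-9]+)\.([A-Za-z0-9_-]+)$/);
  if (!m || !m[1] || !m[2]) return null;
  return { id: m[1], secret: m[2] };
}

// Returns the key id on success; lookupDigest stands in for an ApiKeyStore read.
function verifyApiKey(
  header: string | undefined,
  lookupDigest: (id: string) => string | undefined,
): string | null {
  const parsed = parseApiKey(header);
  if (!parsed) return null;
  const stored = lookupDigest(parsed.id);
  if (!stored) return null;
  const presented = createHash("sha256").update(parsed.secret, "utf8").digest();
  const expected = Buffer.from(stored, "hex");
  if (expected.length !== presented.length) return null;
  return timingSafeEqual(presented, expected) ? parsed.id : null;
}
```

Splitting the key into a lookup id and a secret lets the store find the row by id and compare digests in constant time, without scanning every stored key.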
Package test suite: 114 passing. The package stays product-agnostic (zero
demozpay/payroll/employee in src/).