
09 — Common mistakes (and how to debug them)

A bestiary of pitfalls a new contributor will hit. Bookmark this page; come back when something doesn't work.

"My query returns zero rows even though there's data"

Most likely cause: you're outside a PrismaTransactionRunner call, so app.tenant_id is not set, so RLS hides every row.

Diagnostic:

-- Inside psql:
SELECT current_setting('app.tenant_id', true);
-- If this returns NULL or empty string, RLS will hide all rows.

Fix:

// ❌:
const rows = await prisma.ewa_request.findMany({ ... });

// ✅:
await runner.runInTransaction(async (tx) => {
  const rows = await tx.ewa_request.findMany({ ... });
});

Or, for local exploration via psql:

BEGIN;
SET LOCAL app.tenant_id = 'your-business-id'; -- SET LOCAL has no effect outside a transaction
SELECT * FROM ewa_request;
COMMIT;
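
To make the mechanism concrete, here is a minimal sketch of what a tenant-aware runner has to do. It is not the real PrismaTransactionRunner: `TxLike`, `BeginTx`, and `runWithTenant` are hypothetical names invented for this illustration. The only Postgres fact it relies on is that `set_config(name, value, true)` scopes a setting to the current transaction, the same way SET LOCAL does.

```typescript
// Hypothetical minimal transaction handle; the real Prisma `tx` client
// exposes far more than this.
interface TxLike {
  execute(sql: string, params: unknown[]): Promise<void>;
}

// Hypothetical stand-in for whatever opens a database transaction.
type BeginTx = <T>(work: (tx: TxLike) => Promise<T>) => Promise<T>;

// Runs `work` in a transaction whose first statement sets app.tenant_id.
// Passing `true` as the third argument of set_config makes the setting
// transaction-local, so it cannot leak into another request's connection.
async function runWithTenant<T>(
  begin: BeginTx,
  tenantId: string,
  work: (tx: TxLike) => Promise<T>,
): Promise<T> {
  if (tenantId.trim() === "") {
    // Fail loudly instead of letting RLS silently hide every row.
    throw new Error("tenantId is empty; RLS would hide every row");
  }
  return begin(async (tx) => {
    await tx.execute("SELECT set_config('app.tenant_id', $1, true)", [tenantId]);
    return work(tx);
  });
}
```

The point of the sketch is the ordering: the tenant setting and the query share one transaction. A query issued on the bare client never sees the setting, which is why it returns zero rows.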

"POST /api/ewa/requests returns 500 with a session cookie"

Most likely cause: the user has no Organization membership, so req.user.businessId is undefined, so the tenant context is empty, so the use case fails when it tries to set the GUC.

Diagnostic:

# In your local DB:
psql "$DATABASE_URL" -c '
SELECT u.id, u.email, s."activeOrganizationId"
FROM "User" u
LEFT JOIN "Session" s ON s."userId" = u.id;
'

If activeOrganizationId is NULL, that's the issue.

Fix (today, manual): create a Business → it creates an Organization with the same id. Then insert a Member row linking your User to the Organization. Then call:

curl -X POST http://localhost:8000/api/auth/organization/set-active \
-H "Cookie: $COOKIE" \
-H 'content-type: application/json' \
-d "{\"organizationId\": \"$BUSINESS_ID\"}"

Fix (eventual): Member auto-creation on first sign-up is Planned. Until then, the manual dance is required.

"Boot fails with 'DATABASE_URL is required'"

Cause: no .env file, or the variable is empty.

Fix: cp .env.example .env and fill it in. See 02-running-locally.md §"Step 3".

"Boot fails with 'JWT_SECRET must be at least 16 characters'"

Cause: the Zod schema enforces a minimum length to prevent accidentally shipping a 4-character dev secret.

Fix:

openssl rand -base64 32

Paste into .env for both JWT_SECRET and BETTER_AUTH_SECRET.
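
For intuition, the two boot failures above come down to checks like the ones in this sketch. It is a dependency-free approximation, not the project's actual Zod schema: `validateEnv` is a hypothetical name, and the BETTER_AUTH_SECRET message text is an assumption modelled on the JWT_SECRET one.

```typescript
// Hypothetical helper mirroring the boot-time checks described above.
// The real schema is written with Zod and may cover more variables.
type Env = Record<string, string | undefined>;

function validateEnv(env: Env): string[] {
  const errors: string[] = [];

  // A missing or empty DATABASE_URL is rejected outright.
  if (!env["DATABASE_URL"]) {
    errors.push("DATABASE_URL is required");
  }

  // Both secrets must be long enough that a throwaway dev value cannot ship.
  for (const name of ["JWT_SECRET", "BETTER_AUTH_SECRET"]) {
    const value = env[name] ?? "";
    if (value.length < 16) {
      errors.push(`${name} must be at least 16 characters`);
    }
  }
  return errors;
}
```

An empty result means the environment passes these particular checks; anything else is the list of messages you would expect to see at boot.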

"Postgres isn't reachable"

Symptoms: boot fails with postgres unreachable, or /api/readyz shows postgres down.

Diagnostics:

docker compose ps # is the container up?
docker compose logs postgres | tail -50 # any errors?
psql "$DATABASE_URL" -c 'SELECT 1;' # can YOU reach it?

Common fixes:

  • pnpm docker:up if not running.
  • Port conflict on 5432 → change the host port in docker-compose.yml and update DATABASE_URL.
  • Stale data → pnpm docker:down && pnpm docker:up.

"Migration fails: relation already exists"

Cause: half-applied migration state.

Fix (dev only):

pnpm prisma:reset # wipes and re-applies

Never run prisma:reset against a non-local DB. There's no soft delete; all data is gone.

"Migration fails: 'tenant_isolation policy missing on X'"

Cause: the RLS verify guard at the end of 20260526030000_apply_tenant_rls/migration.sql raised because either:

  • A new tenant table was added but not registered in the tenant_tables array, OR
  • The migration partially applied and was retried.

Diagnostic:

SELECT relname, relrowsecurity, relforcerowsecurity
FROM pg_class
WHERE relkind = 'r'
AND relname IN ('ewa_request', 'loan', 'outbox_event',
'idempotency_record', 'audit_entry');

Fix: add the missing table to BOTH tenant_tables arrays in the migration. See 08-how-to-add-things.md §"Add a new tenant table".
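
If you want to read the diagnostic output mechanically, a helper along these lines works. `RlsRow` and `findRlsProblems` are hypothetical names for this sketch, and it assumes the rows have already been fetched as booleans.

```typescript
// One row of the pg_class diagnostic query above.
interface RlsRow {
  relname: string;
  relrowsecurity: boolean;
  relforcerowsecurity: boolean;
}

// Returns one human-readable problem per table, or an empty list if every
// expected table exists with RLS both enabled and forced.
function findRlsProblems(expected: string[], rows: RlsRow[]): string[] {
  const byName = new Map(rows.map((row) => [row.relname, row] as const));
  const problems: string[] = [];
  for (const table of expected) {
    const row = byName.get(table);
    if (!row) {
      problems.push(`${table}: table not found`);
    } else if (!row.relrowsecurity) {
      problems.push(`${table}: RLS not enabled`);
    } else if (!row.relforcerowsecurity) {
      problems.push(`${table}: RLS not forced`);
    }
  }
  return problems;
}
```

Note that these flags only tell you whether RLS is switched on. They do not prove the tenant_isolation policy exists; for that, the pg_policies view lists policies per table.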

"The outbox publisher logs say 'drained 0 rows' but I see rows"

Cause: the publisher is running under the API role, which is subject to RLS. With no tenant context, it sees zero rows.

Diagnostic:

# Did you set OUTBOX_DATABASE_URL?
grep OUTBOX_DATABASE_URL .env

If not, the publisher uses DATABASE_URL (the API role) and logs a loud WARN at boot.

Fix:

  1. Provision the BYPASSRLS role:
    psql "$DATABASE_URL" -v publisher_password="'devpass'" \
    -f infra/sql/00_create_outbox_publisher_role.sql
  2. Add to .env:
    OUTBOX_DATABASE_URL="postgresql://demozpay_outbox_publisher:devpass@localhost:5432/demoz_pay_dev?schema=public"
  3. Restart the API.
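
The fallback behavior described above can be pictured roughly as follows. This is a sketch, not the publisher's real startup code: `resolveOutboxConnection` and its return shape are hypothetical, and the role name is simply read from the user part of the connection string.

```typescript
type Env = Record<string, string | undefined>;

interface OutboxConnection {
  url: string;
  role: string; // database user taken from the connection string
  usedFallback: boolean; // true when OUTBOX_DATABASE_URL was not set
}

// Prefers OUTBOX_DATABASE_URL and falls back to DATABASE_URL. In the
// fallback case the publisher connects as the API role and is subject to RLS.
function resolveOutboxConnection(env: Env): OutboxConnection {
  const dedicated = env["OUTBOX_DATABASE_URL"];
  const url = dedicated || env["DATABASE_URL"];
  if (!url) {
    throw new Error("DATABASE_URL is required");
  }
  return {
    url,
    role: decodeURIComponent(new URL(url).username),
    usedFallback: !dedicated,
  };
}
```

If `usedFallback` is true, "drained 0 rows" is the expected outcome: the publisher is connected as a role that RLS filters.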

"TypeScript can't find @demoz-pay/ewa"

Cause: Nx project graph cache stale, or the path alias isn't registered.

Diagnostic:

pnpm nx graph # opens an interactive graph view
jq '.compilerOptions.paths' tsconfig.base.json

Fix:

pnpm nx reset # clears Nx cache
pnpm install # re-link workspace deps

"ESLint error: 'cross-domain import not allowed'"

Cause: ADR-011 — domain packages can't import each other. You tried to import from @demoz-pay/lending inside @demoz-pay/ewa or vice versa.

Fix: communicate via outbox events instead. Read 07-rules-and-patterns.md §"Pattern C".

"I get a 401 on every route except /healthz"

Cause: this is correct behavior. AuthGuard is registered as APP_GUARD. Every route is protected unless decorated @Public().

Diagnostic: check whether the route is meant to be public. If it isn't, nothing is broken; you need to sign in.

Fix: sign up via /api/auth/sign-up/email and use the returned cookie. See 02-running-locally.md §"Step 7".

"I get a 500 on /api/auth/sign-up/email"

Most likely causes:

  1. DB not migrated — better-auth tables don't exist.
    • Fix: pnpm prisma:migrate.
  2. BETTER_AUTH_SECRET is too short.
    • Fix: rotate to a ≥16-char secret.
  3. Email/password conflict (user already exists).
    • Fix: use a different email or look up the user in DB.

"The phone-OTP send-otp endpoint succeeds but I never get a code"

Cause: SMS_PROVIDER=logger (the default) prints the OTP to the API log instead of sending an SMS. This is intentional in dev.

Fix: check the API log for a line like:

[DEV-OTP] phone=+251xxxxxxx code=123456

For production, you need an AfricasTalking adapter (Planned). SMS_PROVIDER=africastalking is reserved for that implementation.
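
A provider switch of this kind can be sketched as below. The names `SmsProvider`, `createLoggerSmsProvider`, and `createSmsProvider` are hypothetical and the real adapter wiring may look different; only the log line format and the two SMS_PROVIDER values come from this page.

```typescript
interface SmsProvider {
  sendOtp(phone: string, code: string): Promise<void>;
}

// Dev-only provider: writes the OTP to a log sink instead of sending an SMS.
function createLoggerSmsProvider(write: (line: string) => void): SmsProvider {
  return {
    async sendOtp(phone: string, code: string): Promise<void> {
      write(`[DEV-OTP] phone=${phone} code=${code}`);
    },
  };
}

// Picks a provider by name. Anything other than "logger" is rejected here
// because no real SMS adapter exists yet.
function createSmsProvider(name: string, write: (line: string) => void): SmsProvider {
  if (name === "logger") {
    return createLoggerSmsProvider(write);
  }
  throw new Error(`SMS_PROVIDER=${name} is not implemented`);
}
```

Injecting the `write` sink keeps the provider testable; in the running API it would be the application logger.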

"An audit row didn't get written"

Cause: the use case bypassed the runner, so the audit emitter wasn't given a tx handle, so it never inserted.

Diagnostic: search the use case for prisma.<table>.create() calls outside runner.runInTransaction(...).

Fix: wrap all state changes in runner.runInTransaction. The audit + outbox + state changes commit together or roll back together.
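
The "commit together or roll back together" guarantee can be illustrated with an in-memory stand-in. `InMemoryRunner` and `approveRequest` are hypothetical and exist only for this sketch; the table names come from this page, while the status value and event names are invented.

```typescript
interface Tx {
  insert(table: string, row: Record<string, unknown>): void;
}

interface Write {
  table: string;
  row: Record<string, unknown>;
}

// In-memory stand-in for a transaction runner: writes are buffered and only
// become visible if the callback finishes without throwing.
class InMemoryRunner {
  readonly committed: Write[] = [];

  async runInTransaction<T>(work: (tx: Tx) => Promise<T>): Promise<T> {
    const pending: Write[] = [];
    const tx: Tx = {
      insert: (table, row) => {
        pending.push({ table, row });
      },
    };
    const result = await work(tx); // a throw here skips the commit below
    this.committed.push(...pending);
    return result;
  }
}

// State change, audit entry, and outbox event all go through the same tx.
async function approveRequest(runner: InMemoryRunner, requestId: string): Promise<void> {
  await runner.runInTransaction(async (tx) => {
    tx.insert("ewa_request", { id: requestId, status: "approved" });
    tx.insert("audit_entry", { action: "ewa.approved", requestId });
    tx.insert("outbox_event", { type: "ewa.approved", requestId });
  });
}
```

A write made outside the callback has no `tx` to ride on, so nothing ties it to the audit entry. That is the failure mode this section describes.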

"Tests pass locally but CI fails"

Most likely causes:

  1. Hidden file-system case-sensitivity. macOS is case-insensitive by default; Linux is case-sensitive. Verify import paths match file casing exactly.
  2. Non-deterministic test order. Tests that depend on database state from a previous test will fail when test order changes.
  3. Time-zone differences. CI runs in UTC; your machine likely doesn't. Always use .toUTC() or compare with an explicit time zone.
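
For the time-zone case, the following sketch shows two habits that keep date assertions stable on any machine. It uses only the built-in Date API rather than the library behind .toUTC(), and the helper names `utcDayKey` and `utcDate` are made up for this example.

```typescript
// Formats the calendar day in UTC, independent of the machine's time zone.
function utcDayKey(date: Date): string {
  return date.toISOString().slice(0, 10);
}

// Builds a Date from explicit UTC components. `new Date(year, month, day)`
// would interpret the same numbers in local time instead.
function utcDate(year: number, month: number, day: number): Date {
  return new Date(Date.UTC(year, month - 1, day));
}

// 23:30 at UTC-5 is already the next day in UTC.
const lateEvening = new Date("2024-03-10T23:30:00-05:00");
const dayKey = utcDayKey(lateEvening); // "2024-03-11"
```

Both helpers give the same answer on a laptop in any zone and on a UTC CI runner, which is the property the tests need.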

"I committed something I shouldn't have"

If not pushed yet:

git reset --soft HEAD~1 # uncommit but keep changes
# now you can edit / split / re-commit

If pushed but no one has pulled:

git revert HEAD # creates an "undo" commit
git push

If pushed and merged and people pulled:

  • Don't try to rewrite history.
  • Open a new PR that undoes the change.
  • Talk to the team.

"Where do I find logs from the API in production?"

Today: no production deployment exists. Logs go to stdout.

When production lands, logs will flow through Pino → JSON → log collector (TBD: Loki / Datadog / equivalent). Trace IDs from OpenTelemetry will be the join key between traces and logs.

"The Go ledger says 'connection refused'"

Cause: services/ledger isn't running.

Diagnostic:

# In services/ledger:
go run ./cmd/ledger # requires Go 1.22+ AND buf generate already run

Fix: the gRPC ledger probe in the API is advisory, not fatal — the API will boot and serve everything except money operations. If you need money operations, you need the ledger running.
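
The difference between a fatal and an advisory probe can be sketched like this. `Probe` and `runBootProbes` are hypothetical names and the message wording is illustrative; the only behavior taken from this page is that an unreachable Postgres stops boot while an unreachable ledger does not.

```typescript
interface Probe {
  name: string;
  fatal: boolean; // true: boot aborts on failure; false: failure is only reported
  check: () => Promise<void>;
}

// Runs probes in order. A failing fatal probe aborts boot by throwing;
// a failing advisory probe is collected as a warning and boot continues.
async function runBootProbes(probes: Probe[]): Promise<string[]> {
  const warnings: string[] = [];
  for (const probe of probes) {
    try {
      await probe.check();
    } catch (error) {
      const reason = error instanceof Error ? error.message : String(error);
      if (probe.fatal) {
        throw new Error(`${probe.name} unreachable: ${reason}`);
      }
      warnings.push(`${probe.name} unavailable: ${reason}`);
    }
  }
  return warnings;
}
```

With a Postgres probe marked fatal and a ledger probe marked advisory, a stopped ledger produces a warning and a running API, which matches the "connection refused" symptom here.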

Easier path for testing the schema + DB invariants without a Go toolchain:

./services/ledger/test/verify.sh

This brings up a hermetic Postgres on :5434, applies migrations, and runs the integration tests (which exercise the store layer directly — no Go server needed).

"I want to roll back a migration"

Don't. Forward-only migrations are the rule (informally; not yet an explicit ADR but implied by ADR-009).

The right approach when a migration is wrong:

  1. Write a new migration that compensates (drop_column → add_column_back).
  2. If data was lost, restore from backup.

prisma migrate reset is dev-only.

"I'm reading code and I don't understand why something exists"

Order of investigation:

  1. Search the ADRs. grep -ri 'thing-i-dont-understand' docs/adr/
  2. Read the closest README. Most packages have one.
  3. Look at the test. Tests are documentation of expected behavior.
  4. Read the commit message. git log --follow --pretty=format:'%h %s' -- path/to/file
  5. Ask in the team channel. Don't suffer in silence.

If you find a piece of code that doesn't explain itself and has no ADR / no comment / no test, add a comment in the PR that introduces your change. The codebase improves through readers.

"I'm overwhelmed by the codebase"

That's the right reaction. The codebase looks like it does ~30% of what its docs claim (per the status matrix). Most of the heavy work is in the invariants, the tenant model, and the ledger schema — narrow surface, deep correctness.

Start small:

  1. Read apps/api/src/main.ts end-to-end. Just that one file.
  2. Trace one request through the layers (e.g. GET /api/healthz).
  3. Look at one use case (packages/ewa/backend/application/request-ewa.usecase.ts).
  4. Look at the test for it.

Then come back to this onboarding suite and start 07-rules-and-patterns.md. You'll get further.

Continue reading

You've finished the suite. From here: