ADR-026: Typed event contracts + Kafka consumer rules
- Status: Accepted
- Date: 2026-06-20
- Deciders: Principal Architect, Engineering Lead
- Relates to: ADR-008, ADR-011, ADR-017, ADR-020, ADR-022
Context
DemozPay already emits ~110 domain events through the transactional outbox, but they are stringly-typed with unknown payloads and consumed only by in-process Postgres pollers. As the platform grows to many products and many subscribers (Wallet, Savings, BNPL, Risk, Analytics, multiple banks), un-versioned anonymous payloads become a silent-failure surface: a producer changing a field breaks money consumers with no compile-time or CI signal. We are also committing to Kafka as the event backbone (amends ADR-017's "dormant" stance), which only becomes safe with governed contracts + explicit consumer guarantees.
Decision
1. Every published event is a typed, versioned contract.
- One registry (EventRegistry in @demoz-pay/shared-events) declares each event type → payload shape; appendDomainEvent() is the typed publish path. An event cannot be published unless declared. No anonymous JSON.
- Naming: demoz.<context>.<aggregate>.<event>.v<major>; one producer owns each type.
- Every event carries an envelope: eventId, occurredAt, tenantId, schemaVersion, correlationId, causationId?, source.
- The registry is the source for protobuf generation in packages/contracts; CI enforces BACKWARD compatibility (ADR-020/022).
- Money fields are base-10 santim strings.
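The registry rules above can be sketched in TypeScript to show how undeclared event types and payload drift become compile errors. This is a minimal sketch under stated assumptions, not the actual @demoz-pay/shared-events API. The following are all hypothetical:
- the two event names and their payload fields;
- the DomainEvent layout (the ADR's envelope fields plus type and payload);
- deriving schemaVersion from the .v<major> suffix;
- the "fields ending in Santim" money check;
- the counter-based event IDs;
- the in-memory outbox array.

Per decision 2, the real appendDomainEvent() would write into the caller's DB transaction rather than into an array.

```typescript
// Hypothetical registry: event type name -> payload shape.
// Names follow demoz.<context>.<aggregate>.<event>.v<major>; money fields are santim strings.
type EventRegistry = {
  "demoz.wallet.account.credited.v1": { accountId: string; amountSantim: string };
  "demoz.payroll.run.completed.v1": { payrollRunId: string; employeeCount: number };
};
type EventType = keyof EventRegistry;

// Envelope fields from ADR-026, plus `type` and `payload` (this sketch's own layout).
interface DomainEvent<T extends EventType> {
  eventId: string;
  occurredAt: string; // ISO-8601 timestamp
  tenantId: string;
  schemaVersion: number;
  correlationId: string;
  causationId?: string;
  source: string;
  type: T;
  payload: EventRegistry[T];
}

interface PublishMeta {
  tenantId: string;
  correlationId: string;
  causationId?: string;
  source: string;
}

const EVENT_NAME = /^demoz\.[a-z][a-z0-9-]*\.[a-z][a-z0-9-]*\.[a-z][a-z0-9-]*\.v(\d+)$/;
// Assumption: an integer number of santim, optional minus sign, no leading zeros.
const SANTIM = /^(0|-?[1-9][0-9]*)$/;

function isSantim(value: unknown): boolean {
  return typeof value === "string" && SANTIM.test(value);
}

let seq = 0;
const nextEventId = (): string => `evt-${++seq}`; // stand-in for a UUID generator
const outbox: DomainEvent<EventType>[] = []; // stand-in for the outbox table

// Only registry keys are accepted as `type`, and `payload` must match that key's shape.
function appendDomainEvent<T extends EventType>(type: T, payload: EventRegistry[T], meta: PublishMeta): DomainEvent<T> {
  const match = EVENT_NAME.exec(type);
  if (!match) throw new Error(`event name breaks the naming rule: ${type}`);
  // Hypothetical convention: every payload field whose name ends in "Santim" is money.
  const fields = payload as unknown as Record<string, unknown>;
  for (const key of Object.keys(fields)) {
    if (/Santim$/.test(key) && !isSantim(fields[key])) throw new Error(`not a santim string: ${key}`);
  }
  const event: DomainEvent<T> = {
    eventId: nextEventId(),
    occurredAt: new Date().toISOString(),
    schemaVersion: Number(match[1]), // assumption: schemaVersion mirrors the .v<major> suffix
    type,
    payload,
    ...meta,
  };
  outbox.push(event);
  return event;
}

// These calls would NOT type-check:
// appendDomainEvent("demoz.wallet.account.debited.v1", ...)   // type not declared in the registry
// appendDomainEvent("demoz.wallet.account.credited.v1",
//   { accountId: "acc-1", amountSantim: 1500 }, meta)         // number is not assignable to string
```

Because EventRegistry is a type, the compile-time checks cost nothing at runtime. The runtime naming and santim checks run before the outbox write, so a malformed event never reaches the outbox.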
2. The transactional outbox is the only publish path. Business code never calls a Kafka producer directly (no dual-write). Instead, it writes the event to the outbox in the same DB tx as the state change (ADR-008), and a relay ships outbox → Kafka.
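A minimal sketch of decision 2 uses two stand-ins:
- an in-memory InMemoryOutboxDb in place of Postgres;
- a plain send callback in place of the Kafka producer.

The loan table, the demoz.bnpl.loan.repaid.v1 event, and all function names are hypothetical. The sketch shows the two properties the outbox relies on:
- the state change and its outbox row commit or roll back together;
- the relay marks a row published only after sending it, so a failure between the two causes a resend rather than a loss.

```typescript
type LoanStatus = "OPEN" | "REPAID";

interface OutboxRow {
  id: number;
  topic: string;
  key: string; // partition key: the aggregate id
  value: string; // serialized event
  published: boolean;
}

interface Tables {
  loans: { [loanId: string]: LoanStatus };
  outbox: OutboxRow[];
}

// Stand-in for Postgres: tx() runs fn on a copy and commits only if fn returns normally.
class InMemoryOutboxDb {
  tables: Tables = { loans: {}, outbox: [] };
  private nextRowId = 1;

  tx<R>(fn: (t: Tables, nextId: () => number) => R): R {
    const draft: Tables = JSON.parse(JSON.stringify(this.tables));
    let id = this.nextRowId;
    const result = fn(draft, () => id++);
    this.tables = draft; // commit: the state change and its outbox row land together
    this.nextRowId = id;
    return result;
  }
}

// Business code: no Kafka producer here, only the state change plus an outbox insert.
function repayLoan(db: InMemoryOutboxDb, loanId: string, crashBeforeCommit = false): void {
  db.tx((t, nextId) => {
    t.loans[loanId] = "REPAID";
    t.outbox.push({
      id: nextId(),
      topic: "demoz.bnpl.loan.repaid.v1", // hypothetical event type
      key: loanId,
      value: JSON.stringify({ loanId }),
      published: false,
    });
    if (crashBeforeCommit) throw new Error("simulated crash before commit");
  });
}

type Send = (topic: string, key: string, value: string) => void;

// Relay: ship pending rows in id order and mark each published only after send() returns.
// A failure between send and mark re-sends that row on the next pass (at-least-once).
function relayOnce(db: InMemoryOutboxDb, send: Send): number {
  const pending = db.tables.outbox.filter((r) => !r.published).sort((a, b) => a.id - b.id);
  for (const row of pending) {
    send(row.topic, row.key, row.value);
    db.tx((t) => {
      for (const stored of t.outbox) if (stored.id === row.id) stored.published = true;
    });
  }
  return pending.length;
}
```

The possible resend is why decision 3 requires idempotent consumers: the relay trades duplicates for never losing an event.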
3. Consumer contract (mandatory):
- Delivery is at-least-once and consumers are idempotent (dedup on eventId); together these give effectively-once processing.
- Each consumer has its own DLQ with bounded retry + alert + replay.
- Ordering is per aggregate, via the partition key; cross-aggregate order is never assumed.
- Producers never reference consumers.
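The consumer rules above can be sketched as a small TypeScript wrapper. This is a hedged illustration, not a Kafka client:
- RetryingConsumer, Delivery, maxAttempts, the alert callback, and the in-memory DLQ array are hypothetical.
- There is no backoff between attempts.
- partitionFor is a toy hash, not Kafka's default partitioner. It is included only to show that equal keys always land on the same partition.

```typescript
interface Delivery {
  eventId: string;
  key: string; // partition key; assumed to be the aggregate id
  value: string;
}

interface DeadLetter {
  delivery: Delivery;
  attempts: number;
  lastError: string;
}

type Handler = (d: Delivery) => void;

// Illustrative stable hash, NOT Kafka's default partitioner. Any deterministic
// key -> partition mapping puts all events of one aggregate on one partition, in order.
function partitionFor(key: string, partitions: number): number {
  let h = 0;
  for (let i = 0; i < key.length; i++) h = (h * 31 + key.charCodeAt(i)) >>> 0;
  return h % partitions;
}

class RetryingConsumer {
  readonly dlq: DeadLetter[] = []; // this consumer's own DLQ
  private readonly name: string;
  private readonly handler: Handler;
  private readonly maxAttempts: number;
  private readonly alert: (message: string) => void;

  constructor(name: string, handler: Handler, maxAttempts: number, alert: (message: string) => void) {
    this.name = name;
    this.handler = handler;
    this.maxAttempts = maxAttempts;
    this.alert = alert;
  }

  // Bounded retry (no backoff delay in this sketch); on exhaustion, dead-letter and alert.
  // Returns true if handled, false if dead-lettered.
  deliver(d: Delivery): boolean {
    let lastError = "";
    for (let attempt = 1; attempt <= this.maxAttempts; attempt++) {
      try {
        this.handler(d);
        return true;
      } catch (err) {
        lastError = err instanceof Error ? err.message : String(err);
      }
    }
    this.dlq.push({ delivery: d, attempts: this.maxAttempts, lastError });
    this.alert(`${this.name}: ${d.eventId} dead-lettered after ${this.maxAttempts} attempts (${lastError})`);
    return false;
  }

  // Replay: re-feed parked deliveries (e.g. after a fix); anything still failing is parked again.
  replay(): number {
    const parked = this.dlq.splice(0, this.dlq.length);
    let handled = 0;
    for (const dl of parked) if (this.deliver(dl.delivery)) handled++;
    return handled;
  }
}
```

In a real deployment, the consumer would typically commit its offset only once an event is either handled or dead-lettered. That way one poison event parks on the DLQ instead of blocking its partition, and nothing is skipped silently.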
3a. Dedup lives in Postgres, not Redis. The eventId dedup is a processed_event(consumer, eventId) row written in the same tx as the consumer's state change (insert-or-skip on the PK). It is transactional with the work, restart-safe, and needs zero new infra. We deliberately do not introduce Redis (or any external store) for event dedup/DLQ. This is scoped to the event path and is not a platform ban; Redis remains the cache/ephemeral tier (ADR-027). Revisit only with a measured throughput reason and a follow-up ADR.
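Decision 3a's insert-or-skip dedup can be sketched as follows. The SQL in the comment uses standard Postgres INSERT ... ON CONFLICT DO NOTHING. The following are assumptions:
- the event_id column name;
- the Tx/Db interfaces;
- the repayment side effect;
- the consumer name.

InMemoryConsumerDb only emulates commit/rollback so the sketch runs without a database.

```typescript
// Postgres statements the real Tx would issue (column names assumed), before COMMIT:
//   INSERT INTO processed_event (consumer, event_id) VALUES ($1, $2) ON CONFLICT DO NOTHING
//   -- 1 row affected: first delivery, do the work; 0 rows: duplicate, skip
//   ...then the consumer's own state change in the same transaction.

interface Tx {
  insertProcessedEvent(consumer: string, eventId: string): boolean; // true if the row was new
  recordRepayment(loanId: string, amountSantim: string): void;
}

interface Db {
  transaction<R>(fn: (tx: Tx) => R): R; // commit if fn returns, roll back if it throws
}

interface RepaymentEvent {
  eventId: string;
  payload: { loanId: string; amountSantim: string };
}

type Outcome = "applied" | "duplicate";
const CONSUMER = "bnpl.repayment-applier"; // hypothetical consumer name

// The dedup row and the state change share one transaction: both commit or neither does.
function handleRepayment(db: Db, event: RepaymentEvent): Outcome {
  return db.transaction((tx): Outcome => {
    if (!tx.insertProcessedEvent(CONSUMER, event.eventId)) return "duplicate";
    tx.recordRepayment(event.payload.loanId, event.payload.amountSantim);
    return "applied";
  });
}

// In-memory stand-in for Postgres: works on copies, swaps them in only if fn returns normally.
class InMemoryConsumerDb implements Db {
  processed: { [pk: string]: true } = {};
  repayments: string[] = [];

  transaction<R>(fn: (tx: Tx) => R): R {
    const processed = { ...this.processed };
    const repayments = this.repayments.slice();
    const result = fn({
      insertProcessedEvent: (consumer, eventId) => {
        const pk = `${consumer}|${eventId}`; // composite primary key (consumer, event_id)
        if (processed[pk]) return false; // conflict: DO NOTHING
        processed[pk] = true;
        return true;
      },
      recordRepayment: (loanId, amountSantim) => {
        repayments.push(`${loanId}:${amountSantim}`);
      },
    });
    this.processed = processed; // commit
    this.repayments = repayments;
    return result;
  }
}
```

Because the dedup row commits with the work, the two failure cases behave differently:
- A crash before COMMIT leaves no trace, so the redelivered event is applied.
- A crash after COMMIT leaves the row behind, so the redelivery is skipped.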
Alternatives considered
- Keep stringly-typed events (rejected): undetectable drift across a multi-service money platform.
- Publish straight to Kafka from services (rejected): dual-write loses the atomicity the outbox guarantees.
- Exactly-once semantics (rejected): a myth across service boundaries; idempotent consumers are the real guarantee.
Consequences
- Positive: payload drift caught at compile time + CI; new services subscribe without touching producers; correlation/causation make a whole flow traceable; safe Kafka activation.
- Negative / accepted: every consumer must be written idempotent (a real discipline cost for money flows); migrating ~110 existing string events to the registry is phased work; an envelope must be threaded everywhere.
- Follow-ups: seed the registry with the core money events; generate protobuf; stand up the relay; migrate one flow (payroll→repayment) as the proof (see MIGRATION_PLAN.md).
Revisit when
- A third language needs to consume events (registry must generate its stubs).
- Throughput needs partition/topic changes (revisit keys + partition counts).