
ADR-026: Typed event contracts + Kafka consumer rules

Context

DemozPay already emits ~110 domain events through the transactional outbox, but they are stringly typed, carry unknown payloads, and are consumed only by in-process Postgres pollers. As the platform grows to many products and many subscribers (Wallet, Savings, BNPL, Risk, Analytics, multiple banks), unversioned anonymous payloads become a silent-failure surface: a producer that changes a field breaks money consumers with no compile-time or CI signal. We are also committing to Kafka as the event backbone (this amends ADR-017's "dormant" stance), which is only safe with governed contracts and explicit consumer guarantees.

Decision

1. Every published event is a typed, versioned contract.

  • One registry (@demoz-pay/shared-events EventRegistry) declares each event type → payload shape; appendDomainEvent() is the typed publish path. An event cannot be published unless declared. No anonymous JSON.
  • Naming: demoz.<context>.<aggregate>.<event>.v<major>; exactly one producer owns each type.
  • Every event carries an envelope: eventId, occurredAt, tenantId, schemaVersion, correlationId, causationId?, source.
  • The registry is the source for protobuf generation in packages/contracts; CI enforces BACKWARD compatibility (ADR-020/022).
  • Money fields are base-10 santim strings.
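To make the rules above concrete, here is a minimal sketch of what a typed registry and publish path could look like. It is not the actual @demoz-pay/shared-events code: the two registry entries, the `amountSantim` field name, the `OutboxRecord` type, the lowercase-segment naming regex, and the assumption that santim amounts are non-negative are all illustrative.

```typescript
// Sketch only: the real EventRegistry and appendDomainEvent() in
// @demoz-pay/shared-events may be shaped differently.

// Envelope fields as listed in the ADR; causationId is optional.
interface EventEnvelope {
  eventId: string;
  occurredAt: string; // assumed to be an ISO-8601 timestamp
  tenantId: string;
  schemaVersion: number;
  correlationId: string;
  causationId?: string;
  source: string;
}

// Registry: event type -> payload shape. Both entries are invented examples.
interface EventRegistry {
  "demoz.wallet.account.credited.v1": { accountId: string; amountSantim: string };
  "demoz.bnpl.loan.repaid.v1": { loanId: string; amountSantim: string };
}
type RegisteredEventType = keyof EventRegistry;

// demoz.<context>.<aggregate>.<event>.v<major>; lowercase segments are assumed.
const EVENT_NAME =
  /^demoz\.[a-z][a-z0-9-]*\.[a-z][a-z0-9-]*\.[a-z][a-z0-9-]*\.v[1-9][0-9]*$/;
// Assumption: amounts are non-negative base-10 integers without leading zeros.
const SANTIM = /^(0|[1-9][0-9]*)$/;

function isValidEventName(name: string): boolean {
  return EVENT_NAME.test(name);
}

function isSantimString(value: string): boolean {
  return SANTIM.test(value);
}

// Stand-in for an outbox row; the payload is stored serialized.
interface OutboxRecord {
  type: string;
  envelope: EventEnvelope;
  payload: string;
}
const outbox: OutboxRecord[] = [];

// Typed publish path: `type` must be a registry key and `payload` must match
// that key's declared shape, so an undeclared event fails to compile.
function appendDomainEvent<T extends RegisteredEventType>(
  type: T,
  envelope: EventEnvelope,
  payload: EventRegistry[T],
): OutboxRecord {
  if (!isValidEventName(type)) throw new Error(`invalid event name: ${type}`);
  const record: OutboxRecord = { type, envelope, payload: JSON.stringify(payload) };
  outbox.push(record);
  return record;
}
```

With this shape, a call such as `appendDomainEvent("demoz.wallet.account.debited.v1", ...)` would be rejected by the compiler because that key is not declared in the registry, and a payload missing `amountSantim` would be rejected for the same reason.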

2. The transactional outbox is the only publish path. Business code never calls a Kafka producer directly (no dual-write); it writes the event in the same DB tx as the state change (ADR-008); a relay ships outbox → Kafka.
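The outbox flow can be sketched with in-memory stand-ins. Everything here is an assumption made for illustration: `Db`, `withTransaction`, `markLoanRepaid`, `relayOnce`, the `published` flag, the event type name, and the choice to use the event type as the topic. A real implementation would use a Postgres transaction and a Kafka client.

```typescript
// In-memory stand-ins (assumptions): a real system would use Postgres
// transactions and a Kafka client instead of these types.
interface OutboxRow {
  eventId: string;
  type: string;
  partitionKey: string; // aggregate id, so one aggregate's events stay ordered
  payload: string;
  published: boolean;
}

interface Db {
  loanStatus: Map<string, string>;
  outbox: OutboxRow[];
}

// Mimics a DB transaction: work on a copy, swap it in only if fn returns.
function withTransaction<T>(db: Db, fn: (tx: Db) => T): T {
  const tx: Db = {
    loanStatus: new Map(db.loanStatus),
    outbox: db.outbox.map((row) => ({ ...row })),
  };
  const result = fn(tx); // a throw here discards tx, which acts as a rollback
  db.loanStatus = tx.loanStatus;
  db.outbox = tx.outbox;
  return result;
}

// Business code: the state change and the event row commit together or not
// at all. Note that it never touches a broker.
function markLoanRepaid(db: Db, loanId: string, eventId: string): void {
  withTransaction(db, (tx) => {
    tx.outbox.push({
      eventId,
      type: "demoz.bnpl.loan.repaid.v1", // hypothetical event type
      partitionKey: loanId,
      payload: JSON.stringify({ loanId }),
      published: false,
    });
    if (!tx.loanStatus.has(loanId)) throw new Error(`unknown loan: ${loanId}`);
    tx.loanStatus.set(loanId, "repaid");
  });
}

// Relay: the only code that talks to the broker. `publish` is injected here;
// in production it would wrap a Kafka producer send.
type PublishFn = (topic: string, key: string, value: string) => void;

function relayOnce(db: Db, publish: PublishFn): number {
  let shipped = 0;
  for (const row of db.outbox) {
    if (row.published) continue;
    publish(row.type, row.partitionKey, row.payload);
    // A crash between publish and this line re-sends the row on the next
    // pass, which is why delivery is at-least-once.
    row.published = true;
    shipped++;
  }
  return shipped;
}
```

If `markLoanRepaid` throws, neither the status change nor the outbox row survives, which is the atomicity a direct Kafka call from business code cannot give.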

3. Consumer contract (mandatory):

  • At-least-once delivery + idempotent consumers (dedup on eventId) = effective-once.
  • Per-consumer DLQ with bounded retry + alert + replay.
  • Per-aggregate ordering via partition key; cross-aggregate order is never assumed.
  • Producers never reference consumers.
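The bounded-retry, alert, and replay parts of the consumer contract could look roughly like the sketch below. The names (`ReceivedEvent`, `DeadLetter`, `consumeWithRetry`, `replayDeadLetters`), the default of three attempts, and the in-memory DLQ array are assumptions for illustration. A production consumer would also back off between attempts, which is omitted here.

```typescript
// Hypothetical shapes; the real consumer framework may differ.
interface ReceivedEvent {
  eventId: string;
  type: string;
  key: string; // partition key (aggregate id)
  payload: string;
}

interface DeadLetter {
  consumer: string;
  event: ReceivedEvent;
  attempts: number;
  lastError: string;
}

type ConsumerHandler = (event: ReceivedEvent) => void;

// Try the handler up to maxAttempts times. If every attempt fails, park the
// event in the DLQ and raise an alert instead of blocking the partition.
function consumeWithRetry(
  consumer: string,
  event: ReceivedEvent,
  handler: ConsumerHandler,
  dlq: DeadLetter[],
  onDeadLetter: (dead: DeadLetter) => void,
  maxAttempts = 3,
): boolean {
  let lastError = "";
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    try {
      handler(event);
      return true;
    } catch (err) {
      lastError = err instanceof Error ? err.message : String(err);
    }
  }
  const dead: DeadLetter = { consumer, event, attempts: maxAttempts, lastError };
  dlq.push(dead);
  onDeadLetter(dead);
  return false;
}

// Replay: re-drive this consumer's dead letters in arrival order once the
// fault is fixed. Entries that still fail stay in the DLQ.
function replayDeadLetters(
  consumer: string,
  dlq: DeadLetter[],
  handler: ConsumerHandler,
): number {
  const remaining: DeadLetter[] = [];
  let recovered = 0;
  for (const dead of dlq) {
    if (dead.consumer !== consumer) {
      remaining.push(dead);
      continue;
    }
    try {
      handler(dead.event);
      recovered++;
    } catch {
      remaining.push(dead);
    }
  }
  dlq.length = 0;
  dlq.push(...remaining);
  return recovered;
}
```

Because replay re-delivers events the consumer may have partially seen, it is only safe when the handler is idempotent, which is why the contract pairs the DLQ with dedup on eventId.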

3a. Dedup is Postgres, not Redis. The eventId dedup is a processed_event(consumer, eventId) row written in the same tx as the consumer's state change (insert-or-skip on the PK) — transactional with the work, restart-safe, zero new infra. We deliberately do not introduce Redis (or any external store) for event dedup/DLQ. This is scoped to the event path, not a platform ban — Redis remains the cache/ephemeral tier (ADR-027). Revisit only with a measured throughput reason and a follow-up ADR.
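The insert-or-skip pattern can be illustrated with an in-memory stand-in for the consumer's database. `ConsumerStore`, `inTransaction`, `handleOnce`, and the `repaymentsApplied` state are hypothetical names, and the SQL in the comment assumes column names `consumer` and `event_id` that this ADR does not specify.

```typescript
// In-memory stand-in for the consumer's Postgres database (assumption).
interface ConsumerStore {
  processedEvent: Set<string>; // PK (consumer, eventId), flattened to one string
  repaymentsApplied: Map<string, number>; // example consumer state per loan
}

type DedupOutcome = "applied" | "skipped";

// Mimics a DB transaction: work on a copy, commit only if fn returns.
function inTransaction<T>(store: ConsumerStore, fn: (tx: ConsumerStore) => T): T {
  const tx: ConsumerStore = {
    processedEvent: new Set(store.processedEvent),
    repaymentsApplied: new Map(store.repaymentsApplied),
  };
  const result = fn(tx);
  store.processedEvent = tx.processedEvent;
  store.repaymentsApplied = tx.repaymentsApplied;
  return result;
}

// Insert-or-skip on the PK, then do the work, all in one transaction.
// In Postgres the first step could be (column names assumed):
//   INSERT INTO processed_event (consumer, event_id) VALUES ($1, $2)
//   ON CONFLICT DO NOTHING;  -- zero rows inserted means "already processed"
function handleOnce(
  store: ConsumerStore,
  consumer: string,
  eventId: string,
  work: (tx: ConsumerStore) => void,
): DedupOutcome {
  return inTransaction<DedupOutcome>(store, (tx) => {
    const pk = JSON.stringify([consumer, eventId]);
    if (tx.processedEvent.has(pk)) return "skipped";
    tx.processedEvent.add(pk);
    work(tx); // a throw rolls back the marker too, so redelivery retries
    return "applied";
  });
}
```

Delivering the same eventId twice to the same consumer applies the work once. If the work throws, the marker row is rolled back with it, so the redelivered event is processed rather than silently skipped.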

Alternatives considered

  • Keep stringly-typed events. Rejected: drift would go undetected across a multi-service money platform.
  • Publish straight to Kafka from services. Rejected: dual-write loses the atomicity the outbox guarantees.
  • Exactly-once semantics. Rejected as a cross-service myth: idempotent consumers are the real guarantee.

Consequences

  • Positive: payload drift caught at compile time + CI; new services subscribe without touching producers; correlation/causation make a whole flow traceable; safe Kafka activation.
  • Negative / accepted: every consumer must be written to be idempotent (a real discipline cost for money flows); migrating the ~110 existing string events to the registry is phased work; the envelope must be threaded through every producer and consumer.
  • Follow-ups: seed the registry with the core money events; generate protobuf; stand up the relay; migrate one flow (payroll→repayment) as the proof (see MIGRATION_PLAN.md).

Revisit when

  • A third language needs to consume events (registry must generate its stubs).
  • Throughput needs partition/topic changes (revisit keys + partition counts).