DemozPay — Reconciliation Architecture
Snapshot: 2026-05-29 Companion to:
BANK_ORCHESTRATION.md,PRODUCTION_READINESS.md,docs/runbooks/drift-detected.md,docs/runbooks/bank-statement-parse-failed.md.
Why this document exists
Reconciliation is the single most important system at DemozPay. The platform's entire value proposition rests on this claim: "DemozPay's ledger and the partner bank's statement agree to the santim, every day." If that claim is false, every other claim is moot — every disbursement might be a lie, every balance shown to an employer might be wrong, every audit and regulator report rests on rotten ground.
Reconciliation is the only mechanism by which we discover that we're lying to ourselves. Treat it accordingly.
§1. The two truths
| Truth | Source | What it knows | What it doesn't |
|---|---|---|---|
| Money truth | Partner bank statement | What actually moved at the bank. Final, authoritative, signed by the bank. | Why it moved. Who initiated it. Whether it matches our ledger. |
| Obligation truth | DemozPay ledger | What we think moved. Why, who, when. | Whether the bank actually executed it. |
The bank wins ties. Always. Anyone proposing a code change premised on the ledger being authoritative over the bank statement is proposing custody — see BANK_ORCHESTRATION.md §11.
§2. Reconciliation lifecycle (the canonical loop)
┌──────────────────┐ ┌──────────────────┐
│ Partner bank │ │ DemozPay │
│ Settles transfers│ │ Receives webhook│
│ (T+0 to T+1) │ │ or polls status │
└──────────────────┘ └──────────────────┘
│ │
│ ① Bank produces │ ② Ledger flips
│ daily statement │ PENDING → POSTED
│ (CSV/MT940/JSON) │ on webhook/poll
│ │
▼ ▼
┌─────────────────────────────────────────────────────────────────┐
│ ③ Ingestion (Dashen CSV parser, etc.) │
│ - file lands in S3 / partner-SFTP / partner-API │
│ - parse → bank_statement_line rows │
│ - idempotent on (tenant, partner, partner_reference) │
│ - flag per-row errors with reason │
└─────────────────────────────────────────────────────────────────┘
│
▼
┌─────────────────────────────────────────────────────────────────┐
│ ④ Matching │
│ - for each unmatched bank_statement_line: │
│ find disbursement WHERE │
│ amount = line.amount AND │
│ partner_reference = line.partner_reference AND │
│ value_date ± 1 day │
│ - matched → mark RECONCILED │
│ - unmatched → flag with reason │
└─────────────────────────────────────────────────────────────────┘
│
▼
┌─────────────────────────────────────────────────────────────────┐
│ ⑤ Ledger-side drift detection │
│ - per (account_id, period): │
│ call Ledger.ReconcileWithBank(period, statement_total) │
│ - returns (ledger_total, statement_total, signed_drift) │
│ - drift = 0 → all good │
│ - drift > 0 → ledger heavier than bank │
│ - drift < 0 → bank heavier than ledger │
└─────────────────────────────────────────────────────────────────┘
│
▼
┌─────────────────────────────────────────────────────────────────┐
│ ⑥ Operator workflow │
│ - drift = 0 → metric, archive, sleep until tomorrow │
│ - drift ≠ 0 → page on-call, drift-detected.md runbook │
│ - flagged_lines > threshold → finance/ops investigation │
│ - run forward-motion corrections through normal use cases │
│ (never UPDATE ledger directly) │
└─────────────────────────────────────────────────────────────────┘
│
▼
loop daily, per (tenant, partner)
Status of each step today:
| Step | Status | Notes |
|---|---|---|
| ① Bank produces statement | LIVE (partner side) | Dashen sandbox; production same shape. |
| ② Ledger flips on webhook/poll | LIVE (S3) | verify-s3.sh proves it. |
| ③ Ingestion | LIVE (S4.1) | Dashen CSV. Statement-pull (fetch the file from partner) is PLANNED. Currently a human or script feeds the file. |
| ④ Matching | LIVE (S4.1) | 24h skew. |
| ⑤ Drift detection RPC | LIVE (S4.3) | Ledger.ReconcileWithBank. |
| ⑥ Operator workflow | PLANNED | No cadence; no dashboard; no paging; no admin UI to inspect flagged lines. |
The reconciliation primitive is real. The reconciliation system is not.
§3. Reconciliation cadence — what runs when, who watches
This is the operational spec that does not yet exist in code or in infra. Define this BEFORE go-live.
| Window | Job | Trigger | SLA |
|---|---|---|---|
| Continuous | Webhook + poller (S3) | Per-event | Settlement reflected in ledger within 60s of bank webhook. |
| Hourly | Stale-pending sweep | Cron @ :05 | EWA + loan rows in SUBMITTED_TO_BANK or ACCEPTED_BY_BANK > 6h → escalate. (PLANNED) |
| Daily T+1 06:00 UTC | Statement-pull | Cron after Dashen's daily-cut at ~T+1 05:00 EAT | Pull yesterday's statement from partner; ingest. (PLANNED) |
| Daily T+1 07:00 UTC | Matching pass | After statement-pull | All unmatched lines flagged. (PLANNED cadence; primitive LIVE.) |
| Daily T+1 07:30 UTC | ReconcileWithBank per account | After match | Drift report per (tenant, account, day). PLANNED cadence. |
| Daily T+1 08:00 UTC | Drift summary email/Slack to finance/ops | After reconcile | If drift = 0 for all accounts → green emoji. Else → paging + runbook link. (PLANNED) |
| Weekly Monday 09:00 UTC | Trend report — drift / flagged-line rate / unmatched-rate over last 7 days | Cron | Goes to engineering + finance leads. (PLANNED) |
| Monthly 1st 09:00 UTC | Statement-aggregate report per partner — total volume, total drift, total flagged, dispute count | Cron | Sent to partner-bank ops; basis for monthly partner sync. (PLANNED) |
Reading this table is the reality check. Every "PLANNED" is an operational gap that the runbooks point at but the code does not yet fill.
§4. The reconciliation source-of-truth map
Where does each piece of state live? Who owns each table?
| Data | Table | Owner | Tenant-scoped? |
|---|---|---|---|
| Bank statement lines (raw) | bank_statement_line (gateway DB) | integration-gateway | YES — (tenant_id, partner, partner_reference) PK |
| Disbursement state | disbursement (gateway DB) | integration-gateway | YES |
| Disbursement event log | bank_event (gateway DB) | integration-gateway | YES |
| Ledger transactions | ledger_transaction (ledger DB) | ledger | YES |
| Ledger entries | ledger_entry (ledger DB) | ledger | YES |
| EWA aggregate | ewa_request (API DB) | API | YES |
| Loan aggregate | loan (API DB) | API | YES |
| Loan repayment records | loan_repayment (API DB) | API | YES |
| Outbox events | outbox_event (API DB) | API + outbox publisher BYPASSRLS | YES (publisher reads cross-tenant) |
| Idempotency records | idempotency_record (API DB) | API | YES |
| Audit trail | audit_entry (API DB) | API | YES |
Three databases. API has its own; ledger has its own; gateway has its own. This is correct: blast-radius isolation. Reconciliation queries cross databases via gRPC + RPC, never via cross-DB joins.
Consequence: there is no SQL query that joins ledger_entry to bank_statement_line directly. The matcher in services/integration-gateway/internal/reconciliation/ finds candidates via the disbursement table (which holds the link to ledger via ledger_transaction_id). When debugging a drift, the operator runs queries against three databases, not one.
§5. Drift — what the signed value means
Ledger.ReconcileWithBank returns drift = ledger_total - statement_total.
| Sign | Meaning | Likely cause |
|---|---|---|
drift = 0 | Agreement. | Healthy. |
drift > 0 | Ledger heavier than bank. We booked something the bank didn't move. | (a) Errant ConfirmSettlement — webhook/poll said COMPLETED but bank rolled back; (b) ledger sign-error bug — wrong account-type debit/credit; (c) period-boundary off-by-day (resolves next run). |
drift < 0 | Bank heavier than ledger. Bank moved money we didn't book. | (a) Failed ConfirmSettlement — we lost the webhook AND the poll; (b) someone moved money from our partner account outside the platform — security incident; (c) bank fee or adjustment we don't model. |
Runbook: docs/runbooks/drift-detected.md. Worth re-reading before opening any rail.
§6. Late-arriving settlements and the period boundary
A transfer initiated at 23:55 on day N may settle at 00:05 on day N+1. The bank statement for day N will not contain it; the statement for day N+1 will.
Today's behaviour: the matcher runs with a 24h skew on value_date. A settlement that crosses the day boundary still matches the right disbursement on the right day's statement, because the disbursement's created_at is within 24h of the line's value_date.
Edge case: if we reconcile day N at 06:00 of day N+1 BEFORE day N+1's statement arrives, a late settlement appears as drift > 0 for day N (we booked it; bank statement for day N doesn't have it; bank statement for day N+1 hasn't arrived). Resolution: do not finalize day N's drift until day N+1's statement has been ingested + matched. Practical rule: all "day N reconciliation final" checks should run no earlier than T+2 06:00.
Status: PLANNED — the cadence does not exist yet, so this rule is not enforced.
§7. Replay safety — re-ingesting a statement
The reconciliation store enforces (tenant_id, partner, partner_reference) ON CONFLICT idempotency. A statement re-ingested with a partial-fix ingester will:
- Skip lines that were inserted on the first pass.
- Insert lines that were rejected on the first pass and accepted now.
- NOT update the per-line
matched_disbursement_idfield if the row was already there — that requires a separate matcher pass.
Consequence: after fixing a parser bug, two steps are required:
- Re-ingest the statement file.
- Re-run the matcher with
--include-already-flagged(PLANNED — flag does not yet exist).
Status: primitive LIVE; operator-side workflow PLANNED.
§8. Reconciliation dashboard requirements
A reconciliation dashboard does not exist. When it lands (Grafana, internal admin UI — either works), it MUST show:
| Panel | Source | Why |
|---|---|---|
| Drift signed-value per (tenant, account, day) — last 30 days | ReconcileWithBank results | Trend visibility. A sudden jump is the alarm. |
| Flagged-line count per (tenant, partner, day) | bank_statement_line WHERE flagged=true | Match-rate health. |
| Unmatched-disbursement count per (tenant, partner, day) | gateway disbursement | Conversely: which of ours never landed. |
| Stale-pending count (EWA + loan in SUBMITTED_TO_BANK / ACCEPTED_BY_BANK > 6h) | API DB | Webhooks-not-arriving canary. |
| Webhook signature-rejection rate per partner | metric demozpay_bank_webhook_requests_total | Attack canary / key-rotation canary. |
| Settlement-poller tick health (success vs error count per hour) | metric demozpay_settlement_poller_ticks_total | Poller is alive. |
| Daily reconciliation status badge per (tenant, partner) — green/yellow/red | composite | At-a-glance "did yesterday close clean?" |
| Drift volume + drift signed-sum per partner, last 7 days | composite | Per-partner pattern view. |
| Statement-ingest health (last successful pull timestamp per partner) | new metric (PLANNED) | If today's statement didn't ingest, recon is blind. |
These panels are operational — not engineering vanity. Without them, drift = 0 cannot be claimed because no one is watching.
§9. SLA + SLO catalog
Define the reconciliation SLOs before opening a real rail. Recommended starting values:
| Target | SLO | Error budget | Page when |
|---|---|---|---|
| Webhook → ledger POSTED | 99% within 60s | 1% of webhooks may exceed 60s per month | If 5% of webhooks exceed 60s in a 5-min window |
| Settlement (any source — webhook OR poll) → ledger POSTED | 99.9% within 4h of bank-side settlement | 0.1% of settlements may exceed 4h per month | If any settlement exceeds 24h |
| Daily statement ingest success | 99.5% — at least one successful ingest per partner per day | 36h grace per month | If no successful ingest for any partner > 36h |
| Daily drift = 0 per (tenant, account) | 99.9% of (tenant, account, day) tuples have drift = 0 | 1 in 1000 may have non-zero drift | Any non-zero drift IMMEDIATELY (release-blocking during soak) |
| Flagged-line rate | < 1% of statement lines per partner per day | Daily SLO | If > 5% on any day |
These numbers are starting points. Tune after a quarter of real data. Don't open a rail without them.
§10. The human workflows that must exist
Engineering tends to overlook how much reconciliation is a people process. These are the workflows that need defined ownership:
| Workflow | Owner | Trigger | Today's status |
|---|---|---|---|
| Daily drift triage | Finance/ops on-call (when staffed) | Drift > 0 alert | NO RUNBOOK FOR ROUTINE PROCESS — the runbook covers incidents, not routine triage. |
| Unmatched-disbursement review | Finance/ops | Daily summary | NONE |
| Unmatched-statement-line review (a bank statement line we can't find a disbursement for) | Finance/ops + security | Daily summary | NONE |
| Partner dispute intake (employee says "money didn't arrive") | Customer support → integration team | Customer ticket | NONE — no support tool. |
| Reversal authorisation | Finance lead + engineering lead | When a manual reverse is needed | NONE — no admin UI; raw Reverse RPC. |
| Statement-fetch recovery (partner SFTP down) | Operations | Daily ingest fails | NONE |
| Bank key rotation | Engineering + partner contact | Quarterly | NONE — no rotation script. |
| Sanctions screening exception review | Compliance | Per flagged transfer | N/A (sanctions screening is PLANNED). |
Reading this list is the discomfort. Every workflow above will appear in the first week of pilot. Most do not have owners, runbooks, or tools.
§11. The reconciliation runbooks — what's written and what's missing
Written (in docs/runbooks/):
drift-detected.md— incident response for non-zero drift. Routine triage is not covered.bank-statement-parse-failed.md— when ingestion fails.webhook-failure.md— when signature verification fails sustained.gateway-down.md— when integration-gateway is unavailable.
Missing (must write before pilot):
reconciliation-daily-process.md— the SOP for the routine daily triage, screenshot of the dashboard, criteria for green-light vs further investigation.flagged-line-review.md— operator workflow for a flagged statement line: how to identify why it didn't match, how to escalate, how to close.unmatched-disbursement-review.md— conversely.dispute-intake.md— customer says money didn't arrive; how do we investigate end-to-end across three databases.reversal-procedure.md— when and how to authorise aReverse(), who signs off, what audit notes to attach.statement-pull-failure.md— partner statement didn't arrive; what to do.key-rotation.md— quarterly HMAC key rotation for each partner.partner-incident-notification.md— when to call which partner contact; what to send them.
§12. Reconciliation honesty test
If you read this document and think "we're ready":
- Pick any production date 30 days ago.
- For that date, produce: the bank statement file, the matched-disbursement count, the unmatched count, the signed drift per account, the flagged-line list.
- If you can't produce these in 10 minutes from a dashboard, reconciliation is not Live. The primitive is real; the system is not.
A 7-day drift-clean soak (S4.6) is a code-and-primitive proof. It is not evidence of operational readiness. Operational readiness is the workflows in §10 + the dashboard in §8 + the runbooks in §11.
§13. Cross-references
- Incident runbooks →
docs/runbooks/. - Money flows that feed reconciliation →
MONEY_FLOWS.md. - Production-readiness scoring →
PRODUCTION_READINESS.md. - Go-live gate (S4.6 etc.) →
GO_LIVE_BLOCKERS.md.