Bitemporal Data for Auditability — Practical Playbook
Date: 2026-02-23
Category: software / knowledge
Why this matters
In finance-like systems, one timeline is rarely enough.
- Valid time: when a fact is true in the real world.
- System time: when our system learned/stored that fact.
If you only keep “latest state,” you lose the ability to answer:
- “What did we believe at decision time?”
- “What changed later due to correction/backfill?”
- “Was this decision wrong, or was data unavailable then?”
Bitemporal modeling solves that by storing both timelines explicitly.
Core model
Each record gets:
- valid_from, valid_to (business-reality interval)
- recorded_from, recorded_to (system-knowledge interval)
- is_current convenience flag (optional)
Rule of thumb:
- New correction for past reality? Insert a new row and close the prior row's recorded_to.
- Never rewrite history destructively.
Query patterns you actually need
1) “As we know now” (latest truth)
Filter by active recorded interval and active valid interval at query timestamp.
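A minimal sketch of this filter, assuming the instrument_facts table from the Postgres sketch later in this playbook (:instrument and :as_of are bind-parameter placeholders):

```sql
-- "As we know now": both intervals active, system time pinned to now().
select instrument, value
from instrument_facts
where instrument = :instrument
  and valid_from <= :as_of and :as_of < valid_to        -- business reality at :as_of
  and recorded_from <= now() and now() < recorded_to;   -- what the system currently believes
```

Half-open intervals ([from, to)) keep adjacent rows from double-matching at their shared boundary.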
2) “What we knew then” (decision replay)
Pin the recorded timestamp to the decision time; this is the replay view for post-mortems.
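The same shape as the "as we know now" query, but with system time pinned to a :decided_at parameter instead of now() (a sketch against the instrument_facts table below; parameter names are illustrative):

```sql
-- "What we knew then": replay the system's beliefs at decision time.
select instrument, value
from instrument_facts
where instrument = :instrument
  and valid_from <= :as_of and :as_of < valid_to
  and recorded_from <= :decided_at and :decided_at < recorded_to;
```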
3) “Correction delta” (drift tracking)
Compare result sets between two recorded timestamps for the same valid period.
This yields a measurable “data revision risk” metric.
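One hedged way to compute the delta is EXCEPT over two pinned snapshots, assuming the instrument_facts sketch below (:t1 and :t2 are the two recorded timestamps, :as_of the shared business moment):

```sql
with knew_at_t1 as (
  select instrument, value from instrument_facts
  where valid_from <= :as_of and :as_of < valid_to
    and recorded_from <= :t1 and :t1 < recorded_to
),
knew_at_t2 as (
  select instrument, value from instrument_facts
  where valid_from <= :as_of and :as_of < valid_to
    and recorded_from <= :t2 and :t2 < recorded_to
)
-- rows that arrived or were revised between :t1 and :t2
select * from knew_at_t2
except
select * from knew_at_t1;
```

Swapping the final operands gives the rows that were retracted over the same window.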
Minimal Postgres table sketch
create table instrument_facts (
id bigserial primary key,
instrument text not null,
value numeric not null,
valid_from timestamptz not null,
valid_to timestamptz not null default 'infinity',
recorded_from timestamptz not null default now(),
recorded_to timestamptz not null default 'infinity',
check (valid_from < valid_to),
check (recorded_from < recorded_to)
);
create index ix_instrument_facts_lookup
on instrument_facts (instrument, valid_from, valid_to, recorded_from, recorded_to);
Insert correction pattern:
- close the previous open row: set recorded_to = now() for the overlapping business interval
- insert the corrected row with recorded_from = now()
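Put together, close-then-insert is a single transaction. A sketch against the instrument_facts table above, with :instrument, :valid_from, :valid_to, and :corrected_value as placeholders:

```sql
begin;

-- 1) close system-time knowledge of any open row overlapping the business interval
update instrument_facts
   set recorded_to = now()
 where instrument = :instrument
   and recorded_to = 'infinity'
   and valid_from < :valid_to
   and :valid_from < valid_to;

-- 2) insert the corrected fact; recorded_from defaults to now()
insert into instrument_facts (instrument, value, valid_from, valid_to)
values (:instrument, :corrected_value, :valid_from, :valid_to);

commit;
```

In Postgres, now() is the transaction-start timestamp, so the closed row and the new row share one instant and the knowledge timeline has no gap.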
Operational guardrails
- No hard updates on temporal rows, except the controlled closure of open recorded_to intervals.
- Clock discipline: all timestamps in UTC, ingress normalized.
- Idempotent ingest keys for vendor replay/backfill jobs.
- Replay tests in CI:
- given decision timestamp T
- replay query at recorded=T
- verify same output snapshot hash
- Storage lifecycle:
- hot (90d), warm (1y), cold archive (object storage)
- keep queryable manifests for forensic fetch.
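For the CI replay test, one hedged way to get a stable snapshot hash is to hash the ordered replay output in the database itself (assuming the instrument_facts sketch above; :as_of and :decided_at are placeholders):

```sql
-- deterministic hash of the replay view at recorded time :decided_at
select md5(string_agg(instrument || '|' || value::text, ';' order by instrument))
from instrument_facts
where valid_from <= :as_of and :as_of < valid_to
  and recorded_from <= :decided_at and :decided_at < recorded_to;
```

CI stores this hash at decision time and asserts that a later replay at recorded = :decided_at reproduces it byte-for-byte.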
Practical use cases
- Slippage/TCA attribution fairness (“bad model vs late correction”).
- Compliance/audit trails with explainable chronology.
- Feature-store training reproducibility (no hindsight leakage).
- Incident forensics when vendor data revises after the fact.
Anti-patterns
- Keeping only updated_at and calling it history.
- Overwriting rows on correction (“we fixed it”) and losing the epistemic trail.
- Using local timezone timestamps in ingestion pipelines.
- Backfilling without recorded_* semantics.
30-minute adoption checklist
- Pick one high-value table (signals, prices, fundamentals, fills).
- Add valid/recorded interval columns.
- Implement close-then-insert correction path.
- Add a replay query helper: as_known_at(ts).
- Add one CI replay test from a historical decision timestamp.
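The as_known_at(ts) helper named above could be sketched as a set-returning SQL function (illustrative, assuming the instrument_facts table from the Postgres sketch):

```sql
create function as_known_at(ts timestamptz)
returns setof instrument_facts
language sql stable
as $$
  -- everything the system believed at the given recorded timestamp
  select *
  from instrument_facts
  where recorded_from <= ts and ts < recorded_to;
$$;

-- usage: select * from as_known_at('2026-01-02T00:00:00Z');
```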
If a system influences money or risk, bitemporal isn’t overengineering. It’s memory with timestamps that actually match reality.