Bitemporal Data for Auditability — Practical Playbook
Date: 2026-02-23
Category: software / knowledge
Why this matters
In finance-like systems, one timeline is rarely enough.
- Valid time: when a fact is true in the real world.
- System time: when our system learned/stored that fact.
If you only keep “latest state,” you lose the ability to answer:
- “What did we believe at decision time?”
- “What changed later due to correction/backfill?”
- “Was this decision wrong, or was data unavailable then?”
Bitemporal modeling solves that by storing both timelines explicitly.
Core model
Each record gets:
- valid_from, valid_to (business-reality interval)
- recorded_from, recorded_to (system-knowledge interval)
- is_current convenience flag (optional)
Rule of thumb:
- New correction for past reality? Insert a new row and close the prior row's recorded_to.
- Never rewrite history destructively.
Query patterns you actually need
1) “As we know now” (latest truth)
Filter by active recorded interval and active valid interval at query timestamp.
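A minimal sketch of this filter, assuming the instrument_facts table from the Postgres sketch later in this playbook (:instrument and :as_of are bind-parameter placeholders):

```sql
-- "As we know now": both intervals active, system time pinned to now().
select instrument, value
from instrument_facts
where instrument = :instrument
  and valid_from <= :as_of and :as_of < valid_to        -- business reality at :as_of
  and recorded_from <= now() and now() < recorded_to;   -- what the system currently believes
```

Half-open intervals ([from, to)) keep adjacent rows from double-matching at their shared boundary.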
2) “What we knew then” (decision replay)
Pin the recorded timestamp to the decision time; this is the replay view for post-mortems.
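The same shape as the "as we know now" query, but with system time pinned to a :decided_at parameter instead of now() (a sketch against the instrument_facts table below; parameter names are illustrative):

```sql
-- "What we knew then": replay the system's beliefs at decision time.
select instrument, value
from instrument_facts
where instrument = :instrument
  and valid_from <= :as_of and :as_of < valid_to
  and recorded_from <= :decided_at and :decided_at < recorded_to;
```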
3) “Correction delta” (drift tracking)
Compare result sets between two recorded timestamps for the same valid period.
This yields a measurable “data revision risk” metric.
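One hedged way to compute the delta is EXCEPT over two pinned snapshots, assuming the instrument_facts sketch below (:t1 and :t2 are the two recorded timestamps, :as_of the shared business moment):

```sql
with knew_at_t1 as (
  select instrument, value from instrument_facts
  where valid_from <= :as_of and :as_of < valid_to
    and recorded_from <= :t1 and :t1 < recorded_to
),
knew_at_t2 as (
  select instrument, value from instrument_facts
  where valid_from <= :as_of and :as_of < valid_to
    and recorded_from <= :t2 and :t2 < recorded_to
)
-- rows that arrived or were revised between :t1 and :t2
select * from knew_at_t2
except
select * from knew_at_t1;
```

Swapping the final operands gives the rows that were retracted over the same window.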
Minimal Postgres table sketch
create table instrument_facts (
id bigserial primary key,
instrument text not null,
value numeric not null,
valid_from timestamptz not null,
valid_to timestamptz not null default 'infinity',
recorded_from timestamptz not null default now(),
recorded_to timestamptz not null default 'infinity',
check (valid_from < valid_to),
check (recorded_from < recorded_to)
);
create index ix_instrument_facts_lookup
on instrument_facts (instrument, valid_from, valid_to, recorded_from, recorded_to);
Insert correction pattern:
- close the previous open row: set recorded_to = now() for the overlapping business interval
- insert the corrected row with recorded_from = now()
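Put together, close-then-insert is a single transaction. A sketch against the instrument_facts table above, with :instrument, :valid_from, :valid_to, and :corrected_value as placeholders:

```sql
begin;

-- 1) close system-time knowledge of any open row overlapping the business interval
update instrument_facts
   set recorded_to = now()
 where instrument = :instrument
   and recorded_to = 'infinity'
   and valid_from < :valid_to
   and :valid_from < valid_to;

-- 2) insert the corrected fact; recorded_from defaults to now()
insert into instrument_facts (instrument, value, valid_from, valid_to)
values (:instrument, :corrected_value, :valid_from, :valid_to);

commit;
```

In Postgres, now() is the transaction-start timestamp, so the closed row and the new row share one instant and the knowledge timeline has no gap.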
Operational guardrails
- No hard updates on temporal rows, except the controlled closure of open recorded_to intervals.
- Clock discipline: all timestamps in UTC, ingress normalized.
- Idempotent ingest keys for vendor replay/backfill jobs.
- Replay tests in CI:
- given decision timestamp T
- replay query at recorded=T
- verify same output snapshot hash
- Storage lifecycle:
- hot (90d), warm (1y), cold archive (object storage)
- keep queryable manifests for forensic fetch.
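For the CI replay test, one hedged way to get a stable snapshot hash is to hash the ordered replay output in the database itself (assuming the instrument_facts sketch above; :as_of and :decided_at are placeholders):

```sql
-- deterministic hash of the replay view at recorded time :decided_at
select md5(string_agg(instrument || '|' || value::text, ';' order by instrument))
from instrument_facts
where valid_from <= :as_of and :as_of < valid_to
  and recorded_from <= :decided_at and :decided_at < recorded_to;
```

CI stores this hash at decision time and asserts that a later replay at recorded = :decided_at reproduces it byte-for-byte.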
Practical use cases
- Slippage/TCA attribution fairness (“bad model vs late correction”).
- Compliance/audit trails with explainable chronology.
- Feature-store training reproducibility (no hindsight leakage).
- Incident forensics when vendor data revises after the fact.
Anti-patterns
- Keeping only updated_at and calling it history.
- Overwriting rows on correction (“we fixed it”) and losing the epistemic trail.
- Using local timezone timestamps in ingestion pipelines.
- Backfilling without recorded_* semantics.
30-minute adoption checklist
- Pick one high-value table (signals, prices, fundamentals, fills).
- Add valid/recorded interval columns.
- Implement close-then-insert correction path.
- Add a replay query helper: as_known_at(ts).
- Add one CI replay test from a historical decision timestamp.
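The as_known_at(ts) helper named above could be sketched as a set-returning SQL function (illustrative, assuming the instrument_facts table from the Postgres sketch):

```sql
create function as_known_at(ts timestamptz)
returns setof instrument_facts
language sql stable
as $$
  -- everything the system believed at the given recorded timestamp
  select *
  from instrument_facts
  where recorded_from <= ts and ts < recorded_to;
$$;

-- usage: select * from as_known_at('2026-01-02T00:00:00Z');
```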
If a system influences money or risk, bitemporal isn’t overengineering. It’s memory with timestamps that actually match reality.