Point-in-Time Feature Store Playbook


Preventing Training-Serving Skew in Real Systems

Date: 2026-02-27 · Category: software / ml-systems
Use case fit: Quant signals, ranking/recommendation, risk scoring, anomaly detection


1) Why this matters

Most model failures in production are not model-architecture failures. They are data-timing failures: training joins that leak future state, delayed data treated as if it had been available, and feature definitions that drift between the offline and online code paths.

If your model says “90% confidence” but your feature pipeline is time-inconsistent, you have expensive fiction.


2) Core principle: “As-of” truth

Every feature value must answer one strict question:

“What would this value have been as of prediction timestamp T, given only data available by T?”

This implies two time axes per event:

  1. event_time: when the real-world event happened
  2. ingest_time (or publish_time): when your system could first see/use it

For point-in-time correctness, feature retrieval for label timestamp T must use only rows where event_time <= T AND ingest_time <= T.

Filtering on event_time alone is a common leak in delayed-data systems: the event may predate T while its ingestion did not.
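The as-of rule above can be sketched as a filter over both timestamps. A minimal sketch; the row layout and field names are illustrative, not a standard schema:

```python
from datetime import datetime

def visible_as_of(rows, t):
    """Rows usable at prediction time t: the event must have happened
    AND the system must already have ingested it by t."""
    return [r for r in rows if r["event_time"] <= t and r["ingest_time"] <= t]

rows = [
    # Ingested an hour after the event: visible by noon.
    {"value": 1.0,
     "event_time": datetime(2026, 1, 1, 9, 0),
     "ingest_time": datetime(2026, 1, 1, 10, 0)},
    # Delayed data: the event precedes noon, but ingestion happened overnight.
    {"value": 2.0,
     "event_time": datetime(2026, 1, 1, 11, 0),
     "ingest_time": datetime(2026, 1, 2, 3, 0)},
]

t = datetime(2026, 1, 1, 12, 0)
# Filtering on event_time alone would wrongly admit both rows.
assert [r["value"] for r in visible_as_of(rows, t)] == [1.0]
```

The second row is exactly the delayed-data case: its event happened before noon, so an event_time-only filter leaks it into any prediction made at noon.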


3) Minimum data contract for a feature store

For each feature table/entity key, store at least:

  • entity_id (the join key)
  • feature_value
  • event_time
  • ingest_time (or publish_time)
  • feature_version (identifying the code that produced the value)

Without feature_version, you cannot explain drift caused by code changes. Without ingest_time, you cannot defend against delayed-data leakage.
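As a concrete sketch of that contract — the class and field names here are my own illustration, not a standard schema:

```python
from dataclasses import dataclass
from datetime import datetime

@dataclass(frozen=True)
class FeatureRow:
    """Minimum per-row contract for point-in-time correctness."""
    entity_id: str          # join key for the entity
    feature_name: str
    feature_value: float
    event_time: datetime    # when the real-world event happened
    ingest_time: datetime   # when the system could first see/use it
    feature_version: str    # identifies the code that computed the value

row = FeatureRow("user-42", "txn_count_7d", 3.0,
                 datetime(2026, 1, 1), datetime(2026, 1, 1, 0, 5), "v3")
```

Freezing the dataclass mirrors the append-only discipline: a correction is a new row, not a mutation of an old one.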


4) Offline training set construction (the safe pattern)

Given labels (entity_id, label_time, y):

  1. Build a spine of prediction events (entity_id, label_time)
  2. For each feature group, do an as-of join against history
  3. Pick latest row where:
    • same entity_id
    • event_time <= label_time
    • ingest_time <= label_time
  4. Apply deterministic fallback (null/default/last known within TTL)
  5. Persist the materialized training set with:
    • feature logic commit hash
    • data snapshot id
    • generation timestamp

SQL sketch (warehouse style)

WITH candidates AS (
  SELECT
    s.entity_id,
    s.label_time,
    f.feature_value,
    f.event_time,
    f.ingest_time,
    -- Rank candidates so the most recent visible value wins
    ROW_NUMBER() OVER (
      PARTITION BY s.entity_id, s.label_time
      ORDER BY f.event_time DESC, f.ingest_time DESC
    ) AS rn
  FROM spine s
  JOIN feature_history f
    ON f.entity_id = s.entity_id
   -- Both conditions: the event happened AND was ingested by label_time
   AND f.event_time <= s.label_time
   AND f.ingest_time <= s.label_time
)
SELECT *
FROM candidates
WHERE rn = 1;

Do not “simplify” to a latest snapshot join. That usually leaks future state.
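To make the leak concrete, here is a small in-memory comparison of the as-of join against a latest-snapshot join. A pure-Python sketch; the helper names are mine:

```python
from datetime import datetime

history = [
    {"entity_id": "u1", "feature_value": 10.0,
     "event_time": datetime(2026, 1, 1), "ingest_time": datetime(2026, 1, 1)},
    # An update that did not exist at label time: future state.
    {"entity_id": "u1", "feature_value": 99.0,
     "event_time": datetime(2026, 2, 1), "ingest_time": datetime(2026, 2, 1)},
]
label_time = datetime(2026, 1, 15)

def as_of_join(history, entity_id, t):
    """Latest row visible as of t (both timestamps <= t)."""
    rows = [r for r in history
            if r["entity_id"] == entity_id
            and r["event_time"] <= t and r["ingest_time"] <= t]
    return max(rows, key=lambda r: (r["event_time"], r["ingest_time"]), default=None)

def latest_snapshot_join(history, entity_id):
    """The 'simplified' join: latest row regardless of time."""
    rows = [r for r in history if r["entity_id"] == entity_id]
    return max(rows, key=lambda r: (r["event_time"], r["ingest_time"]), default=None)

assert as_of_join(history, "u1", label_time)["feature_value"] == 10.0   # correct
assert latest_snapshot_join(history, "u1")["feature_value"] == 99.0     # leaks future state
```

The snapshot join silently trains the model on a value that only came into existence two weeks after the label.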


5) Online serving parity

Training parity is impossible if offline and online use different logic paths.

Preferred architecture

Define each feature once and derive both the offline (training) path and the online (serving) path from that single definition. Where the two paths must be implemented separately, treat the offline build as the reference implementation and continuously test the serving path against it.

Practical guardrails

  • One versioned definition per feature, owned in one place
  • Shared (or generated) transformation code for the offline and online paths
  • Log raw inputs and served feature values at request time so parity can be audited later
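One guardrail worth automating is the definition-parity check (also item 5 in the monitoring list below): recompute features offline from logged raw inputs and diff against what was actually served. A minimal sketch, assuming a logged-request shape of my own invention:

```python
def parity_mismatches(sampled_requests, offline_fn, tolerance=1e-9):
    """IDs of requests where the offline recomputation disagrees with
    the value the online service served."""
    bad = []
    for req in sampled_requests:
        expected = offline_fn(req["raw_inputs"])
        if abs(expected - req["served_value"]) > tolerance:
            bad.append(req["request_id"])
    return bad

def offline_txn_count(raw_inputs):
    """Offline reference definition: count of logged transactions."""
    return float(len(raw_inputs["txns"]))

requests = [
    {"request_id": "a", "raw_inputs": {"txns": [1, 2, 3]}, "served_value": 3.0},
    {"request_id": "b", "raw_inputs": {"txns": [1, 2, 3]}, "served_value": 2.0},  # definition skew
]
assert parity_mismatches(requests, offline_txn_count) == ["b"]
```

Run this on a small sampled stream continuously, not as a one-off audit; definition skew usually arrives with a deploy.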


6) Training-serving skew taxonomy

A. Definition skew

Different formulas in offline vs online code.

Signal: distribution shift immediately after deploy, even with stable traffic.

B. Time skew

Online uses fresher (or staler) data than what training assumed.

Signal: strong hourly or session-level error patterns.

C. Null/default skew

Offline imputes the mean; online uses zero/null (or the reverse).

Signal: error spikes on cold entities / sparse cohorts.

D. Entity-resolution skew

Different key mapping logic (user merge, symbol rename, account hierarchy).

Signal: cohort-specific degradation with high join-miss rates.

E. Version skew

Model expects feature v3; service emits v2 after partial rollback.

Signal: sudden quality cliff with no model change.
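A cheap mitigation for version skew is to fail closed at request time when served feature versions do not match the model's feature spec. A sketch; the spec and payload shapes here are assumptions:

```python
class FeatureVersionMismatch(RuntimeError):
    pass

def check_feature_versions(model_spec, served_versions):
    """Raise instead of silently scoring with features the model was
    not trained on. model_spec: {feature_name: required_version}."""
    for name, required in model_spec.items():
        got = served_versions.get(name)
        if got != required:
            raise FeatureVersionMismatch(
                f"{name}: model expects {required}, service emitted {got}")

spec = {"txn_count_7d": "v3"}
check_feature_versions(spec, {"txn_count_7d": "v3"})      # matching versions pass
try:
    check_feature_versions(spec, {"txn_count_7d": "v2"})  # partial rollback case
except FeatureVersionMismatch:
    pass  # fail closed: better a refused request than a silent quality cliff
```

Failing closed turns a gradual, hard-to-diagnose quality cliff into an immediate, attributable error count.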


7) Monitoring that actually catches skew

Track this per feature and per important cohort:

  1. Freshness lag: now - feature_event_time
  2. Availability rate: non-null ratio in serving
  3. Join hit rate (offline build + online requests)
  4. Population Stability Index (PSI) or JS divergence between train and serve
  5. Definition parity checks on sampled requests (recompute offline)
  6. Version mismatch count (model_feature_spec_version != served_feature_version)
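Item 4 (PSI) can be computed directly from binned histograms of the training and serving distributions. A stdlib sketch; the 0.2 threshold used in the example is a common rule of thumb, not a universal constant:

```python
import math

def psi(train_counts, serve_counts, eps=1e-6):
    """Population Stability Index over two pre-binned histograms.
    0 means identical distributions; larger means more shift."""
    t_total = sum(train_counts)
    s_total = sum(serve_counts)
    score = 0.0
    for t, s in zip(train_counts, serve_counts):
        p = max(t / t_total, eps)   # train bin proportion (floored to avoid log(0))
        q = max(s / s_total, eps)   # serve bin proportion
        score += (q - p) * math.log(q / p)
    return score

assert psi([10, 20, 30], [10, 20, 30]) < 1e-12   # identical distributions
assert psi([10, 20, 30], [30, 20, 10]) > 0.2     # clearly visible shift
```

Compute it per feature and per important cohort, against the materialized training set as the reference, not against last week's serving traffic.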

Set page-worthy alerts for:

  • Freshness lag breaching the staleness the model was trained under
  • Availability rate or join hit rate dropping sharply below the training baseline
  • PSI / JS divergence crossing your shift threshold on critical features
  • Any nonzero version mismatch count


8) Backfill and replay policy (often ignored)

Backfills are where many teams quietly corrupt evaluation.

Rules:

  • Replay backfills through the same as-of join, honoring the ingest_time each row actually had, never current snapshots
  • Tag backfilled rows so they are distinguishable from first-write rows
  • Treat feature history as append-only; corrections are new rows, not overwrites
  • Re-persist regenerated training sets with the feature logic commit hash and data snapshot id

If you cannot reproduce the exact features used by a model last month, you do not have governance.


9) Quant-specific notes (execution/risk models)

For trading/execution contexts:

  • Train on first-print data; revisions to bars/ticks arrive after the moment you would have traded
  • Capture vendor/exchange publish delays explicitly in ingest_time (publish_time), not just event_time

A model trained on revised bars/ticks can look brilliant and fail at the open.


10) Rollout checklist

Before shipping a new feature or model:

  • As-of join verified on both event_time and ingest_time in the training build
  • Offline/online definition parity checked on sampled requests
  • Null/default handling identical in both paths
  • Feature versions pinned to the model's feature spec
  • Freshness, availability, join hit rate, and distribution-shift monitors live with alerts


11) Anti-patterns to avoid

  • Latest-snapshot joins in training set construction
  • Filtering feature history on event_time alone
  • Two hand-maintained implementations of the same feature
  • Different imputation for missing values offline vs online
  • Backfilling history from current state


12) Bottom line

A feature store is not mainly a storage product. It is a time-consistency and reproducibility system.

If you protect point-in-time correctness and definition parity, model quality becomes a real signal. If you don’t, your AUC/Sharpe/precision can be mostly pipeline artifact.

