Vector ANN Index Selection Playbook (HNSW vs IVF-PQ vs DiskANN)

2026-03-28 · software

Audience: platform/search/recsys engineers choosing vector index architecture for production

1) Why this matters

Most vector-search incidents are not caused by “bad embeddings.” They come from index-workload mismatch: an index family whose memory footprint, latency profile, or update behavior does not fit the workload it serves.

This playbook is a practical decision system for choosing among three common families:

  1. HNSW (graph, memory-heavy, high recall/latency quality)
  2. IVF-PQ (inverted file + compressed codes, memory-efficient)
  3. DiskANN (SSD-augmented graph for billion-scale with tighter RAM)

2) Start with workload, not algorithm names

Before any index choice, lock these six inputs:

  1. Scale: number of vectors N, dimension d, growth rate.
  2. SLA: target p95/p99 latency and minimum recall@k.
  3. Update shape: append-heavy vs frequent updates/deletes.
  4. Filter selectivity: exact metadata filters before/after ANN.
  5. Hardware envelope: RAM, SSD class/IOPS, CPU cores, GPU availability.
  6. Rebuild window: how long you can tolerate index retraining/rebuild.

If these are fuzzy, index comparison is noise.
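The six inputs above are worth pinning down as a single workload spec that travels with every benchmark run. A minimal sketch; field names and example values are illustrative, not prescriptive:

```python
from dataclasses import dataclass

@dataclass
class WorkloadSpec:
    """The six workload inputs from section 2 (names are illustrative)."""
    n_vectors: int             # N
    dim: int                   # d
    growth_per_month: float    # e.g. 0.04 = 4%/month
    p95_latency_ms: float      # SLA latency target
    min_recall_at_k: float     # SLA recall floor
    k: int
    updates_per_sec: float     # update shape
    deletes_per_sec: float
    filter_selectivity: float  # fraction of corpus passing typical filters
    ram_gb: float              # hardware envelope
    ssd_iops: int
    rebuild_window_hours: float

# Example: a mid-size production corpus.
spec = WorkloadSpec(
    n_vectors=200_000_000, dim=768, growth_per_month=0.04,
    p95_latency_ms=30, min_recall_at_k=0.95, k=10,
    updates_per_sec=500, deletes_per_sec=20,
    filter_selectivity=0.15, ram_gb=256, ssd_iops=400_000,
    rebuild_window_hours=6,
)
```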


3) Mental model of the three options

A) HNSW

Good fit when:

  - Full vectors plus graph links fit comfortably in RAM.
  - You need high recall at low p95 latency.
  - The workload is append-heavy; frequent deletes degrade graph quality over time.

B) IVF-PQ

Good fit when:

  - RAM is the binding constraint and some recall loss is acceptable.
  - The corpus is large but stable enough that coarse centroids stay representative.
  - You can afford periodic retraining after data drift.

C) DiskANN

Good fit when:

  - The corpus is billion-scale and full vectors cannot live in RAM.
  - You have fast NVMe SSDs with IOPS headroom.
  - Latency targets tolerate a handful of SSD reads per query.


4) Fast decision matrix

Use this as a first pass (then validate by benchmark):

  Dominant constraint                     First choice
  --------------------------------------  ------------
  Corpus + graph fits in RAM              HNSW
  RAM-bound, some recall loss acceptable  IVF-PQ
  Billion-scale, RAM far too small        DiskANN
  Strict p99 at high recall               HNSW
  Cheapest memory per vector              IVF-PQ

If unsure: run champion/challenger with HNSW vs IVF-PQ first; add DiskANN when the RAM model clearly breaks.
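The first pass can also be sketched as a coarse decision function. The thresholds below (a ~1.5× HNSW RAM overhead, a ~64-byte PQ code budget) are illustrative assumptions to validate by benchmark, not recommendations:

```python
def first_pass_index_choice(n_vectors, dim, ram_gb, update_heavy):
    """Coarse first-pass pick; always validate with the benchmark
    protocol in section 6. Thresholds are illustrative."""
    raw_gb = n_vectors * dim * 4 / 1e9        # float32 corpus size
    # HNSW needs the full vectors in RAM plus graph overhead (~1.5x assumed).
    if raw_gb * 1.5 <= ram_gb:
        return "HNSW"
    # PQ codes at an assumed ~64 bytes/vector; IVF-PQ also dislikes
    # heavy updates because coarse centroids age badly.
    pq_gb = n_vectors * 64 / 1e9
    if pq_gb <= ram_gb and not update_heavy:
        return "IVF-PQ"
    return "DiskANN"
```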


5) Capacity math you should do before experiments

Use rough back-of-envelope first:

  1. Raw vector memory (float32):

     RawBytes ≈ N × d × 4

  2. IVF-PQ code memory (approx):

     PQCodeBytes ≈ N × (m × nbits / 8)

     plus centroid and metadata overhead.

  3. HNSW overhead intuition: full-precision vectors stay in RAM, plus roughly M × 2 graph links per node at the base layer (4-byte ids each), so expect on the order of 1.3-1.6× the raw vector footprint.

  4. DiskANN envelope: full vectors and the graph live on SSD; RAM holds compressed codes and hot entry structures, so budget SSD capacity for raw + graph and RAM for PQ-style codes.

Do not wait for an OOM during build to “discover” index feasibility.
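The back-of-envelope formulas above translate directly into a few helper functions. A sketch assuming float32 vectors, 8-bit PQ codebooks, and 4-byte graph ids (the `level_factor` for HNSW links is an assumption):

```python
def raw_bytes(n, d, bytes_per_dim=4):
    # Full-precision (float32) vector storage: N x d x 4.
    return n * d * bytes_per_dim

def pq_code_bytes(n, m, nbits=8):
    # PQ codes only: N x (m x nbits / 8); centroid tables and
    # metadata add a small extra overhead on top.
    return n * (m * nbits // 8)

def hnsw_graph_bytes(n, M, link_bytes=4, level_factor=2):
    # Rough graph-link overhead: ~M x 2 links per node at the base
    # layer, each a 4-byte id. level_factor is an assumption.
    return n * M * level_factor * link_bytes

# Example: 100M x 768d float32 is ~307 GB raw; PQ at m=64, nbits=8
# compresses the codes to ~6.4 GB (plus centroids/metadata).
```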


6) Benchmark protocol (non-negotiable)

Use one reproducible harness for all candidates:

  1. Fixed eval set with exact ground truth (brute-force or high-precision baseline).
  2. Sweep knobs to produce recall@k vs p95 latency frontier.
  3. Report QPS at fixed recall target, not only best-case latency.
  4. Include build time, peak build memory, index size, refresh time.
  5. Test under realistic concurrency + filters.
  6. Repeat on at least two traffic slices (normal + heavy-tail queries).

If you only compare at one parameter point, you are comparing configurations, not algorithms.
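Steps 1 and 2 of the protocol can be sketched with NumPy: exact ground truth by brute force, then recall@k for any candidate index's output. This is only suitable for small eval sets, since the brute-force step is O(N × Q):

```python
import numpy as np

def ground_truth(xq, xb, k):
    # Exact top-k by squared L2 distance (brute force) for each query.
    d2 = ((xq[:, None, :] - xb[None, :, :]) ** 2).sum(-1)
    return np.argsort(d2, axis=1)[:, :k]

def recall_at_k(approx_ids, exact_ids):
    # Fraction of exact neighbors recovered, averaged over queries.
    hits = [len(set(a) & set(e)) / len(e)
            for a, e in zip(approx_ids, exact_ids)]
    return float(np.mean(hits))
```

Any ANN candidate's ids can then be scored against the same `ground_truth` output while sweeping its knobs.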


7) Tuning playbook by index family

HNSW tuning order

  1. Set M/graph degree baseline (memory-quality anchor).
  2. Increase ef_construction until build-quality gains flatten.
  3. Sweep ef_search online for recall/latency frontier.
  4. Validate under filtering, because candidate truncation can hurt recall.
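Step 3's ef_search sweep generalizes to any single knob. A harness sketch in which `search_fn(queries, ef)` is a hypothetical stand-in for your index's search call, and `recall_fn` scores its ids against exact ground truth:

```python
import time

def sweep_frontier(search_fn, ef_values, queries, exact_ids, recall_fn):
    """Sweep one knob (here ef_search) and record the
    (knob, recall, mean latency ms) frontier points."""
    frontier = []
    for ef in ef_values:
        t0 = time.perf_counter()
        ids = search_fn(queries, ef)
        mean_ms = (time.perf_counter() - t0) / len(queries) * 1000
        frontier.append((ef, recall_fn(ids, exact_ids), mean_ms))
    return frontier
```

In production benchmarking you would record p95/p99 per point rather than the mean, per the protocol in section 6.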

IVF-PQ tuning order

  1. Choose nlist (coarse partition granularity).
  2. Set PQ code budget (m, nbits) from memory target.
  3. Sweep nprobe to recover recall.
  4. Re-check after data drift or embedding model changes (coarse centroids can age badly).
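Step 2 (the PQ code budget) is mostly arithmetic. A sketch assuming 8-bit codebooks (a common default) and the usual constraint that m divides d so subvector splits are even:

```python
def pq_params_for_budget(dim, bytes_per_vector):
    """Pick (m, nbits) under a per-vector memory budget.
    Assumes nbits=8, so one byte per sub-quantizer code."""
    nbits = 8
    m = bytes_per_vector * 8 // nbits   # bytes -> number of sub-quantizers
    while m > 1 and dim % m != 0:       # shrink until m divides d evenly
        m -= 1
    return m, nbits
```

For d=768 and a 64-byte budget this yields m=64 (12-dim subvectors); tighter budgets snap down to the nearest divisor of d.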

DiskANN tuning order

  1. Validate SSD latency/IOPS envelope first.
  2. Tune graph/search breadth for recall target.
  3. Size memory cache for hot regions/entry structures.
  4. Stress with high concurrency to catch queueing collapse.
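Step 1's SSD envelope check is a one-liner worth writing down. `reads_per_query` (often on the order of 50-200 for SSD-resident beam search) and the utilization headroom are illustrative assumptions:

```python
def required_ssd_iops(target_qps, reads_per_query, headroom=0.5):
    """IOPS the SSD must sustain: each query issues some number of
    random reads, and we keep device utilization at `headroom`
    (0.5 = run the device at no more than 50% of its rated IOPS)."""
    return round(target_qps * reads_per_query / headroom)
```

If the result exceeds the device's rated random-read IOPS, queueing collapse under concurrency (step 4) is close to guaranteed.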

8) Filtered search pitfalls (very common)

ANN + metadata filter often fails silently:

  - Post-filtering can return fewer than k results (or none) for selective filters, because survivors are discarded after the ANN stage.
  - Pre-filtering can shrink the candidate set so much that graph/IVF traversal quality collapses.
  - Recall measured without filters overstates production quality.

Guardrails:

  - Measure recall@k per filter-selectivity bucket, not just globally.
  - Oversample ANN candidates when post-filtering (roughly k divided by selectivity).
  - Fall back to an exact scan for highly selective filters; small candidate sets are cheap to scan exactly.

9) Operations: what to monitor in production

Minimum dashboard contract:

  - Recall proxy: a sampled slice of live queries re-checked against an exact baseline.
  - p95/p99 latency and QPS per endpoint.
  - Index memory footprint; SSD IOPS and queue depth for DiskANN.
  - Index freshness: ingest-to-searchable lag and delete backlog.

Alert examples:

  - Sampled recall drops below the SLA floor for two consecutive windows.
  - p99 latency breaches target at normal QPS.
  - Build/refresh time approaches the rebuild window from section 2.
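The alert logic is worth keeping as a pure function over a metrics snapshot so it can be unit-tested. A sketch; metric and SLO names are illustrative:

```python
def check_alerts(metrics, slo):
    """Compare a live metrics snapshot against SLO gates.
    Keys are illustrative, not a standard schema."""
    alerts = []
    if metrics["recall_proxy"] < slo["min_recall"]:
        alerts.append("recall_proxy below SLO")
    if metrics["p99_ms"] > slo["max_p99_ms"]:
        alerts.append("p99 latency above SLO")
    if metrics["ram_frac"] > slo["max_ram_frac"]:
        alerts.append("index memory near capacity")
    return alerts
```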


10) Safe rollout plan

  1. Offline frontier build (HNSW vs IVF-PQ; add DiskANN if needed).
  2. Shadow traffic with live query mix.
  3. Canary by endpoint/tenant with hard rollback gates.
  4. Auto-fallback to exact or conservative ANN profile on severe degradation.
  5. Weekly retune check for drift (new content, changed embedding norms, filter mix).
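Step 3's hard rollback gates also belong in code, not in a runbook paragraph. A sketch; the thresholds (1-point recall drop, 20% p95 regression) are illustrative defaults:

```python
def canary_gate(baseline, canary, max_recall_drop=0.01, max_latency_ratio=1.2):
    """Hard rollback gate comparing canary metrics to the baseline.
    Dict keys and thresholds are illustrative."""
    if baseline["recall"] - canary["recall"] > max_recall_drop:
        return "rollback: recall regression"
    if canary["p95_ms"] > baseline["p95_ms"] * max_latency_ratio:
        return "rollback: latency regression"
    return "pass"
```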

11) Practical bottom line

Pick the index family like an SRE decision: SLA + budget + operability, not leaderboard screenshots.

