Vector ANN Index Selection Playbook (HNSW vs IVF-PQ vs DiskANN)
Date: 2026-03-28
Category: knowledge
Audience: platform/search/recsys engineers choosing vector index architecture for production
1) Why this matters
Most vector-search incidents are not caused by “bad embeddings.” They come from index-workload mismatch:
- high-recall SLA + under-tuned ANN,
- memory budget ignored until index build explodes,
- filtering + ANN interaction misunderstood,
- offline benchmark wins that collapse under production concurrency.
This playbook is a practical decision system for choosing among three common families:
- HNSW (graph-based, memory-heavy, strong recall at low latency)
- IVF-PQ (inverted file + compressed codes, memory-efficient)
- DiskANN (SSD-augmented graph for billion-scale with tighter RAM)
2) Start with workload, not algorithm names
Before any index choice, lock these six inputs:
- Scale: number of vectors N, dimension d, growth rate.
- SLA: target p95/p99 latency and minimum recall@k.
- Update shape: append-heavy vs frequent updates/deletes.
- Filter selectivity: exact metadata filters before/after ANN.
- Hardware envelope: RAM, SSD class/IOPS, CPU cores, GPU availability.
- Rebuild window: how long you can tolerate index retraining/rebuild.
If these are fuzzy, index comparison is noise.
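One way to keep these six inputs from staying fuzzy is to capture them in a single record that both capacity math and benchmarks consume. A minimal sketch; the class and field names here are illustrative, not from any library:

```python
from dataclasses import dataclass

@dataclass
class WorkloadSpec:
    """The six workload inputs above; names and units are illustrative."""
    n_vectors: int             # scale: current corpus size
    dim: int                   # embedding dimension
    growth_per_month: float    # fractional growth rate
    p95_latency_ms: float      # SLA: latency target
    recall_target: float       # SLA: minimum recall@k
    update_shape: str          # "append" or "update-heavy"
    filter_selectivity: float  # fraction of corpus passing typical filters
    ram_bytes: int             # hardware envelope: memory budget
    ssd_class: str             # hardware envelope, e.g. "nvme"
    rebuild_window_h: float    # tolerated rebuild/retrain duration
```

Writing the spec down first forces the "fuzzy inputs" conversation before any index comparison starts.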
3) Mental model of the three options
A) HNSW
- Multi-layer proximity graph.
- Excellent recall-latency tradeoff in memory-resident setups.
- Typical tuning knobs: graph degree (M/m), build search breadth (ef_construction), query search breadth (ef_search).
- Tradeoff: higher memory usage and slower build/maintenance than simpler structures.
Good fit when:
- you can afford RAM,
- recall target is high,
- low-latency interactive search matters.
B) IVF-PQ
- Coarse partitioning (IVF lists) + product-quantized codes.
- Big memory reduction from compressed vectors.
- Main knobs: number of lists (nlist), probes per query (nprobe), PQ code size (m, nbits).
- Tradeoff: recall ceiling and tuning sensitivity (especially under distribution drift).
Good fit when:
- memory is constrained,
- dataset is very large,
- moderate recall targets are acceptable for large cost savings.
C) DiskANN
- Graph-style ANN with SSD-aware design.
- Targets high recall at billion scale without keeping full index in RAM.
- Tradeoff: performance depends strongly on storage stack and I/O behavior.
Good fit when:
- N is large enough that full in-memory is too expensive,
- you can engineer around SSD latency/throughput and caching.
4) Fast decision matrix
Use this as first pass (then validate by benchmark):
- Need best interactive quality and have RAM headroom -> start with HNSW.
- Need lower memory per vector and can trade some recall/latency headroom -> start with IVF-PQ.
- Need billion-scale on limited RAM with strong SSD -> start with DiskANN.
If unsure: run champion/challenger with HNSW vs IVF-PQ first; add DiskANN when RAM model clearly breaks.
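The first-pass matrix above can be sketched as a small function. The 1.5x HNSW overhead factor and the billion-vector threshold are illustrative assumptions, not measured constants; this is a starting point for champion/challenger, not a decision:

```python
def first_pass_index_choice(n_vectors, dim, ram_bytes, has_fast_ssd):
    """Rough first-pass family choice; validate with the benchmark protocol."""
    raw_bytes = n_vectors * dim * 4        # float32 vectors
    hnsw_bytes = int(raw_bytes * 1.5)      # assumed graph/metadata overhead factor
    if hnsw_bytes <= ram_bytes:
        return "HNSW"                      # RAM headroom -> best interactive quality
    if has_fast_ssd and n_vectors >= 1_000_000_000:
        return "DiskANN"                   # billion scale where the RAM model breaks
    return "IVF-PQ"                        # compress to fit the memory budget
```

A 10M x 768-dim corpus with 64 GiB of RAM lands on HNSW; a billion 128-dim vectors on 32 GiB with fast SSD lands on DiskANN.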
5) Capacity math you should do before experiments
Use rough back-of-envelope first:
- Raw vector memory (float32):
  RawBytes ≈ N × d × 4
- IVF-PQ code memory (approx):
  PQCodeBytes ≈ N × (m × nbits / 8)
  plus centroid and metadata overhead.
- HNSW overhead intuition:
- vector storage + graph links + per-node metadata.
- memory rises with graph degree and quality targets.
- DiskANN envelope:
- RAM for routing/cache + SSD for index payload,
- plus margin for concurrent query I/O.
Do not wait for OOM during build to “discover” index feasibility.
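The back-of-envelope formulas above translate directly into a few lines of arithmetic. The defaults (m=16, nbits=8) are just example PQ settings, and the results exclude centroids, graph links, and metadata, so treat them as floors:

```python
def capacity_estimates(n, d, pq_m=16, pq_nbits=8):
    """Back-of-envelope memory math; excludes centroid/metadata overhead."""
    raw_bytes = n * d * 4                      # float32 raw vectors
    pq_code_bytes = n * pq_m * pq_nbits // 8   # compressed PQ codes only
    return {
        "raw_bytes": raw_bytes,
        "pq_code_bytes": pq_code_bytes,
        "compression_ratio": raw_bytes / pq_code_bytes,
    }
```

For 100M vectors at d=768, raw float32 storage is ~307 GB while 16-byte PQ codes are ~1.6 GB, a 192x reduction before overhead.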
6) Benchmark protocol (non-negotiable)
Use one reproducible harness for all candidates:
- Fixed eval set with exact ground truth (brute-force or high-precision baseline).
- Sweep knobs to produce recall@k vs p95 latency frontier.
- Report QPS at fixed recall target, not only best-case latency.
- Include build time, peak build memory, index size, refresh time.
- Test under realistic concurrency + filters.
- Repeat on at least two traffic slices (normal + heavy-tail queries).
If you only compare at one parameter point, you are comparing configurations, not algorithms.
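The two core pieces of that harness, exact ground truth and recall@k scoring, fit in a few lines of NumPy. A sketch for small eval sets (the brute-force step is O(queries x base) and is meant for ground truth, not serving):

```python
import numpy as np

def brute_force_topk(queries, base, k):
    """Exact L2 top-k ids per query, for ground-truth generation."""
    queries, base = np.asarray(queries), np.asarray(base)
    d2 = ((queries[:, None, :] - base[None, :, :]) ** 2).sum(-1)
    return np.argsort(d2, axis=1)[:, :k]

def recall_at_k(approx_ids, exact_ids, k):
    """Fraction of exact top-k neighbors recovered by the ANN candidate lists."""
    hits = sum(len(set(a[:k]) & set(e[:k]))
               for a, e in zip(approx_ids, exact_ids))
    return hits / (len(exact_ids) * k)
```

Run every candidate index through the same two functions so the recall axis of each frontier is computed identically.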
7) Tuning playbook by index family
HNSW tuning order
- Set M / graph degree baseline (memory-quality anchor).
- Increase ef_construction until build-quality gains flatten.
- Sweep ef_search online for recall/latency frontier.
- Validate under filtering, because candidate truncation can hurt recall.
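The ef_search sweep can be driven by a generic harness that works for any library. A sketch; `search_fn(query, ef, k)` is a hypothetical stand-in for your ANN library's search call, not a real API:

```python
import time

def sweep_ef_search(search_fn, ef_values, queries, exact_ids, k):
    """Trace the recall/latency frontier by sweeping query breadth."""
    frontier = []
    for ef in ef_values:
        t0 = time.perf_counter()
        results = [search_fn(q, ef, k) for q in queries]
        latency = (time.perf_counter() - t0) / len(queries)
        hits = sum(len(set(r) & set(g[:k])) for r, g in zip(results, exact_ids))
        frontier.append({"ef": ef,
                         "recall": hits / (len(queries) * k),
                         "latency_s": latency})
    return frontier
```

The same harness reused with nprobe as the swept knob covers the IVF-PQ case below.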
IVF-PQ tuning order
- Choose nlist (coarse partition granularity).
- Set PQ code budget (m, nbits) from memory target.
- Sweep nprobe to recover recall.
- Re-check after data drift or embedding model changes (coarse centroids can age badly).
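Why nprobe recovers recall is easiest to see in a toy IVF search. A minimal sketch that probes the nearest coarse lists and ranks exactly within them (real IVF-PQ would rank by compressed-code distance; the PQ step is omitted here):

```python
import numpy as np

def ivf_search(base, centroids, assignments, query, nprobe, k):
    """Minimal IVF-flat search: probe nprobe nearest lists, rank candidates exactly."""
    cd = ((centroids - query) ** 2).sum(1)
    probe = np.argsort(cd)[:nprobe]                  # nearest coarse lists
    cand = np.where(np.isin(assignments, probe))[0]  # ids in probed lists
    dist = ((base[cand] - query) ** 2).sum(1)
    return cand[np.argsort(dist)[:k]]
```

A query that falls between two partitions misses true neighbors at nprobe=1 and recovers them at nprobe=2, which is exactly the recall-vs-latency lever you sweep.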
DiskANN tuning order
- Validate SSD latency/IOPS envelope first.
- Tune graph/search breadth for recall target.
- Size memory cache for hot regions/entry structures.
- Stress with high concurrency to catch queueing collapse.
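For the first step, a crude random-read probe shows the shape of the measurement. This sketch reads through the page cache, so it understates real SSD latency; serious envelope validation should use fio or O_DIRECT reads, and this helper name is illustrative:

```python
import os
import random
import time

def random_read_probe(path, block=4096, n_reads=200):
    """Rough random-read latency percentiles for a file (page cache NOT bypassed)."""
    size = os.path.getsize(path)
    lat = []
    fd = os.open(path, os.O_RDONLY)
    try:
        for _ in range(n_reads):
            off = random.randrange(0, max(size - block, 1))
            t0 = time.perf_counter()
            os.pread(fd, block, off)          # one 4 KiB random read
            lat.append(time.perf_counter() - t0)
    finally:
        os.close(fd)
    lat.sort()
    return {"p50_s": lat[len(lat) // 2],
            "p99_s": lat[int(len(lat) * 0.99) - 1],
            "reads": n_reads}
```

If even the cached p99 is volatile under concurrency, the SSD-resident index will be worse; gate DiskANN adoption on the measured envelope, not the device spec sheet.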
8) Filtered search pitfalls (very common)
ANN + metadata filter often fails silently:
- Filtering after ANN candidate generation can reduce returned neighbors unexpectedly.
- Low candidate breadth (ef_search, nprobe, etc.) causes "good recall offline, missing rows in prod."
Guardrails:
- benchmark with real filter predicates,
- increase candidate breadth for filtered paths,
- consider iterative widening when result count is below threshold.
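The third guardrail can be sketched as a retry loop. `search_fn(ef)` is a hypothetical stand-in for an ANN query at a given candidate breadth, returning candidate ids in ranked order; the starting breadth and cap are illustrative:

```python
def filtered_search_with_widening(search_fn, predicate, k, ef_start=64, ef_max=1024):
    """Iteratively widen candidate breadth until k filtered hits survive (or cap)."""
    ef = ef_start
    while True:
        hits = [cid for cid in search_fn(ef) if predicate(cid)]
        if len(hits) >= k or ef >= ef_max:
            return hits[:k]       # may be short if the cap is reached
        ef *= 2                   # filter starved the results: double breadth, retry
```

The cap matters: without ef_max, a near-empty filter predicate turns one query into an unbounded scan.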
9) Operations: what to monitor in production
Minimum dashboard contract:
- recall proxy / exact-sampled recall@k,
- p50/p95/p99 latency,
- QPS and concurrency,
- index memory footprint and SSD I/O saturation,
- candidate breadth parameters currently in effect,
- filtered-query hit count shortfall rate,
- rebuild duration and failure rate.
Alert examples:
- recall proxy drops with stable embedding model,
- p99 latency spike with SSD queue growth,
- filtered result shortfall rising after config deploy.
10) Safe rollout plan
- Offline frontier build (HNSW vs IVF-PQ; add DiskANN if needed).
- Shadow traffic with live query mix.
- Canary by endpoint/tenant with hard rollback gates.
- Auto-fallback to exact or conservative ANN profile on severe degradation.
- Weekly retune check for drift (new content, changed embedding norms, filter mix).
11) Practical bottom line
- HNSW is usually the easiest path to high quality if RAM allows.
- IVF-PQ wins when memory/$ pressure dominates and you can tune aggressively.
- DiskANN is a powerful scale lever when RAM is the bottleneck and storage engineering is strong.
Pick the index family like an SRE decision: SLA + budget + operability, not leaderboard screenshots.
References
- HNSW paper (arXiv): https://arxiv.org/abs/1603.09320
- Product Quantization (IEEE TPAMI): https://ieeexplore.ieee.org/document/5432202/
- FAISS docs: https://faiss.ai/
- DiskANN (NeurIPS 2019): https://proceedings.neurips.cc/paper/2019/hash/09853c7fb1d3f8ee67a61b6bf4a7f8e6-Abstract.html
- ANN-Benchmarks: http://ann-benchmarks.com/index.html
- Big ANN Benchmarks: https://big-ann-benchmarks.com/
- pgvector README (HNSW/IVFFlat operational notes): https://github.com/pgvector/pgvector