Vector ANN Index Selection Playbook (HNSW vs IVF-PQ vs DiskANN)
Date: 2026-03-28
Category: knowledge
Audience: platform/search/recsys engineers choosing vector index architecture for production
1) Why this matters
Most vector-search incidents are not caused by “bad embeddings.” They come from index-workload mismatch:
- high-recall SLA + under-tuned ANN,
- memory budget ignored until index build explodes,
- filtering + ANN interaction misunderstood,
- offline benchmark wins that collapse under production concurrency.
This playbook is a practical decision system for choosing among three common families:
- HNSW (graph-based, memory-heavy, strong recall at low latency)
- IVF-PQ (inverted file + compressed codes, memory-efficient)
- DiskANN (SSD-augmented graph for billion-scale with tighter RAM)
2) Start with workload, not algorithm names
Before any index choice, lock these six inputs:
- Scale: number of vectors N, dimension d, growth rate.
- SLA: target p95/p99 latency and minimum recall@k.
- Update shape: append-heavy vs frequent updates/deletes.
- Filter selectivity: exact metadata filters before/after ANN.
- Hardware envelope: RAM, SSD class/IOPS, CPU cores, GPU availability.
- Rebuild window: how long you can tolerate index retraining/rebuild.
If these are fuzzy, index comparison is noise.
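One way to keep these six inputs from staying fuzzy is to capture them in a single record that both capacity math and benchmarks consume. A minimal sketch; the class and field names here are illustrative, not from any library:

```python
from dataclasses import dataclass

@dataclass
class WorkloadSpec:
    """The six workload inputs above; names and units are illustrative."""
    n_vectors: int             # scale: current corpus size
    dim: int                   # embedding dimension
    growth_per_month: float    # fractional growth rate
    p95_latency_ms: float      # SLA: latency target
    recall_target: float       # SLA: minimum recall@k
    update_shape: str          # "append" or "update-heavy"
    filter_selectivity: float  # fraction of corpus passing typical filters
    ram_bytes: int             # hardware envelope: memory budget
    ssd_class: str             # hardware envelope, e.g. "nvme"
    rebuild_window_h: float    # tolerated rebuild/retrain duration
```

Writing the spec down first forces the "fuzzy inputs" conversation before any index comparison starts.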
3) Mental model of the three options
A) HNSW
- Multi-layer proximity graph.
- Excellent recall-latency tradeoff in memory-resident setups.
- Typical tuning knobs: graph degree (M/m), build search breadth (ef_construction), query search breadth (ef_search).
- Tradeoff: higher memory usage and slower build/maintenance than simpler structures.
Good fit when:
- you can afford RAM,
- recall target is high,
- low-latency interactive search matters.
B) IVF-PQ
- Coarse partitioning (IVF lists) + product-quantized codes.
- Big memory reduction from compressed vectors.
- Main knobs: number of lists (nlist), probes per query (nprobe), PQ code size (m, nbits).
- Tradeoff: recall ceiling and tuning sensitivity (especially under distribution drift).
Good fit when:
- memory is constrained,
- dataset is very large,
- moderate recall targets are acceptable for large cost savings.
C) DiskANN
- Graph-style ANN with SSD-aware design.
- Targets high recall at billion scale without keeping full index in RAM.
- Tradeoff: performance depends strongly on storage stack and I/O behavior.
Good fit when:
- N is large enough that full in-memory is too expensive,
- you can engineer around SSD latency/throughput and caching.
4) Fast decision matrix
Use this as first pass (then validate by benchmark):
- Need best interactive quality and have RAM headroom -> start with HNSW.
- Need lower memory per vector and can trade some recall/latency headroom -> start with IVF-PQ.
- Need billion-scale on limited RAM with strong SSD -> start with DiskANN.
If unsure: run champion/challenger with HNSW vs IVF-PQ first; add DiskANN when RAM model clearly breaks.
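The first-pass matrix above can be sketched as a small function. The 1.5x HNSW overhead factor and the billion-vector threshold are illustrative assumptions, not measured constants; this is a starting point for champion/challenger, not a decision:

```python
def first_pass_index_choice(n_vectors, dim, ram_bytes, has_fast_ssd):
    """Rough first-pass family choice; validate with the benchmark protocol."""
    raw_bytes = n_vectors * dim * 4        # float32 vectors
    hnsw_bytes = int(raw_bytes * 1.5)      # assumed graph/metadata overhead factor
    if hnsw_bytes <= ram_bytes:
        return "HNSW"                      # RAM headroom -> best interactive quality
    if has_fast_ssd and n_vectors >= 1_000_000_000:
        return "DiskANN"                   # billion scale where the RAM model breaks
    return "IVF-PQ"                        # compress to fit the memory budget
```

A 10M x 768-dim corpus with 64 GiB of RAM lands on HNSW; a billion 128-dim vectors on 32 GiB with fast SSD lands on DiskANN.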
5) Capacity math you should do before experiments
Use rough back-of-envelope first:
- Raw vector memory (float32):
  RawBytes ≈ N × d × 4
- IVF-PQ code memory (approx):
  PQCodeBytes ≈ N × (m × nbits / 8)
  plus centroid and metadata overhead.
- HNSW overhead intuition:
- vector storage + graph links + per-node metadata.
- memory rises with graph degree and quality targets.
- DiskANN envelope:
- RAM for routing/cache + SSD for index payload,
- plus margin for concurrent query I/O.
Do not wait for OOM during build to “discover” index feasibility.
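The back-of-envelope formulas above translate directly into a few lines of arithmetic. The defaults (m=16, nbits=8) are just example PQ settings, and the results exclude centroids, graph links, and metadata, so treat them as floors:

```python
def capacity_estimates(n, d, pq_m=16, pq_nbits=8):
    """Back-of-envelope memory math; excludes centroid/metadata overhead."""
    raw_bytes = n * d * 4                      # float32 raw vectors
    pq_code_bytes = n * pq_m * pq_nbits // 8   # compressed PQ codes only
    return {
        "raw_bytes": raw_bytes,
        "pq_code_bytes": pq_code_bytes,
        "compression_ratio": raw_bytes / pq_code_bytes,
    }
```

For 100M vectors at d=768, raw float32 storage is ~307 GB while 16-byte PQ codes are ~1.6 GB, a 192x reduction before overhead.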
6) Benchmark protocol (non-negotiable)
Use one reproducible harness for all candidates:
- Fixed eval set with exact ground truth (brute-force or high-precision baseline).
- Sweep knobs to produce recall@k vs p95 latency frontier.
- Report QPS at fixed recall target, not only best-case latency.
- Include build time, peak build memory, index size, refresh time.
- Test under realistic concurrency + filters.
- Repeat on at least two traffic slices (normal + heavy-tail queries).
If you only compare at one parameter point, you are comparing configurations, not algorithms.
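The two core pieces of that harness, exact ground truth and recall@k scoring, fit in a few lines of NumPy. A sketch for small eval sets (the brute-force step is O(queries x base) and is meant for ground truth, not serving):

```python
import numpy as np

def brute_force_topk(queries, base, k):
    """Exact L2 top-k ids per query, for ground-truth generation."""
    queries, base = np.asarray(queries), np.asarray(base)
    d2 = ((queries[:, None, :] - base[None, :, :]) ** 2).sum(-1)
    return np.argsort(d2, axis=1)[:, :k]

def recall_at_k(approx_ids, exact_ids, k):
    """Fraction of exact top-k neighbors recovered by the ANN candidate lists."""
    hits = sum(len(set(a[:k]) & set(e[:k]))
               for a, e in zip(approx_ids, exact_ids))
    return hits / (len(exact_ids) * k)
```

Run every candidate index through the same two functions so the recall axis of each frontier is computed identically.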
7) Tuning playbook by index family
HNSW tuning order
- Set M / graph degree baseline (memory-quality anchor).
- Increase ef_construction until build-quality gains flatten.
- Sweep ef_search online for recall/latency frontier.
- Validate under filtering, because candidate truncation can hurt recall.
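The ef_search sweep can be driven by a generic harness that works for any library. A sketch; `search_fn(query, ef, k)` is a hypothetical stand-in for your ANN library's search call, not a real API:

```python
import time

def sweep_ef_search(search_fn, ef_values, queries, exact_ids, k):
    """Trace the recall/latency frontier by sweeping query breadth."""
    frontier = []
    for ef in ef_values:
        t0 = time.perf_counter()
        results = [search_fn(q, ef, k) for q in queries]
        latency = (time.perf_counter() - t0) / len(queries)
        hits = sum(len(set(r) & set(g[:k])) for r, g in zip(results, exact_ids))
        frontier.append({"ef": ef,
                         "recall": hits / (len(queries) * k),
                         "latency_s": latency})
    return frontier
```

The same harness reused with nprobe as the swept knob covers the IVF-PQ case below.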
IVF-PQ tuning order
- Choose nlist (coarse partition granularity).
- Set PQ code budget (m, nbits) from memory target.
- Sweep nprobe to recover recall.
- Re-check after data drift or embedding model changes (coarse centroids can age badly).
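Why nprobe recovers recall is easiest to see in a toy IVF search. A minimal sketch that probes the nearest coarse lists and ranks exactly within them (real IVF-PQ would rank by compressed-code distance; the PQ step is omitted here):

```python
import numpy as np

def ivf_search(base, centroids, assignments, query, nprobe, k):
    """Minimal IVF-flat search: probe nprobe nearest lists, rank candidates exactly."""
    cd = ((centroids - query) ** 2).sum(1)
    probe = np.argsort(cd)[:nprobe]                  # nearest coarse lists
    cand = np.where(np.isin(assignments, probe))[0]  # ids in probed lists
    dist = ((base[cand] - query) ** 2).sum(1)
    return cand[np.argsort(dist)[:k]]
```

A query that falls between two partitions misses true neighbors at nprobe=1 and recovers them at nprobe=2, which is exactly the recall-vs-latency lever you sweep.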
DiskANN tuning order
- Validate SSD latency/IOPS envelope first.
- Tune graph/search breadth for recall target.
- Size memory cache for hot regions/entry structures.
- Stress with high concurrency to catch queueing collapse.
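For the first step, a crude random-read probe shows the shape of the measurement. This sketch reads through the page cache, so it understates real SSD latency; serious envelope validation should use fio or O_DIRECT reads, and this helper name is illustrative:

```python
import os
import random
import time

def random_read_probe(path, block=4096, n_reads=200):
    """Rough random-read latency percentiles for a file (page cache NOT bypassed)."""
    size = os.path.getsize(path)
    lat = []
    fd = os.open(path, os.O_RDONLY)
    try:
        for _ in range(n_reads):
            off = random.randrange(0, max(size - block, 1))
            t0 = time.perf_counter()
            os.pread(fd, block, off)          # one 4 KiB random read
            lat.append(time.perf_counter() - t0)
    finally:
        os.close(fd)
    lat.sort()
    return {"p50_s": lat[len(lat) // 2],
            "p99_s": lat[int(len(lat) * 0.99) - 1],
            "reads": n_reads}
```

If even the cached p99 is volatile under concurrency, the SSD-resident index will be worse; gate DiskANN adoption on the measured envelope, not the device spec sheet.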
8) Filtered search pitfalls (very common)
ANN + metadata filter often fails silently:
- Filtering after ANN candidate generation can reduce returned neighbors unexpectedly.
- Low candidate breadth (ef_search, nprobe, etc.) causes "good recall offline, missing rows in prod."
Guardrails:
- benchmark with real filter predicates,
- increase candidate breadth for filtered paths,
- consider iterative widening when result count is below threshold.
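The third guardrail can be sketched as a retry loop. `search_fn(ef)` is a hypothetical stand-in for an ANN query at a given candidate breadth, returning candidate ids in ranked order; the starting breadth and cap are illustrative:

```python
def filtered_search_with_widening(search_fn, predicate, k, ef_start=64, ef_max=1024):
    """Iteratively widen candidate breadth until k filtered hits survive (or cap)."""
    ef = ef_start
    while True:
        hits = [cid for cid in search_fn(ef) if predicate(cid)]
        if len(hits) >= k or ef >= ef_max:
            return hits[:k]       # may be short if the cap is reached
        ef *= 2                   # filter starved the results: double breadth, retry
```

The cap matters: without ef_max, a near-empty filter predicate turns one query into an unbounded scan.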
9) Operations: what to monitor in production
Minimum dashboard contract:
- recall proxy / exact-sampled recall@k,
- p50/p95/p99 latency,
- QPS and concurrency,
- index memory footprint and SSD I/O saturation,
- candidate breadth parameters currently in effect,
- filtered-query hit count shortfall rate,
- rebuild duration and failure rate.
Alert examples:
- recall proxy drops with stable embedding model,
- p99 latency spike with SSD queue growth,
- filtered result shortfall rising after config deploy.
10) Safe rollout plan
- Offline frontier build (HNSW vs IVF-PQ; add DiskANN if needed).
- Shadow traffic with live query mix.
- Canary by endpoint/tenant with hard rollback gates.
- Auto-fallback to exact or conservative ANN profile on severe degradation.
- Weekly retune check for drift (new content, changed embedding norms, filter mix).
11) Practical bottom line
- HNSW is usually the easiest path to high quality if RAM allows.
- IVF-PQ wins when memory/$ pressure dominates and you can tune aggressively.
- DiskANN is a powerful scale lever when RAM is the bottleneck and storage engineering is strong.
Pick the index family like an SRE decision: SLA + budget + operability, not leaderboard screenshots.
References
- HNSW paper (arXiv): https://arxiv.org/abs/1603.09320
- Product Quantization (IEEE TPAMI): https://ieeexplore.ieee.org/document/5432202/
- FAISS docs: https://faiss.ai/
- DiskANN (NeurIPS 2019): https://proceedings.neurips.cc/paper/2019/hash/09853c7fb1d3f8ee67a61b6bf4a7f8e6-Abstract.html
- ANN-Benchmarks: http://ann-benchmarks.com/index.html
- Big ANN Benchmarks: https://big-ann-benchmarks.com/
- pgvector README (HNSW/IVFFlat operational notes): https://github.com/pgvector/pgvector