How HNSW (Hierarchical Navigable Small World) Builds a "Small World" to Navigate Billions of Ideas

Hierarchical Navigable Small World (HNSW) is a clever, graph-based method for creating a multi-layered, interconnected map of data that allows AI to find the “nearest” or most similar items in a massive dataset with incredible speed, without having to check every single one.

Imagine you’re in the Library of Congress, tasked with finding a specific, obscure book. You could start at one end and scan every single shelf—the brute-force approach. It’s thorough, but it would take you a lifetime. Now, what if the library had a magical, multi-layered map? The top layer of the map only shows the major sections: “History,” “Science,” “Fiction.” You instantly find the “Science” section. The next layer of the map shows the sub-sections: “Physics,” “Biology,” “Chemistry.” You jump to “Physics.” You continue this process, moving from broad categories to more and more specific ones, until you’re standing right in front of the book you were looking for. This is, in essence, what the Hierarchical Navigable Small World (HNSW) algorithm does for AI.

The Six Degrees of Kevin Bacon and the “Small World”

The story of HNSW begins not with computer science, but with sociology. In the 1960s, Stanley Milgram’s famous “small-world experiment” suggested that any two people in the United States were connected by a surprisingly short chain of acquaintances—the origin of the “six degrees of separation” idea. In 1998, mathematicians Duncan Watts and Steven Strogatz gave this phenomenon a formal mathematical framework (Watts & Strogatz, 1998). They introduced a model that starts with a regular, highly ordered network (like a ring lattice where each node is connected to its immediate neighbors) and then systematically “rewires” a fraction of the edges at random. What they discovered was remarkable: even a tiny amount of random rewiring is enough to dramatically decrease the average path length of the network, connecting distant parts of the graph with a few random “shortcuts.” At the same time, the network remains highly clustered, just like the original regular lattice. This combination of high clustering and short path length is the defining characteristic of a small-world network. It’s the best of both worlds: the local, cliquey structure of a regular graph and the global connectivity of a random graph. This structure is incredibly efficient for information propagation, which is why it appears so frequently in natural and man-made systems.

From Skip Lists to Navigable Small Worlds

The first intellectual ancestor of HNSW is the skip list, a probabilistic data structure invented by William Pugh in 1989 that offers a clever alternative to balanced trees. It starts with a simple, sorted linked list of elements—the “ground floor.” Then, it builds a series of “express lanes” on top. Each upper layer is a sparser linked list that contains a random subset of the nodes from the layer below it. To find an element, you start on the highest, most sparse layer and traverse forward until you’re about to overshoot your target. Then, you drop down to the next layer and repeat the process, progressively refining your search. This layered approach allows you to “skip” over large chunks of the data, achieving an average search time of O(log n), which is a massive improvement over the O(n) time required for a simple linked list.
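To make the layering concrete, here is a minimal skip-list sketch in Python. Names like `SkipList` and the promotion probability `P` are illustrative choices, not taken from Pugh's paper:

```python
import random

class Node:
    def __init__(self, value, level):
        self.value = value
        self.forward = [None] * (level + 1)  # one forward pointer per layer

class SkipList:
    MAX_LEVEL = 8
    P = 0.5  # probability of promoting a node to the next layer up

    def __init__(self):
        self.head = Node(float("-inf"), self.MAX_LEVEL)
        self.level = 0

    def _random_level(self):
        # Each node reaches layer l with probability P**l.
        lvl = 0
        while random.random() < self.P and lvl < self.MAX_LEVEL:
            lvl += 1
        return lvl

    def insert(self, value):
        update = [self.head] * (self.MAX_LEVEL + 1)
        node = self.head
        for i in range(self.level, -1, -1):
            while node.forward[i] and node.forward[i].value < value:
                node = node.forward[i]
            update[i] = node  # last node before the insert point on layer i
        lvl = self._random_level()
        self.level = max(self.level, lvl)
        new = Node(value, lvl)
        for i in range(lvl + 1):
            new.forward[i] = update[i].forward[i]
            update[i].forward[i] = new

    def contains(self, value):
        node = self.head
        # Start in the sparsest "express lane" and drop down a layer
        # whenever the next hop would overshoot the target.
        for i in range(self.level, -1, -1):
            while node.forward[i] and node.forward[i].value < value:
                node = node.forward[i]
        node = node.forward[0]
        return node is not None and node.value == value
```

The `contains` loop is the whole trick: traverse forward on the current layer until just before overshooting, then descend, skipping large runs of the ground-floor list.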

The second key idea is the Navigable Small World (NSW) graph, a single-layer graph where each node (representing a data point) is connected to its nearest neighbors based on some distance metric (Malkov et al., 2014). To find the nearest neighbor to a query, you start at a random entry point and perform a greedy search: at each step, you move to the neighbor that is closest to your query. You repeat this process until you reach a “local minimum”—a node whose neighbors are all farther away from the query than the node itself. While this is much faster than a brute-force search, it has a significant weakness: the greedy search can easily get trapped in a local minimum that is not the true global minimum, especially in the complex, high-dimensional spaces typical of AI applications. This is like trying to find the lowest point in a mountain range by always walking downhill; you might end up in a small valley instead of the main canyon.
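The greedy walk at the heart of NSW fits in a few lines. This is a sketch, assuming the graph is a dict mapping each node id to its neighbor ids and `vectors` holds the coordinates:

```python
import math

def greedy_search(graph, vectors, query, entry):
    """Greedy walk on a single-layer NSW graph: repeatedly hop to whichever
    neighbor is closest to the query, stopping at a local minimum."""
    current = entry
    while True:
        # Pick the neighbor closest to the query.
        best = min(graph[current],
                   key=lambda n: math.dist(vectors[n], query),
                   default=current)
        if math.dist(vectors[best], query) >= math.dist(vectors[current], query):
            return current  # local minimum: no neighbor is closer than we are
        current = best
```

The early return is exactly the weakness described above: the walk stops at the first node with no closer neighbor, whether or not that node is the true nearest point.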

How HNSW Fuses Two Great Ideas Into One

The HNSW algorithm brilliantly fuses these two concepts to overcome their individual limitations (Malkov & Yashunin, 2018). It builds a multi-layered graph, just like a skip list, but instead of simple linked lists, each layer is a navigable small-world graph. The top layers are sparse “express lanes” with long-range connections that allow for rapid traversal across the entire data space, while the bottom layers are dense “local roads” that provide high-precision local navigation. This hierarchical structure is the key to escaping the local minima problem of NSW.

Insertion: When a new data point is inserted, it is first assigned a random maximum layer with an exponentially decaying probability, ensuring that very few points make it to the top layers. The algorithm then starts at the top layer, finds the nearest neighbors to the new point in that layer, and uses them as entry points to the layer below. This process is repeated, descending layer by layer, until it reaches the new point’s assigned maximum layer. At each layer, the algorithm creates connections between the new point and its nearest neighbors, building the graph structure. This process is more complex than a simple greedy search, as it involves maintaining a dynamic list of candidate neighbors and using a heuristic to select the best connections, ensuring a well-connected and efficient graph.
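The random layer assignment is simple to sketch: the HNSW paper draws the level as `floor(-ln(u) * mL)` for uniform `u`, with the normalization factor `mL` typically set to `1/ln(M)`:

```python
import math
import random

def random_level(M):
    """Draw a maximum layer for a new point. With mL = 1/ln(M), the expected
    fraction of points reaching layer l decays as (1/M)**l, so the upper
    layers stay very sparse."""
    m_l = 1.0 / math.log(M)
    u = 1.0 - random.random()  # uniform in (0, 1], avoids log(0)
    return int(-math.log(u) * m_l)
```

With `M = 16`, for example, only about one point in sixteen is promoted past layer 0, one in 256 past layer 1, and so on.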

Search: To perform a search, you start at the index’s designated entry point in the top layer and greedily traverse the graph, always moving to the neighbor closest to your query. When you can’t find a closer neighbor in the current layer, you use the closest point you found as the entry point to the layer below and continue the search. This process is repeated until you reach the bottom layer, where you perform a final, more detailed search. This combination of long-range jumps in the upper layers and short-range, high-precision movements in the lower layers allows HNSW to quickly and efficiently navigate to the correct region of the graph, avoiding the local minima that plague single-layer NSW graphs.
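The two-phase search can be sketched as follows. This is a simplified illustration, assuming `layers` is a list of per-layer neighbor dicts ordered from the sparse top layer down to the dense bottom layer; the `ef` candidate-list mechanics follow the paper's SEARCH-LAYER routine in spirit:

```python
import heapq
import math

def search_layer(graph, vectors, query, entry, ef):
    """Best-first search within one layer, keeping a candidate list of size ef."""
    dist = lambda i: math.dist(vectors[i], query)
    visited = {entry}
    candidates = [(dist(entry), entry)]   # min-heap: nodes still to expand
    best = [(-dist(entry), entry)]        # max-heap: the ef best results so far
    while candidates:
        d_c, c = heapq.heappop(candidates)
        if d_c > -best[0][0]:
            break                         # everything left is farther away
        for nb in graph[c]:
            if nb in visited:
                continue
            visited.add(nb)
            d_n = dist(nb)
            if len(best) < ef or d_n < -best[0][0]:
                heapq.heappush(candidates, (d_n, nb))
                heapq.heappush(best, (-d_n, nb))
                if len(best) > ef:
                    heapq.heappop(best)   # drop the current worst result
    return sorted((-d, n) for d, n in best)  # (distance, node), nearest first

def hnsw_search(layers, vectors, query, entry, ef):
    """Greedy descent through the upper layers, then an ef-wide search at layer 0."""
    for graph in layers[:-1]:             # upper layers: ef = 1, i.e. pure greedy
        entry = search_layer(graph, vectors, query, entry, ef=1)[0][1]
    return search_layer(layers[-1], vectors, query, entry, ef)
```

Note how the upper layers run with a candidate list of size 1 (pure greedy hops), while the bottom layer widens the beam to `ef`, which is where the speed-accuracy trade-off discussed below comes in.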

Getting the Most Out of Your HNSW Index

The remarkable performance of HNSW isn’t just about its clever structure; it’s also highly tunable, allowing engineers to strike a precise balance between search speed, accuracy (recall), and memory usage. This tuning is primarily controlled by three parameters. The first, M (Maximum Connections), defines the maximum number of bidirectional links each node in the graph can have. A higher M creates a denser graph that generally leads to higher recall, but at the cost of increased memory usage and longer index build times. Typical values range from 5 to 48 (Malkov & Yashunin, 2018).

The second parameter, efConstruction, governs the quality of the index itself. During the build process, the algorithm maintains a dynamic list of the best candidate neighbors found so far, and efConstruction sets the size of that list. A larger value allows the algorithm to explore more potential connections for each new node, resulting in a higher-quality graph. The trade-off is a steeper cost in index construction time—an investment made once that pays dividends on every subsequent query.

The third parameter, ef (or efSearch), is the one you tune most frequently in a live application. It controls the size of the candidate list during the search process itself. A larger ef means the algorithm explores more paths through the graph at each step, increasing the probability of finding the true nearest neighbors and thus improving recall. The cost is a direct increase in search latency. Tuning ef is the most direct lever for adjusting the speed-accuracy trade-off at query time, without needing to rebuild the index.

Finding the optimal values for these parameters is more of an art than a science, often involving empirical testing on a specific dataset. As a general rule, you increase these values to favor recall and decrease them to favor speed and lower memory usage. Many vector database providers, like Pinecone and MongoDB, offer guidance and sensible defaults, but achieving peak performance for a demanding production workload often requires careful, data-driven tuning.

| Parameter | What it Controls | Higher Value Means | Lower Value Means |
| --- | --- | --- | --- |
| M | Number of neighbors per node | Higher recall, more memory, slower build | Lower recall, less memory, faster build |
| efConstruction | Quality of the graph build | Higher recall, much slower build | Lower recall, much faster build |
| ef | Depth of the search | Higher recall, slower search | Lower recall, faster search |

The Algorithm That Powers Modern AI Search

The theoretical elegance of HNSW translates directly into dominant production performance, which has made it the go-to algorithm for ANN search in most systems. Its ability to deliver high recall at incredible speeds, even on datasets with billions of vectors, is why it forms the backbone of virtually every modern vector search stack, from the open-source Faiss library to databases like Weaviate, Milvus, Qdrant, and Redis, and managed services like Pinecone.

These databases are the specialized engines that power a new generation of AI-native applications. When an application needs to perform a similarity search, it sends a query vector to the database, which then uses its HNSW index to find the k-nearest neighbors in milliseconds.

Semantic search is the most visible application. Instead of matching keywords, semantic search systems find documents, images, or products that are conceptually similar to a query. A user can search for "summer vacation outfits" and get back images of sundresses and sandals, even if those exact words don't appear in the product descriptions. This works by converting both the query and the database items into vector embeddings and using HNSW to find the closest matches in that vector space.

Recommendation engines are another major beneficiary. When a streaming service immediately suggests something else after you finish a movie, HNSW is often doing the heavy lifting. The service represents both users and items as vectors in a shared space, and finding recommendations is simply a matter of finding the item vectors closest to your user vector.

Retrieval-Augmented Generation (RAG) is perhaps the most consequential modern application. RAG is the technology that allows large language models (LLMs) to answer questions about up-to-date or proprietary information. When a RAG-enabled chatbot receives a question, it first converts that question into a vector and uses HNSW to search a private knowledge base. It then injects the most relevant documents it finds into the prompt it sends to the LLM, allowing the model to generate an accurate, grounded response without needing to be retrained. Platforms like Sandgarden make it straightforward to wire together these components—vector database, HNSW index, and LLM—into a production-ready RAG pipeline without having to manage the underlying infrastructure.

Anomaly and threat detection round out the picture. In cybersecurity and finance, HNSW can be used to spot unusual behavior by creating vector representations of normal network traffic or financial transactions. Any data point that is far from all known clusters of normal activity becomes an immediate candidate for investigation.

Where HNSW Still Has Room to Grow

Despite its dominance, HNSW is not without its challenges. The most significant is its memory footprint. Because the algorithm stores the full vectors and a graph of connections, the RAM required can be substantial, especially for large datasets. This has led to research into techniques like Product Quantization (PQ), which compresses the vectors to reduce their memory size, often at the cost of some accuracy. Many production systems therefore combine approaches, for example storing quantized vectors inside an HNSW index, or reaching for a different index family such as IVF-PQ when memory is the dominant constraint.

Another challenge is the static nature of the graph. While new points can be added incrementally, deleting points from an HNSW graph is a notoriously difficult problem that can degrade the graph’s structure and performance over time. This is an active area of research, with many vector databases developing their own proprietary solutions for handling deletions and updates efficiently.

Finally, the performance of HNSW can be sensitive to the distribution of the data. Highly clustered or non-uniform data can sometimes pose challenges for the greedy traversal algorithm. In fact, even the order in which you index your data can impact the final recall if the data has strong categorical clusters, making it crucial to randomize the insertion order to ensure a robust and unbiased graph (Marqo, 2025).

A Triumph of Algorithmic Ingenuity

HNSW is a beautiful example of how ideas from different fields—sociology, mathematics, and computer science—can be combined to create a powerful and elegant solution to a complex problem. By combining the “small-world” properties of real-world networks with the hierarchical structure of a skip list, HNSW provides a fast, scalable, and remarkably accurate solution for finding similar items in massive datasets. It represents a triumph of algorithmic ingenuity, turning a computationally intractable problem into a practical and widely-used tool. As AI continues to generate and consume ever-larger amounts of data, and as vector embeddings become the universal language for representing that data, algorithms like HNSW will only become more critical, forming the invisible but essential plumbing of the next generation of intelligent applications.