Inverted Index

intermediate
Architectures
Last updated: 2025-01-15

What is an Inverted Index?


An inverted index is a data structure that maps terms (words or tokens) to the documents or locations where they appear, enabling efficient keyword-based search and retrieval. A forward index organizes data by document, listing the words each document contains; an inverted index organizes data by word, listing the documents that contain each word. Because of this inversion, finding every document that contains a given term takes a single lookup rather than a scan of the whole collection, which is why the structure is fundamental to traditional search engines and sparse retrieval methods.
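The difference between the two layouts can be sketched in a few lines of Python. This is a minimal illustration, not a production design: the toy corpus, the whitespace `tokenize` function, and the helper name `docs_containing` are all assumptions made for the example.

```python
from collections import defaultdict

# Hypothetical toy corpus: document ID -> text.
docs = {
    1: "the quick brown fox",
    2: "the lazy dog",
    3: "quick brown dogs",
}


def tokenize(text):
    """Naive tokenizer (assumption): lowercase and split on whitespace."""
    return text.lower().split()


# Forward index: each document maps to the terms it contains.
forward_index = {doc_id: tokenize(text) for doc_id, text in docs.items()}

# Inverted index: each term maps to the set of documents containing it.
inverted_index = defaultdict(set)
for doc_id, terms in forward_index.items():
    for term in terms:
        inverted_index[term].add(doc_id)


def docs_containing(term):
    """Return sorted IDs of documents containing the term (one dictionary lookup)."""
    return sorted(inverted_index.get(term.lower(), set()))
```

Answering "which documents contain this term?" from the forward index would require visiting every document, whereas `docs_containing` reads the answer from a single dictionary entry.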


The structure typically stores each unique term as a key and a posting list as its value. The posting list holds the identifiers of the documents in which the term appears, often along with additional information used for ranking, such as term frequency, positions within each document, or other metadata. When a search query is issued, the system looks up each query term in the inverted index, retrieves the corresponding posting lists, and combines them to find the documents that match the query. A ranking algorithm such as BM25 then scores those documents for relevance.
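The lookup-and-rank flow described above can be sketched as follows. This is a simplified illustration under stated assumptions: the posting lists are in-memory dictionaries that store term positions (so term frequency is the number of positions), the tokenizer is a plain whitespace split, the function names `build_index` and `bm25_search` are hypothetical, and the scoring uses a common BM25 formulation with the conventional defaults `k1 = 1.5` and `b = 0.75`. Real search engines use compressed on-disk posting lists and more careful text analysis.

```python
import math
from collections import defaultdict

K1 = 1.5   # term-frequency saturation (conventional default, an assumption here)
B = 0.75   # document-length normalization (conventional default, an assumption here)


def tokenize(text):
    """Naive tokenizer (assumption): lowercase and split on whitespace."""
    return text.lower().split()


def build_index(docs):
    """Build postings[term][doc_id] = list of positions, plus document lengths."""
    postings = defaultdict(dict)
    doc_lengths = {}
    for doc_id, text in docs.items():
        tokens = tokenize(text)
        doc_lengths[doc_id] = len(tokens)
        for pos, term in enumerate(tokens):
            postings[term].setdefault(doc_id, []).append(pos)
    return postings, doc_lengths


def bm25_search(query, postings, doc_lengths, match_all=False):
    """Score documents for the query; return (doc_id, score) pairs, best first."""
    if not doc_lengths:
        return []
    n_docs = len(doc_lengths)
    avg_len = sum(doc_lengths.values()) / n_docs
    terms = tokenize(query)
    scores = defaultdict(float)
    for term in terms:
        plist = postings.get(term, {})
        if not plist:
            continue  # term is absent from the index
        df = len(plist)  # document frequency
        idf = math.log(1 + (n_docs - df + 0.5) / (df + 0.5))
        for doc_id, positions in plist.items():
            tf = len(positions)  # term frequency in this document
            norm = tf + K1 * (1 - B + B * doc_lengths[doc_id] / avg_len)
            scores[doc_id] += idf * tf * (K1 + 1) / norm
    if match_all:
        # Boolean AND: keep only documents present in every term's posting list.
        required = [set(postings.get(t, {})) for t in terms]
        keep = set.intersection(*required) if required else set()
        scores = {d: s for d, s in scores.items() if d in keep}
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


# Hypothetical toy corpus for illustration.
docs = {
    1: "inverted index maps terms to documents",
    2: "vector index stores embeddings",
    3: "bm25 ranks documents retrieved from the inverted index",
}
postings, doc_lengths = build_index(docs)
any_match = bm25_search("inverted index", postings, doc_lengths)
all_match = bm25_search("inverted index", postings, doc_lengths, match_all=True)
```

With `match_all=False` the posting lists are combined as a union, so a document matching only one query term still receives a (lower) score; `match_all=True` intersects the lists so that only documents containing every query term remain.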


Inverted indexes remain highly relevant in modern AI systems despite the rise of semantic search. They provide the foundation for sparse retrieval in hybrid search systems, offer excellent performance for exact keyword matching, and are particularly effective for queries involving rare terms, proper nouns, or domain-specific terminology where semantic embeddings may struggle. Many hybrid RAG systems use inverted indexes alongside vector indexes to combine keyword precision with semantic understanding.


Related Terms