Efficient In-Memory Inverted Indexes - Theory and Practice

Presented at: SIGIR 2025

A hands-on tutorial for efficient inverted index-based search using the PISA engine.

Instructors:
Sean MacAvaney
Antonio Mallia
Michal Siedlaczek

Inverted indexes are the backbone of most large-scale information retrieval systems. Although they are conceptually simple, building high-performance inverted indexes requires a deep understanding of low-level system optimizations, compression techniques, and traversal strategies. With the widespread adoption of in-memory search engines, the rise of learned sparse retrieval (LSR), and the increasing complexity of ranking pipelines, the design space for efficient indexing and retrieval systems has expanded significantly. This tutorial addresses a critical knowledge gap between textbook-style explanations and the advanced techniques required for efficient retrieval. It aims to equip researchers and practitioners with a comprehensive understanding of how modern in-memory search systems are designed, built, and optimized for high-performance retrieval over large-scale document collections. Participants will learn important theoretical concepts and how to apply them in practice using the open-source PISA search engine. They will work through a series of examples illustrating how to build and query an index and how to compare performance and relevance across parameters such as compression techniques and retrieval algorithms. The knowledge and skills gained from this tutorial will serve as a basis for extending PISA with new state-of-the-art IR techniques and evaluating them in an academic setting.