Scalable Generative IR

People involved:

About this project

The Scalable Generative Information Retrieval (IR) project investigates how generative models can be applied to information retrieval tasks scalably and efficiently. Our goal is to develop retrieval systems that leverage large language models (LLMs) while remaining practical for real-world applications such as domain-specific search and systematic reviews.

Key Areas of Focus

Elastic & Scalable Modeling: Our models support dynamic adjustment of transformer depth (how many layers are run) and embedding dimensionality (how many dimensions of the output vector are kept), so effectiveness can be traded against efficiency at inference time. We also use techniques such as 2D Matryoshka pruning to train adaptable models that operate efficiently across a range of deployment budgets.
Context Compression for RAG: We explore context compression techniques, such as COCOM (Context Compression Model), that shorten the retrieved context passed to the generator in retrieval-augmented generation (RAG) systems, reducing the cost of answer generation.
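
To make the elastic modeling idea above more concrete, here is a minimal sketch of how a depth and dimensionality budget might be applied at inference time. It is an illustration under stated assumptions, not the project's implementation: the function names (`truncate_embedding`, `cosine`), the toy per-layer vectors, and the choice to L2-normalise after truncation are all hypothetical. It assumes an encoder that exposes one pooled embedding per transformer layer and has been trained so that early layers and leading dimensions are usable on their own.

```python
import math
from typing import List, Sequence


def truncate_embedding(
    per_layer: Sequence[Sequence[float]], depth: int, dim: int
) -> List[float]:
    """Take the pooled embedding from layer `depth` (1-indexed),
    keep its first `dim` dimensions, and L2-normalise the result."""
    if not 1 <= depth <= len(per_layer):
        raise ValueError("depth out of range")
    vec = per_layer[depth - 1]
    if not 1 <= dim <= len(vec):
        raise ValueError("dim out of range")
    prefix = list(vec[:dim])
    norm = math.sqrt(sum(x * x for x in prefix))
    # Leave an all-zero prefix untouched to avoid dividing by zero.
    return [x / norm for x in prefix] if norm > 0 else prefix


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Dot product; equals cosine similarity for unit-length inputs."""
    return sum(x * y for x, y in zip(a, b))


# Toy stand-ins for per-layer pooled embeddings (hypothetical values).
query_layers = [[0.9, 0.1, 0.3, 0.2], [0.8, 0.2, 0.1, 0.4]]
doc_layers = [[0.7, 0.3, 0.2, 0.1], [0.9, 0.1, 0.2, 0.3]]

# A small budget (1 layer, 2 dims) versus the full model (2 layers, 4 dims).
for depth, dim in [(1, 2), (2, 4)]:
    q = truncate_embedding(query_layers, depth, dim)
    d = truncate_embedding(doc_layers, depth, dim)
    print(f"depth={depth} dim={dim} score={cosine(q, d):.4f}")
```

In a setup like this, a smaller `depth` saves encoder compute, while a smaller `dim` reduces index size and scoring cost. The query and the documents must be encoded with the same `(depth, dim)` setting for the scores to be comparable.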