Publications (70)
2026
8 publications
- AgentIR: Reasoning-Aware Retrieval for Deep Research Agents
arXiv (Cornell University) · 2026
- DiffRetriever: Parallel Representative Tokens for Retrieval with Diffusion Language Models
arXiv (Cornell University) · 2026
- Do We Still Need Text Features for Video Retrieval in the Era of Vision-Language Models?
European Conference on Information Retrieval, 380-387 · 2026
- Improving Long-Context Retrieval with Multi-Prefix Embedding
The First Late Interaction Workshop (LIR) @ ECIR 2026 · 2026
- LACONIC: Dense-Level Effectiveness for Scalable Sparse Retrieval via a Two-Phase Training Curriculum
arXiv (Cornell University) · 2026
- Layer-wise Token Compression for Efficient Document Reranking
arXiv (Cornell University) · 2026
- Starbucks: Improved Training for 2D Matryoshka Embeddings
Lecture notes in computer science · 2026
- Where Relevance Emerges: A Layer-Wise Study of Internal Attention for Zero-Shot Re-Ranking
Open MIND · 2026
2025
20 publications
- Rank-R1: Enhancing Reasoning in LLM-based Document Rerankers via Reinforcement Learning
arXiv preprint arXiv:2503.06034 · 2025
- Report from the 4th Strategic Workshop on Information Retrieval in Lorne (SWIRL 2025)
ACM SIGIR Forum 59 (1), 1-68 · 2025
- An Investigation of Prompt Variations for Zero-Shot LLM-Based Rankers
Lecture notes in computer science · 2025
- LLM-VPRF: Large Language Model Based Vector Pseudo Relevance Feedback
arXiv preprint arXiv:2504.01448 · 2025
- Rank-DistiLLM: Closing the Effectiveness Gap Between Cross-Encoders and LLMs for Passage Re-ranking
Lecture notes in computer science · 2025
- VISA: Retrieval Augmented Generation with Visual Source Attribution
2025
- Corpus Subsampling: Estimating the Effectiveness of Neural Retrieval Models on Large Corpora
Lecture notes in computer science · 2025
- Set-Encoder: Permutation-Invariant Inter-passage Attention for Listwise Passage Re-ranking with Cross-Encoders
Lecture notes in computer science · 2025
- Document Screenshot Retrievers are Vulnerable to Pixel Poisoning Attacks
2025
- ReSLLM: Large Language Models are Strong Resource Selectors for Federated Search
2025
- Tevatron 2.0: Unified Document Retrieval Toolkit across Scale, Language, and Modality
2025
- 2D Matryoshka Training for Information Retrieval
2025
- R2LLMs: Retrieval and Ranking with LLMs
2025
- BrowseComp-Plus: A More Fair and Transparent Evaluation Benchmark of Deep-Research Agent
ArXiv.org · 2025
- Distillation versus Contrastive Learning: How to Train Your Rerankers
2025
- Leveraging Reference Documents for Zero-Shot Ranking via Large Language Models
ArXiv.org · 2025
- MAGMaR Shared Task System Description: Video Retrieval with OmniEmbed
ArXiv.org · 2025
- Rethinking On-policy Optimization for Query Augmentation
ArXiv.org · 2025
- SIGIR-AP 2025 Tutorial on Retrieval and Ranking with LLMs (R2LLMs)
Proceedings of the 2025 Annual International ACM SIGIR Conference on · 2025
- The Impact of Auxiliary Patient Data on Automated Chest X-Ray Report Generation and How to Incorporate It
2025
2024
12 publications
- A Setwise Approach for Effective and Highly Efficient Zero-shot Ranking with Large Language Models
2024
- PromptReps: Prompting Large Language Models to Generate Dense and Sparse Representations for Zero-Shot Document Retrieval
2024
- FeB4RAG: Evaluating Federated Search in the Context of Retrieval Augmented Generation
2024
- Zero-Shot Generative Large Language Models for Systematic Review Screening Automation
Lecture notes in computer science · 2024
- Leveraging LLMs for Unsupervised Dense Retriever Ranking
2024
- Large Language Models Based Stemming for Information Retrieval: Promises, Pitfalls and Failures
2024
- Understanding and Mitigating the Threat of Vec2Text to Dense Retrieval Systems
2024
- Dense Retrieval with Continuous Explicit Feedback for Systematic Review Screening Prioritisation
2024
- Revisiting Document Expansion and Filtering for Effective First-Stage Retrieval
2024
- Team IELAB at TREC Clinical Trial Track 2023: Enhancing Clinical Trial Retrieval with Neural Rankers and Large Language Models
arXiv (Cornell University) · 2024
- Does Vec2Text Pose a New Corpus Poisoning Threat?
arXiv (Cornell University) · 2024
- Embark on DenseQuest: A System for Selecting the Best Dense Retriever for a Custom Collection
2024
2023
9 publications
- Open-source Large Language Models are Strong Zero-shot Query Likelihood Models for Document Ranking
2023
- Pseudo Relevance Feedback with Deep Language Models and Dense Retrievers: Successes and Pitfalls
ACM Transactions on Information Systems · 2023
- Beyond CO2 Emissions: The Overlooked Impact of Water Consumption of Information Retrieval Models
2023
- AgAsk: an agent to help answer farmer’s questions from scientific documents
International Journal on Digital Libraries · 2023
- Augmenting Passage Representations with Query Generation for Enhanced Cross-Lingual Dense Retrieval
2023
- Selecting which Dense Retriever to use for Zero-Shot Search
2023
- Typos-aware Bottlenecked Pre-Training for Robust Dense Retrieval
2023
- Exploring the Representation Power of SPLADE Models
2023
- Teaching pre-trained language models to rank effectively, efficiently, and robustly
The University of Queensland · 2023
2022
10 publications
- Reduce, reuse, recycle: Green information retrieval research
Proceedings of the 45th International ACM SIGIR Conference on Research and · 2022
- To Interpolate or not to Interpolate: PRF, Dense and Sparse Retrievers
Proceedings of the 45th International ACM SIGIR Conference on Research · 2022
- CharacterBERT and Self-Teaching for Improving the Robustness of Dense Retrievers on Queries with Typos
Proceedings of the 45th International ACM SIGIR Conference on Research and Development in Information Retrieval · 2022
- Bridging the Gap Between Indexing and Retrieval for Differentiable Search Index with Query Generation
arXiv (Cornell University) · 2022
- Implicit Feedback for Dense Passage Retrieval: A Counterfactual Approach
Proceedings of the 45th International ACM SIGIR Conference on Research · 2022
- Improving Query Representations for Dense Retrieval with Pseudo Relevance Feedback: A Reproducibility Study
Lecture notes in computer science · 2022
- Reinforcement online learning to rank with unbiased reward shaping
Information Retrieval · 2022
- Asyncval: A Toolkit for Asynchronously Validating Dense Retriever Checkpoints during Training
arXiv (Cornell University) · 2022
- Robustness of Neural Rankers to Typos: A Comparative Study
2022
- Pseudo-Relevance Feedback with Dense Retrievers in Pyserini
2022
2021
9 publications
- TILDE: Term Independent Likelihood moDEl for Passage Re-ranking
2021
- BERT-based Dense Retrievers Require Interpolation with BM25 for Effective Passage Retrieval
2021
- Fast Passage Re-ranking with Contextualized Exact Term Matching and Efficient Passage Expansion
arXiv (Cornell University) · 2021
- Dealing with Typos for BERT-based Passage Retrieval and Ranking
Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing · 2021
- Deep Query Likelihood Model for Information Retrieval
Lecture notes in computer science · 2021
- Effective and Privacy-preserving Federated Online Learning to Rank
2021
- Federated Online Learning to Rank with Evolution Strategies: A Reproducibility Study
Lecture notes in computer science · 2021
- How do Online Learning to Rank Methods Adapt to Changes of Intent?
2021
- IELAB at TREC Deep Learning Track 2021
2021
2020
2 publications
- Counterfactual Online Learning to Rank
Lecture notes in computer science · 2020
- IELAB for TREC Conversational Assistance Track (CAsT) 2020
2020