Publications (54)
2026
5 publications
- OpenResearcher: A Fully Open Pipeline for Long-Horizon Deep Research Trajectory Synthesis
arXiv preprint arXiv:2603.20278 · 2026
- AgentIR: Reasoning-Aware Retrieval for Deep Research Agents
arXiv preprint arXiv:2603.04384 · 2026
- Do We Still Need Text Features for Video Retrieval in the Era of Vision-Language Models?
European Conference on Information Retrieval, 380-387 · 2026
- LACONIC: Dense-Level Effectiveness for Scalable Sparse Retrieval via a Two-Phase Training Curriculum
arXiv (Cornell University) · 2026
- ORBIT: Scalable and Verifiable Data Generation for Search Agents on a Tight Budget
arXiv preprint arXiv:2604.01195 · 2026
2025
14 publications
- Rank-R1: Enhancing Reasoning in LLM-based Document Rerankers via Reinforcement Learning
SIGIR 2026 · 2025
- PixelWorld: Towards perceiving everything as pixels
Transactions on Machine Learning Research · 2025
- VISA: Retrieval Augmented Generation with Visual Source Attribution
2025
- DRAMA: Diverse Augmentation from Large Language Models to Smaller Dense Retrievers
2025
- Document Screenshot Retrievers are Vulnerable to Pixel Poisoning Attacks
2025
- Tevatron 2.0: Unified Document Retrieval Toolkit across Scale, Language, and Modality
2025
- Hard Negatives, Hard Lessons: Revisiting Training Data Quality for Robust Information Retrieval with LLMs
2025
- R2LLMs: Retrieval and Ranking with LLMs
2025
- BrowseComp-Plus: A More Fair and Transparent Evaluation Benchmark of Deep-Research Agent
ArXiv.org · 2025
- General-Reasoner: Advancing LLM Reasoning Across All Domains
ArXiv.org · 2025
- Gosling Grows Up: Retrieval with Learned Dense and Sparse Representations Using Anserini
2025
- Rethinking On-policy Optimization for Query Augmentation
ArXiv.org · 2025
- SIGIR-AP 2025 Tutorial on Retrieval and Ranking with LLMs (R2LLMs)
Proceedings of the 2025 Annual International ACM SIGIR Conference on · 2025
- ScholarCopilot: Training Large Language Models for Academic Writing with Accurate Citations
ArXiv.org · 2025
2024
7 publications
- MMLU-Pro: A More Robust and Challenging Multi-Task Language Understanding Benchmark
2024
- Augmenting Black-box LLMs with Medical Textbooks for Biomedical Question Answering
2024
- PromptReps: Prompting Large Language Models to Generate Dense and Sparse Representations for Zero-Shot Document Retrieval
2024
- Found in the Middle: Permutation Self-Consistency Improves Listwise Ranking in Large Language Models
2024
- Resources for Brewing BEIR: Reproducible Reference Models and Statistical Analyses
2024
- Unifying Multimodal Retrieval via Document Screenshot Embedding
2024
- LongRAG: Enhancing Retrieval-Augmented Generation with Long-context LLMs
arXiv (Cornell University) · 2024
2023
11 publications
- Precise Zero-Shot Dense Retrieval without Relevance Labels
2023
- TheoremQA: A Theorem-driven Question Answering Dataset
2023
- Zero-Shot Listwise Document Reranking with a Large Language Model
arXiv (Cornell University) · 2023
- Tevatron: An Efficient and Flexible Toolkit for Neural Retrieval
2023
- Toward Best Practices for Training Multilingual Dense Retrieval Models
ACM Transactions on Information Systems · 2023
- SLIM: Sparsified Late Interaction for Multi-Vector Retrieval with Inverted Indexes
2023
- Anserini Gets Dense Retrieval: Integration of Lucene's HNSW Indexes
2023
- Fine-Tuning LLaMA for Multi-Stage Text Retrieval
arXiv (Cornell University) · 2023
- Enhancing Sparse Retrieval via Unsupervised Learning
2023
- Few-shot In-context Learning for Knowledge Base Question Answering
arXiv (Cornell University) · 2023
- Resources for Brewing BEIR: Reproducible Reference Models and an Official Leaderboard
arXiv (Cornell University) · 2023
2022
8 publications
- Program of Thoughts Prompting: Disentangling Computation from Reasoning for Numerical Reasoning Tasks
arXiv (Cornell University) · 2022
- To interpolate or not to interpolate: PRF, dense and sparse retrievers
Proceedings of the 45th International ACM SIGIR Conference on Research and · 2022
- Document Expansion Baselines and Learned Sparse Lexical Representations for MS MARCO V1 and V2
Proceedings of the 45th International ACM SIGIR Conference on Research and Development in Information Retrieval · 2022
- Improving Query Representations for Dense Retrieval with Pseudo Relevance Feedback: A Reproducibility Study
Lecture notes in computer science · 2022
- Another Look at DPR: Reproduction of Training and Replication of Retrieval
Lecture notes in computer science · 2022
- An Encoder Attribution Analysis for Dense Passage Retriever in Open-Domain Question Answering
2022
- Personalized multi-faceted trust modeling to determine trust links in social media and its potential for misinformation management
International Journal of Data Science and Analytics · 2022
- Pseudo-Relevance Feedback with Dense Retrievers in Pyserini
2022
2021
7 publications
- Pyserini: A Python Toolkit for Reproducible Information Retrieval Research with Sparse and Dense Representations
2021
- Vera: Prediction Techniques for Reducing Harmful Misinformation in Consumer Health Search
2021
- Simple and Effective Unsupervised Redundancy Elimination to Compress Dense Vectors for Passage Retrieval
Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing · 2021
- Sparsifying Sparse Representations for Passage Retrieval by Top-$k$ Masking
arXiv (Cornell University) · 2021
- e-Health for Older Adults: Navigating Misinformation
2021
- On the Separation of Logical and Physical Ranking Models for Text Retrieval Applications
2021
- Mr. TyDi: A Multi-lingual Benchmark for Dense Retrieval
2021
2020
2 publications
- H2oloo at TREC 2020: When all you got is a hammer... Deep Learning, Health Misinformation, and Precision Medicine.
Text REtrieval Conference · 2020
- Scientific Claim Verification with VERT5ERINI
arXiv (Cornell University) · 2020
