I am a Professor at the University of Queensland’s Electrical Engineering and Computer Science School, and the Director of Artificial Intelligence for the Queensland Digital Health Center (QDHeC). I am the research leader of the ielab. I was an ARC DECRA Fellow (2018-2020). My research interests span a number of topics:
- Formal models of Information Retrieval; in particular, I am interested in:
  - models of search, information seeking and interactions
  - semantic models of search
  - exploiting word embeddings in Information Retrieval
  - evaluation of Information Retrieval (including task-based evaluation)
- Medical/Health Information Retrieval and Data Science; in particular, I am interested in:
  - retrieval models & strategies for consumers searching the web for health advice
  - retrieval models & strategies for cohort identification for clinical trials from electronic medical records
  - retrieval models & strategies for clinical decision support and evidence-based medicine
  - models, approaches and strategies for automating systematic reviews, in particular with respect to the search phase
  - health search evaluation
If you are a UQ student, or a prospective one, and you are interested in working with me as part of your studies, you can find ideas for research projects for PhD and other research degrees here. I also welcome international higher education students who want to undertake a research visit and collaborate on ideas within my research interests.
See my publications below for more information, and my Google Scholar profile for an up-to-date list of publications, including citation metrics.
Publications (170)
2026
12 publications
- AutoBool: Reinforcement-Learned LLM for Effective Automatic Systematic Reviews Boolean Query Generation
2026
- Beyond Chunk-Then-Embed: A Comprehensive Taxonomy and Evaluation of Document Chunking Strategies for Information Retrieval
Open MIND · 2026
- Can It Reach the Generator? Investigating the Survival of Prompt-Injection Attacks in Realistic RAG Settings
arXiv (Cornell University) · 2026
- DiffRetriever: Parallel Representative Tokens for Retrieval with Diffusion Language Models
arXiv (Cornell University) · 2026
- Inferential Question Answering
ArXiv.org · 2026
- On the impact of retrieved content representations in RAG Pipelines
arXiv (Cornell University) · 2026
- Rapid, Agile Development and Evaluation of Retrieval Augmented Generation Systems Without Labels
Lecture notes in computer science · 2026
- Starbucks: Improved Training for 2D Matryoshka Embeddings
Lecture notes in computer science · 2026
- The Vulnerability of LLM Rankers to Prompt Injection Attacks
Open MIND · 2026
- When LLM Judges Inflate Scores: Exploring Overrating in Relevance Assessment
ArXiv.org · 2026
- Where Relevance Emerges: A Layer-Wise Study of Internal Attention for Zero-Shot Re-Ranking
Open MIND · 2026
- Whole-Pool Setwise Reranking with Long-Context Language Models
arXiv (Cornell University) · 2026
2025
29 publications
- Rank-R1: Enhancing reasoning in LLM-based document rerankers via reinforcement learning
arXiv preprint arXiv:2503.06034 · 2025
- Report from the 4th Strategic Workshop on Information Retrieval in Lorne (SWIRL 2025)
ACM SIGIR Forum 59 (1), 1-68 · 2025
- An Investigation of Prompt Variations for Zero-Shot LLM-Based Rankers
Lecture notes in computer science · 2025
- LLM-VPRF: Large Language Model Based Vector Pseudo Relevance Feedback
arXiv preprint arXiv:2504.01448 · 2025
- Rank-DistiLLM: Closing the Effectiveness Gap Between Cross-Encoders and LLMs for Passage Re-ranking
Lecture notes in computer science · 2025
- VISA: Retrieval Augmented Generation with Visual Source Attribution
2025
- Corpus Subsampling: Estimating the Effectiveness of Neural Retrieval Models on Large Corpora
Lecture notes in computer science · 2025
- Set-Encoder: Permutation-Invariant Inter-passage Attention for Listwise Passage Re-ranking with Cross-Encoders
Lecture notes in computer science · 2025
- DenseReviewer: A Screening Prioritisation Tool for Systematic Review Based on Dense Retrieval
Lecture notes in computer science · 2025
- Document Screenshot Retrievers are Vulnerable to Pixel Poisoning Attacks
2025
- ReSLLM: Large Language Models are Strong Resource Selectors for Federated Search
2025
- Reassessing Large Language Model Boolean Query Generation for Systematic Reviews
2025
- 2D Matryoshka Training for Information Retrieval
2025
- Beyond GeneGPT: A Multi-Agent Architecture with Open-Source LLMs for Enhanced Genomic Question Answering
Proceedings of the 2025 Annual International ACM SIGIR Conference on · 2025
- Pseudo-Relevance Feedback Can Improve Zero-Shot LLM-Based Dense Retrieval
arXiv e-prints, arXiv: 2503.14887 · 2025
- R2LLMs: Retrieval and Ranking with LLMs
2025
- RARR Unraveled: Component-Level Insights into Hallucination Detection and Mitigation
2025
- Unlearning for Federated Online Learning to Rank: A Reproducibility Study
2025
- AiReview: An Open Platform for Accelerating Systematic Reviews with LLMs
2025
- AutoBool: An Reinforcement-Learning trained LLM for Effective Automated Boolean Query Generation for Systematic Reviews
Open MIND · 2025
- Humans are more gullible than LLMs in believing common psychological myths
ArXiv.org · 2025
- Leveraging Gradient Information for Out-of-Domain Performance Estimations
Joint European Conference on Machine Learning and Knowledge Discovery in · 2025
- Leveraging Semantic Type Dependencies for Clinical Named Entity Recognition
PubMed · 2025
- Pre-training vs. Fine-tuning: A Reproducibility Study on Dense Retrieval Knowledge Acquisition
2025
- Preaching to the ChoIR: Lessons IR Should Share with AI
2025
- Pseudo Relevance Feedback is Enough to Close the Gap Between Small and Large Dense Retrieval Models
ArXiv.org · 2025
- ReviewHQ: An API-Based System for Reviewer Assignment and Quality Control in Research Conferences
2025
- SIGIR-AP 2025 Tutorial on Retrieval and Ranking with LLMs (R2LLMs)
Proceedings of the 2025 Annual International ACM SIGIR Conference on · 2025
- The epidemiology of hospitalisations from four key environmentally sensitive zoonotic diseases in Queensland, 2012–2019
Tropical Medicine & International Health · 2025
2024
24 publications
- Proceedings of the 47th International ACM SIGIR conference on research and development in information retrieval
ACM · 2024
- A Setwise Approach for Effective and Highly Efficient Zero-shot Ranking with Large Language Models
2024
- Evaluating Generative Ad Hoc Information Retrieval
2024
- PromptReps: Prompting Large Language Models to Generate Dense and Sparse Representations for Zero-Shot Document Retrieval
2024
- The new paradigm in machine learning – foundation models, large language models and beyond: a primer for physicians
Internal Medicine Journal · 2024
- FeB4RAG: Evaluating Federated Search in the Context of Retrieval Augmented Generation
2024
- Zero-Shot Generative Large Language Models for Systematic Review Screening Automation
Lecture notes in computer science · 2024
- Leveraging LLMs for Unsupervised Dense Retriever Ranking
2024
- Experimental IR meets Multilinguality, multimodality, and interaction
Proceedings of the Fifteenth International Conference of the CLEF · 2024
- Large Language Models Based Stemming for Information Retrieval: Promises, Pitfalls and Failures
2024
- Understanding and Mitigating the Threat of Vec2Text to Dense Retrieval Systems
2024
- A Reproducibility Study of Goldilocks: Just-Right Tuning of BERT for TAR
Lecture notes in computer science · 2024
- How to Forget Clients in Federated Online Learning to Rank?
Lecture notes in computer science · 2024
- Dense Retrieval with Continuous Explicit Feedback for Systematic Review Screening Prioritisation
2024
- Revisiting Document Expansion and Filtering for Effective First-Stage Retrieval
2024
- CoLAL: Co-learning Active Learning for Text Classification
Proceedings of the AAAI Conference on Artificial Intelligence · 2024
- Team IELAB at TREC Clinical Trial Track 2023: Enhancing Clinical Trial Retrieval with Neural Rankers and Large Language Models
arXiv (Cornell University) · 2024
- Does Vec2Text Pose a New Corpus Poisoning Threat?
arXiv (Cornell University) · 2024
- Embark on DenseQuest: A System for Selecting the Best Dense Retriever for a Custom Collection
2024
- Searching in Professional Instant Messaging Applications: User Behaviour, Intent, and Pain-points
2024
- Source-Free Domain-Invariant Performance Prediction
Lecture notes in computer science · 2024
- Starbucks-v2: Improved Training for 2D Matryoshka Embeddings
arXiv (Cornell University) · 2024
- Stochastic Featurization for Active Learning
Lecture notes in computer science · 2024
- TPRF: A Transformer-based Pseudo-Relevance Feedback Model for Efficient and Effective Retrieval
arXiv (Cornell University) · 2024
2023
26 publications
- Can ChatGPT Write a Good Boolean Query for Systematic Review Literature Search?
2023
- Dr ChatGPT tell me what I want to hear: How different prompts impact health answer correctness
2023
- ChatGPT Hallucinates when Attributing Answers
2023
- Open-source Large Language Models are Strong Zero-shot Query Likelihood Models for Document Ranking
2023
- Pseudo Relevance Feedback with Deep Language Models and Dense Retrievers: Successes and Pitfalls
ACM Transactions on Information Systems · 2023
- Beyond CO2 Emissions: The Overlooked Impact of Water Consumption of Information Retrieval Models
2023
- Why clinical artificial intelligence is (almost) non‐existent in Australian hospitals and how to fix it
The Medical Journal of Australia · 2023
- Dependency-aware Self-training for Entity Alignment
2023
- AgAsk: an agent to help answer farmer’s questions from scientific documents
International Journal on Digital Libraries · 2023
- Generating Natural Language Queries for More Effective Systematic Review Screening Prioritisation
2023
- An Analysis of Untargeted Poisoning Attack and Defense Methods for Federated Online Learning to Rank Systems
2023
- Augmenting Passage Representations with Query Generation for Enhanced Cross-Lingual Dense Retrieval
2023
- Balanced Topic Aware Sampling for Effective Dense Retriever: A Reproducibility Study
2023
- Outcome-based Evaluation of Systematic Review Automation
2023
- Selecting which Dense Retriever to use for Zero-Shot Search
2023
- Active learning with feature matching for clinical named entity recognition
Natural Language Processing Journal · 2023
- Annotating Data for Fine-Tuning a Neural Ranker? Current Active Learning Strategies are not Better than Random Selection
2023
- Efficient Diversification for Recommending Aggregate Data Visualizations
IEEE Access · 2023
- Typos-aware Bottlenecked Pre-Training for Robust Dense Retrieval
2023
- MeSH Suggester: A Library and System for MeSH Term Suggestion for Systematic Review Boolean Query Construction
2023
- Vizput: Insight-aware imputation of incomplete data for visualization recommendation
arXiv preprint arXiv:2311.07926 · 2023
- Artificial Intelligence in Evidence-based Medicine: Challenges and Opportunities
World Scientific Annual Review of Artificial Intelligence · 2023
- Convolutional Persistence as a Remedy to Neural Model Analysis
International Conference on Artificial Intelligence and Statistics, 10839-10855 · 2023
- A Reproducibility Study of Question Retrieval for Clarifying Questions
Lecture notes in computer science · 2023
- AgAsk: A Conversational Search Agent for Answering Agricultural Questions
2023
- Exploring the Representation Power of SPLADE Models
2023
2022
25 publications
- Reduce, reuse, recycle: Green information retrieval research
Proceedings of the 45th International ACM SIGIR Conference on Research and · 2022
- To interpolate or not to interpolate: PRF, dense and sparse retrievers
Proceedings of the 45th international ACM SIGIR conference on research and · 2022
- From little things big things grow: A collection with seed studies for medical systematic review literature search
Proceedings of the 45th International ACM SIGIR Conference on Research and · 2022
- Automated MeSH term suggestion for effective query formulation in systematic reviews literature search
Intelligent Systems with Applications · 2022
- CharacterBERT and Self-Teaching for Improving the Robustness of Dense Retrievers on Queries with Typos
Proceedings of the 45th International ACM SIGIR Conference on Research and Development in Information Retrieval · 2022
- Bridging the Gap Between Indexing and Retrieval for Differentiable Search Index with Query Generation
arXiv (Cornell University) · 2022
- Neural Rankers for Effective Screening Prioritisation in Medical Systematic Review Literature Search
2022
- Implicit feedback for dense passage retrieval: A counterfactual approach
Proceedings of the 45th International ACM SIGIR Conference on Research and · 2022
- Improving Query Representations for Dense Retrieval with Pseudo Relevance Feedback: A Reproducibility Study
Lecture notes in computer science · 2022
- The impact of query refinement on systematic review literature search: A query log analysis
Proceedings of the 2022 ACM SIGIR International Conference on Theory of · 2022
- Case law retrieval: problems, methods, challenges and evaluations in the last 20 years
arXiv (Cornell University) · 2022
- High-quality Task Division for Large-scale Entity Alignment
Proceedings of the 31st ACM International Conference on Information & Knowledge Management · 2022
- Seed-Driven Document Ranking for Systematic Reviews: A Reproducibility Study
Lecture notes in computer science · 2022
- Is Non-IID Data a Threat in Federated Online Learning to Rank?
Proceedings of the 45th International ACM SIGIR Conference on Research and Development in Information Retrieval · 2022
- Guiding Neural Entity Alignment with Compatibility
2022
- How Does Feedback Signal Quality Impact Effectiveness of Pseudo Relevance Feedback for Passage Retrieval
Proceedings of the 45th International ACM SIGIR Conference on Research and Development in Information Retrieval · 2022
- Reinforcement online learning to rank with unbiased reward shaping
Information Retrieval · 2022
- Asyncval: A Toolkit for Asynchronously Validating Dense Retriever Checkpoints during Training
arXiv (Cornell University) · 2022
- Robustness of Neural Rankers to Typos: A Comparative Study
2022
- SCC - A Test Collection for Search in Chat Conversations
Proceedings of the 31st ACM International Conference on Information & Knowledge Management · 2022
- Pretrained language models rankers on private data: Is online and federated learning the solution
Proceedings of the Third International Conference on Design of Experimental · 2022
- Pseudo-Relevance Feedback with Dense Retrievers in Pyserini
2022
- Causality Discovery Based on Combined Causes and Multiple Causes in Drug-Drug Interaction
Lecture notes in computer science · 2022
- Rethinking Persistent Homology for Visual Recognition
arXiv (Cornell University) · 2022
- The Task: Distinguishing Tasks and Sessions in Legal Information Retrieval
2022
2021
19 publications
- TILDE: Term Independent Likelihood moDEl for Passage Re-ranking
2021
- BERT-based Dense Retrievers Require Interpolation with BM25 for Effective Passage Retrieval
2021
- Fast Passage Re-ranking with Contextualized Exact Term Matching and Efficient Passage Expansion
arXiv (Cornell University) · 2021
- Dealing with Typos for BERT-based Passage Retrieval and Ranking
Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing · 2021
- Deep Query Likelihood Model for Information Retrieval
Lecture notes in computer science · 2021
- Search Engines vs. Symptom Checkers: A Comparison of their Effectiveness for Online Health Advice
2021
- Effective and Privacy-preserving Federated Online Learning to Rank
2021
- Federated Online Learning to Rank with Evolution Strategies: A Reproducibility Study
Lecture notes in computer science · 2021
- Big Brother: A Drop-In Website Interaction Logging Service
2021
- Loss-based Active Learning for Named Entity Recognition
2021
- How do Online Learning to Rank Methods Adapt to Changes of Intent?
2021
- Precision Medicine Search for Paediatric Oncology
2021
- User models, metrics and measures of search: a tutorial on the C/W/L evaluation framework
Proceedings of the 2021 Conference on Human Information Interaction and · 2021
- Cohort-based Clinical Trial Retrieval
Australasian Document Computing Symposium · 2021
- Diagnosis Ranking with Knowledge Graph Convolutional Networks
Lecture notes in computer science · 2021
- PECAN: A Platform for Searching Chat Conversations
2021
- ActiveEA: Active Learning for Neural Entity Alignment
Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing · 2021
- MeSH Term Suggestion for Systematic Review Literature Search
Australasian Document Computing Symposium · 2021
- Proceedings of the 19th Annual Workshop of the Australasian Language Technology Association
Proceedings of the 19th Annual Workshop of the Australasian Language · 2021
2020
17 publications
- Overview of the CLEF eHealth evaluation lab 2020
International conference of the cross-language evaluation forum for European · 2020
- Automatic Boolean Query Formulation for Systematic Review Literature Search
2020
- A comparison of automatic Boolean query formulation for systematic reviews
Information Retrieval · 2020
- Do better search engines really equate to better clinical decisions? If not, why not?
Journal of the Association for Information Science and Technology · 2020
- Temporal tree representation for similarity computation between medical patients
Artificial Intelligence in Medicine · 2020
- Counterfactual Online Learning to Rank
Lecture notes in computer science · 2020
- A Computational Approach for Objectively Derived Systematic Review Search Strategies
Lecture notes in computer science · 2020
- Systematic Review Automation Tools for End-to-End Query Formulation
2020
- How searching under time pressure impacts clinical decision making
Journal of the Medical Library Association JMLA · 2020
- Overview of the TREC 2020 Health Misinformation Track.
Text REtrieval Conference · 2020
- You Can Teach an Old Dog New Tricks: Rank Fusion applied to Coordination Level Matching for Ranking in Systematic Reviews
Lecture notes in computer science · 2020
- How a Conversational Agent Might Help Farmers in the Field
2020
- Quality Matters: Understanding the Impact of Incomplete Data on Visualization Recommendation
Lecture notes in computer science · 2020
- Representing EHRs with Temporal Tree and Sequential Pattern Mining for Similarity Computing
Lecture notes in computer science · 2020
- Sampling Query Variations for Learning to Rank to Improve Automatic Boolean Query Generation in Systematic Reviews
2020
- Discriminative Features Generation for Mortality Prediction in ICU
Lecture notes in computer science · 2020
- IELAB at TREC Deep Learning Track 2021
2020
2019
18 publications
- TrecTools: an open-source Python library for information retrieval practitioners involved in TREC-like campaigns
Proceedings of the 42nd International ACM SIGIR Conference on Research and · 2019
- Fixed-Cost Pooling Strategies
IEEE Transactions on Knowledge and Data Engineering · 2019
- Overview of the CLEF eHealth Evaluation Lab 2019
Lecture notes in computer science · 2019
- Consumer health search on the web: study of web page understandability and its integration in ranking algorithms
Journal of medical Internet research 21 (1), e10986 · 2019
- Overview of the TREC 2019 decision track
Proceedings of TREC 500, 331 · 2019
- Health Cards for Consumer Health Search
2019
- Health Card Retrieval for Consumer Health Search: An Empirical Investigation of Methods
Proceedings of the 28th ACM International Conference on Information and · 2019
- WSDM 2019 Tutorial on Health Search (HS2019) A Full-Day from Consumers to Clinicians
Proceedings of the Twelfth ACM International Conference on Web Search and · 2019
- Impact of a Search Engine on Clinical Decisions Under Time and System Effectiveness Constraints: Research Protocol
JMIR Research Protocols · 2019
- Building Economic Models of Human Computer Interaction: CHI 2019 Course
Extended Abstracts of the 2019 CHI Conference on Human Factors in Computing · 2019
- CLEF eHealth 2019 Evaluation Lab
Lecture notes in computer science · 2019
- Causality Discovery with Domain Knowledge for Drug-Drug Interactions Discovery
Lecture notes in computer science · 2019
- Building Economic Models and Measures of Search
2019
- Health Cards to Assist Decision Making in Consumer Health Search.
PubMed · 2019
- Learning Inter-Sentence, Disorder-Centric, Biomedical Relationships from Medical Literature.
PubMed · 2019
- Towards Automatically Classifying Case Law Citation Treatment Using Neural Networks
2019
- Ielab at the open-source IR replicability challenge 2019
Queensland's institutional digital repository (The University of Queensland) · 2019
- UQ IElab at TREC 2019 Decision Track.
Text REtrieval Conference · 2019
