A Test Collection for Evaluating Retrieval of Studies for Inclusion in Systematic Reviews
This paper introduces a test collection for evaluating the effectiveness of different methods used to retrieve research studies for inclusion in systematic reviews. Systematic reviews appraise and synthesise studies that meet specific inclusion criteria. Systematic reviews intended for a biomedical science audience use boolean queries with many, often complex, search clauses to retrieve studies; these are then manually screened to determine eligibility for inclusion in the review. This process is expensive and time consuming. The development of systems that improve retrieval effectiveness will have an immediate impact by reducing the complexity and resources required for this process. Our test collection consists of approximately 26 million research studies extracted from the freely available MEDLINE database, 94 review (query) topics extracted from Cochrane systematic reviews, and corresponding relevance assessments. Tasks for which the collection can be used for information retrieval system evaluation are described and the use of the collection to evaluate common baselines within one such task is demonstrated. The test collection is available at https://github.com/ielab/SIGIR2017-PICO-Collection.