# A Systematic Review on Studies Evaluating the Performance of Active Learning Compared to Human Reading for Systematic Review Data

## Goal

This systematic review focused on synthesizing information from studies that evaluated the performance of Active Learning compared to human reading. Specifically, the goal was to create an overview of studies and/or reports on simulation studies that investigate the benefit of Active Learning for accelerating the title and abstract screening phase of systematic reviews, compared to human screening.

## Methods

To retrieve the articles for this systematic review, three databases were searched:

- Web of Science
- Scopus
- Embase

At this stage, it was decided to exclude all articles published before 2006, because a review by O'Mara-Eves et al.<sup>1</sup> found that the first publication actually applying some form of automation to title and abstract screening appeared in 2006. The specific search strings can be found in the `Methods\search_strings.txt` file. After deduplication, the search yielded a total of 1290 articles.

Title and abstract screening of these articles was done using ASReview with the default settings<sup>2</sup>. The articles used as prior knowledge are listed in `Methods\prior_knowledge.txt`. Articles were marked as relevant when they contained information on using some type of Active Learning in an experiment or simulation study to accelerate the screening phase of systematic reviews.

The stopping rule for screening was: after screening at least 323 papers (25% of the 1290 records), stop after encountering 30 irrelevant papers in a row. In total, 353 records were screened (27.36%). An overview of the screening progress can be seen in `Methods\tiab_screening_statistics.png`, and the .asreview file containing all screening decisions can also be found in the `Methods` folder. A total of 66 articles were identified as relevant.
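
The stopping rule and the screening percentages can be sketched in a few lines of Python. This is a minimal illustration, not ASReview code: the names `stopping_point` and `percent_screened` and the decision sequence are hypothetical, and the sketch assumes that the run of 30 irrelevant records is counted only among records screened after the 323-record minimum (the rule as stated does not say whether a run that started earlier also counts). It also shows the arithmetic: 25% of 1290 is 322.5, so the minimum of 323 corresponds to rounding up (`math.ceil`); Python's built-in `round` would give 322, because it rounds halves to even.

```python
import math
from typing import Optional, Sequence

TOTAL_RECORDS = 1290  # records left after deduplication
# 25% of 1290 is 322.5; rounding up gives the 323-record minimum.
MIN_SCREENED = math.ceil(0.25 * TOTAL_RECORDS)
MAX_IRRELEVANT_STREAK = 30


def stopping_point(
    labels: Sequence[bool],
    min_screened: int = MIN_SCREENED,
    max_streak: int = MAX_IRRELEVANT_STREAK,
) -> Optional[int]:
    """Return how many records had been screened when the rule fired.

    `labels` holds screening decisions in the order the records were shown
    (True = relevant). Assumption: the irrelevant streak only starts counting
    once `min_screened` records have been screened. Returns None if the rule
    never fires.
    """
    streak = 0
    for n_screened, relevant in enumerate(labels, start=1):
        if n_screened <= min_screened:
            continue  # minimum not yet reached; the streak is not tracked
        streak = 0 if relevant else streak + 1  # a relevant record resets the run
        if streak >= max_streak:
            return n_screened
    return None


def percent_screened(n_screened: int, total: int = TOTAL_RECORDS) -> float:
    """Share of all records that was screened, in percent, to two decimals."""
    return round(100 * n_screened / total, 2)


if __name__ == "__main__":
    # Made-up decision sequence, NOT the actual screening data:
    # 10 relevant records, then only irrelevant ones.
    labels = [True] * 10 + [False] * 400
    n = stopping_point(labels)
    print(n, percent_screened(n))  # 353 27.36
```

With this interpretation, the earliest possible stop is 323 + 30 = 353 records, which matches the number of records screened here.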
The detailed results of title and abstract screening can be found in `Methods\asreview_result_current-efforts-of-testing-active-learning.xlsx`. These 66 articles were then assessed on their full text, resulting in 38 full-text inclusions. `Methods\fulltext_exclusions.docx` gives a short overview of the excluded articles and the reason for each exclusion. Finally, another 4 potential inclusions were identified from the reference lists of the included articles. After full-text assessment, 2 of these were also included, resulting in a total of 40 final inclusions. All of the steps above are depicted in the PRISMA flowchart: `Methods\PRISMA_2020_Current_efforts_of_Using_Active_Learning_to_Accelerate_Systematic_reviews.docx`.

The final inclusions can be found in `Results\final_inclusions_current_efforts_AL_SR.ris`. The `Results` folder also contains a frequency histogram depicting how many (non-unique) datasets were used in each study; a dataset used in one study may also have been used in another. In total, 255 unique datasets were identified across 37 of the 40 included articles. The remaining three articles<sup>3,4,5</sup> used only databases or datasets that were excluded from the analysis. The following databases were excluded:

- Epistemonikos:
  - Two thirds of the datasets contained fewer than 200 documents.
  - The datasets were annotated by senior medical students for testing purposes only.
- Part of a combination of the Limsi-Cochrane and CLEF eHealth databases:
  - Part of this database consisted of meta-analyses, which were excluded from the analysis. The systematic reviews in the dataset were included.
- RCV1-v2:
  - A database consisting of news articles, and therefore excluded.
- TREC databases:
  - Multiple TREC databases were encountered; however, on inspection it became clear that these datasets were created for testing purposes and did not contain systematic reviews.
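
The difference between the non-unique per-study counts shown in the histogram and the number of unique datasets can be illustrated with a small Python sketch. It is not the analysis used for this review: the function `summarize_datasets`, the study IDs, and the dataset names are hypothetical, and representing each study as a list of dataset names is an assumption about how the dataset overview could be encoded. Excluded datasets are dropped first, so a study that used only excluded datasets does not appear in the per-study counts, in the same way that three of the 40 inclusions are not part of the 37 articles above.

```python
from collections import Counter
from typing import Dict, List, Set, Tuple


def summarize_datasets(
    studies: Dict[str, List[str]], excluded: Set[str]
) -> Tuple[Dict[str, int], Set[str]]:
    """Return (datasets per study, set of unique datasets) after exclusions.

    Studies whose datasets were all excluded are left out of the per-study counts.
    """
    kept = {study: [d for d in ds if d not in excluded] for study, ds in studies.items()}
    per_study = {study: len(ds) for study, ds in kept.items() if ds}
    unique = set().union(*kept.values())  # a dataset shared by studies counts once
    return per_study, unique


# Hypothetical data: study ID -> datasets it reports (names are made up).
studies = {
    "study_a": ["ds1", "ds2", "ds3"],
    "study_b": ["ds2", "ds3"],
    "study_c": ["ds4"],
    "study_d": ["news_corpus"],  # uses only an excluded dataset
}
excluded = {"news_corpus"}

per_study, unique = summarize_datasets(studies, excluded)
histogram = Counter(per_study.values())  # number of datasets -> number of studies

print(len(per_study), sum(per_study.values()), len(unique))  # 3 6 4
print(sorted(histogram.items()))  # [(1, 1), (2, 1), (3, 1)]
```

Here three studies remain, they report six datasets in total (non-unique), and only four of those are distinct.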
More information on both the included and excluded datasets can be found in the `Results\Datasets overview.docx` file.

## References

1. O'Mara-Eves, A., Thomas, J., McNaught, J., et al. (2015). Using text mining for study identification in systematic reviews: a systematic review of current approaches. *Systematic Reviews*, 4, 5. https://doi.org/10.1186/2046-4053-4-5
2. Van de Schoot, R., De Bruin, J., Schram, R., Zahedi, P., De Boer, J., Weijdema, F., Kramer, B., Huijts, M., Hoogerwerf, M., Ferdinands, G., Harkema, A., Willemsen, J., Ma, Y., Fang, Q., Tummers, L., & Oberski, D. (2021). ASReview: Active learning for systematic reviews (v0.17.1). Zenodo. https://doi.org/10.5281/zenodo.5126631
3. Cormack, G. V., & Grossman, M. R. (2016). Scalability of Continuous Active Learning for Reliable High-Recall Text Classification. In *CIKM '16: ACM Conference on Information and Knowledge Management*. ACM.
4. Zhang, H., et al. (2020). Evaluating sentence-level relevance feedback for high-recall information retrieval. *Information Retrieval Journal*, 23(1).
5. Cormack, G. V., & Grossman, M. R. (2015). Autonomy and Reliability of Continuous Active Learning for Technology-Assisted Review. arXiv:1504.06868 [cs].