Active Machine Learning in Systematic Literature Reviews: Bias, Fixes, and Appropriate Use

Cara Henning, Michelle Cawley, Heidi Hubbard, Arun Varghese

doi:10.1289/isesisee.2018.p02.3940

Abstract

Text analytics has emerged as a cost-effective innovation used to support systematic literature reviews in human health risk assessments. Text analytics approaches allow some studies to be removed from consideration without undergoing manual review. Supervised machine learning relies on a training dataset to build models that then automatically classify a larger set of unclassified documents. However, manually reviewing references to create the training dataset can be resource intensive. “Active” machine learning is a potential solution that uses an algorithm to focus on the most informative documents, reducing the number of references that must be manually reviewed. We simulate active machine learning using a set of approximately 7,000 abstracts from the scientific literature that subject matter experts had previously classified for relevance to epidemiology. We examine the performance of alternative sampling approaches for sequentially expanding the training dataset, specifically uncertainty-based sampling and probability-based sampling. We find that while such active learning methods can potentially reduce training dataset size compared to random sampling, predictions of model performance based on active machine learning can suffer from bias that negates the method's potential benefits. We discuss approaches to compensating for the bias that results from skewed sampling and the extent to which such compensation is possible. We compare the results from active machine learning to those based on a semi-supervised machine learning method called supervised clustering and show how the latter, with only a small training dataset, can outperform the former in terms of (i) accuracy of model predictions and (ii) the fraction of documents eliminated from review. Finally, we propose a useful role for active learning in contexts where accuracy metrics are not critical and/or where it is necessary to rapidly retrieve a subset of relevant literature.
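To make the two sampling approaches concrete, the following is a minimal sketch of how the next batch of abstracts for manual review might be chosen from a model's predicted relevance scores. It is an illustration under assumptions, not the authors' implementation: the abstract does not define either strategy, so uncertainty-based sampling is taken here to mean picking the scores closest to 0.5, and probability-based sampling is taken to mean picking the highest scores. The function names and the example scores are hypothetical.

```python
from typing import List, Sequence, Set


def uncertainty_sample(probs: Sequence[float], labeled: Set[int], k: int) -> List[int]:
    """Return the k unlabeled indices whose predicted relevance is closest to 0.5.

    Assumption: "uncertainty" is measured as distance from 0.5 for a
    binary relevant / not-relevant classifier.
    """
    candidates = [i for i in range(len(probs)) if i not in labeled]
    # Ties are broken by index so the result is deterministic.
    return sorted(candidates, key=lambda i: (abs(probs[i] - 0.5), i))[:k]


def probability_sample(probs: Sequence[float], labeled: Set[int], k: int) -> List[int]:
    """Return the k unlabeled indices with the highest predicted relevance.

    Assumption: "probability-based" means favoring documents the model
    already considers most likely to be relevant.
    """
    candidates = [i for i in range(len(probs)) if i not in labeled]
    return sorted(candidates, key=lambda i: (-probs[i], i))[:k]


# Hypothetical predicted relevance scores for eight abstracts.
scores = [0.02, 0.48, 0.91, 0.55, 0.10, 0.75, 0.33, 0.60]
already_labeled = {2}  # abstract 2 has already been reviewed manually

uncertain_batch = uncertainty_sample(scores, already_labeled, 3)
likely_batch = probability_sample(scores, already_labeled, 3)

print(uncertain_batch)  # [1, 3, 7]
print(likely_batch)     # [5, 7, 3]
```

Neither batch is a random draw from the pool: the first concentrates on borderline documents and the second on likely-relevant ones. This is the kind of skewed sampling that can bias performance estimates computed from the labeled set.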
