Abstract

Active learning for systematic review screening promises to reduce the human effort required to identify relevant documents for a systematic review. Machines and humans work together: humans provide training data, and the machine optimises the order in which humans screen documents. This enables all relevant documents to be identified after viewing only a fraction of the total. However, current approaches lack robust stopping criteria, so reviewers do not know when they have seen all, or a given proportion of, the relevant documents. This makes such systems hard to implement in live reviews. This paper introduces a workflow with flexible statistical stopping criteria, which offer real work reductions on the basis of rejecting a hypothesis of having missed a given recall target with a given level of confidence. The stopping criteria are shown on test datasets to achieve a reliable level of recall, while still providing work reductions averaging 17%. Methods proposed previously are shown to deliver inconsistent recall and work reductions across datasets.
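As a rough sketch of how such a criterion can work, the snippet below tests the null hypothesis that the recall target has been missed, using a hypergeometric tail probability over a random sample drawn from the still-unseen documents. The function and variable names, and the exact form of the test, are illustrative assumptions rather than the authors' implementation.

import math
from scipy.stats import hypergeom

def stop_screening(r_seen, n_sampled, k_in_sample, n_remaining,
                   target_recall=0.95, confidence=0.95):
    # Under the null hypothesis "recall < target_recall", at least k_missed
    # relevant documents remain among the n_remaining unseen documents:
    # r_seen / (r_seen + k_missed) < target_recall.
    k_missed = math.floor(r_seen / target_recall - r_seen) + 1
    if k_missed > n_remaining:
        return True  # too few documents remain for the target to be missed
    # p-value: chance of finding at most k_in_sample relevant documents in a
    # random sample of n_sampled, if k_missed relevant ones really remain
    p_value = hypergeom.cdf(k_in_sample, n_remaining, k_missed, n_sampled)
    return p_value < 1 - confidence  # reject the null, i.e. safe to stop

For example, stop_screening(r_seen=95, n_sampled=200, k_in_sample=0, n_remaining=1000) returns False at a 95% recall target: a run of 200 irrelevant documents is still reasonably likely even if 6 relevant documents remain in a pool of 1000.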

Highlights

  • Evidence synthesis technology is a rapidly emerging field that promises to change the practice of evidence synthesis work [1]

  • Heuristic stopping criteria: some studies give the example of heuristic stopping criteria based on drawing a given number of irrelevant articles in a row [6, 7]. We take this as a proxy for estimating that the proportion of relevant documents among those still unseen is low, since the probability of observing 0 relevant documents in a given sample is a decreasing function of the number of relevant documents in the population (illustrated in the sketch after this list). We find this a promising intuition, but argue that (1) it ignores uncertainty, as discussed in relation to the previous method; (2) it lacks a formal description that would help to find a suitable threshold for the criterion; and (3) it misinterprets what a low proportion of relevant documents implies when estimating recall

  • In a live systematic review, reviewers would never know when this recall level had been reached, yet these are the work savings most often reported in studies of machine learning for systematic review screening
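As a small illustration of the intuition in the second highlight above: the probability of a run of irrelevant draws from the unseen pool falls quickly as the number of remaining relevant documents grows, and it also depends on the size of the pool and the length of the run, which is why a bare run-length threshold ignores uncertainty. All numbers below are invented for illustration.

from scipy.stats import hypergeom

n_unseen = 2000   # documents not yet screened (hypothetical)
run_length = 50   # consecutive irrelevant documents just observed

for k_relevant in (1, 5, 20, 100):
    # P(X = 0): no relevant documents in a random sample of run_length
    p_zero = hypergeom.pmf(0, n_unseen, k_relevant, run_length)
    print(f"K = {k_relevant:>3}: P(all {run_length} draws irrelevant) = {p_zero:.3f}")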


Introduction

Evidence synthesis technology is a rapidly emerging field that promises to change the practice of evidence synthesis work [1]. In active learning approaches to screening, the algorithm chooses which studies will be screened by humans, often those which are likely to be relevant or about which the model is uncertain, in order to generate more labels to feed back to the machine. By prioritising those studies most likely to be relevant, a human reviewer most often identifies all relevant studies, or a given proportion of relevant studies (described by recall: the number of relevant studies identified divided by the total number of relevant studies), before having seen all the documents in the corpus. The proportion of documents not yet seen by the human when they reach the given recall threshold is referred to as the work saved: the proportion of documents that they would have had to screen without machine learning, but do not.
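As a toy numerical example of the two quantities just defined (all numbers invented for illustration):

total_docs = 1000        # documents in the corpus
total_relevant = 40      # relevant documents in the corpus
screened = 350           # documents screened when the threshold is reached
relevant_found = 38      # relevant documents identified so far

recall = relevant_found / total_relevant           # 38 / 40 = 0.95
work_saved = (total_docs - screened) / total_docs  # 650 / 1000 = 0.65

print(f"recall = {recall:.2f}, work saved = {work_saved:.2f}")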

