Abstract

Introduction

Systematic reviews are important for informing decision-making and primary research, but they can be time consuming and costly. With the advent of machine learning, there is an opportunity to accelerate study screening in the review process. We aimed to understand the literature in order to make decisions about the use of machine learning for screening in our review workflow.

Methods

We conducted a pragmatic literature review of PubMed to obtain studies evaluating the accuracy of publicly available machine learning screening tools. A single reviewer used 'snowballing' searches to identify studies reporting accuracy data and extracted the sensitivity (ability to correctly identify studies to include in a review) and the specificity, or workload saved (ability to correctly exclude irrelevant studies).

Results

Ten tools (AbstractR, ASReview Lab, Cochrane RCT classifier, Concept encoder, Dpedia, DistillerAI, Rayyan, Research Screener, Robot Analyst, SWIFT-Active Screener) were evaluated in a total of 16 studies. Fourteen studies were single-arm: although tools were compared with a reference standard (predominantly single-reviewer screening), there was no other comparator. Two studies were comparative, in which tools were compared with other tools as well as with a reference standard. All tools ranked records by probability of inclusion and either (i) applied a cut-point to exclude records or (ii) ranked and re-ranked records across screening iterations, with screening continuing until most of the relevant records had been identified. The accuracy of tools varied widely between studies and review projects. When used in method (ii), at 95 to 100 percent sensitivity, tools achieved workload savings of between 7 and 99 percent. It was unclear whether evaluations were conducted independently of tool developers.

Conclusions

Evaluations suggest that tools have the potential to correctly classify studies during screening. However, conclusions are limited because (i) tool accuracy is generally not compared with dual-reviewer screening and (ii) the literature lacks comparative studies, and between-study heterogeneity makes it impossible to robustly determine the accuracy of tools relative to one another. Independent evaluations are needed.
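To make the two usage methods and the workload-saved metric concrete, the sketch below shows a generic version of the approach the abstract describes. It is illustrative only: the TF-IDF features, logistic-regression classifier, batch size, and random seeding are assumptions, not the implementation of any tool evaluated in this review.

```python
# Illustrative sketch only: a generic "rank and re-rank" screening loop and
# the workload-saved metric. No specific tool's algorithm is implied; the
# classifier, features, and batch size are assumptions for demonstration.
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression


def cutpoint_exclude(probs, cut=0.5):
    """Method (i): exclude records whose predicted probability of
    inclusion falls below a cut-point; the rest go to human screening."""
    return np.where(np.asarray(probs) >= cut)[0]


def rerank_screening(texts, labels, batch_size=50, seed=0):
    """Method (ii): rank records by predicted inclusion probability,
    screen the top batch, retrain, and re-rank. This sketch screens
    everything; real tools stop earlier via their own heuristics.
    Returns the order in which records were screened."""
    X = TfidfVectorizer().fit_transform(texts)
    rng = np.random.default_rng(seed)
    unscreened = set(range(len(texts)))
    # Seed with a random batch so both classes are likely represented.
    order = [int(i) for i in rng.choice(len(texts), size=batch_size,
                                        replace=False)]
    unscreened -= set(order)
    while unscreened:
        remaining = sorted(unscreened)
        if len({labels[i] for i in order}) < 2:
            # Not enough label diversity to train yet; take a random batch.
            batch = [int(i) for i in rng.choice(
                remaining, size=min(batch_size, len(remaining)),
                replace=False)]
        else:
            model = LogisticRegression(max_iter=1000)
            model.fit(X[order], [labels[i] for i in order])
            probs = model.predict_proba(X[remaining])[:, 1]
            # Screen the next batch in descending order of predicted relevance.
            batch = [remaining[i] for i in np.argsort(-probs)[:batch_size]]
        order.extend(batch)
        unscreened -= set(batch)
    return order


def workload_saved_at_sensitivity(order, labels, target=0.95):
    """Workload saved: proportion of records a reviewer need not screen
    once the target share of relevant records has been found."""
    n_relevant = sum(labels)
    found, screened = 0, 0
    for idx in order:
        screened += 1
        found += labels[idx]
        if found >= target * n_relevant:
            break
    return 1 - screened / len(labels)
```

Given reviewer decisions (`labels`) from a completed review, `workload_saved_at_sensitivity(order, labels, 0.95)` corresponds roughly to the "workload savings at 95 percent sensitivity" figures reported in the Results, although individual evaluations may define and measure the metric differently.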

