Increasing comprehensiveness and reducing workload in a systematic review of complex interventions using automated machine learning.

Olalekan A Uthman, Sian Taylor-Phillips, Hema Mistry, Lena Al-Khudairy, Rachel Court, G J Melendez-Torres, Jodie Enderby, Chidozie Nduka, Aileen Clarke

doi:10.3310/udir6682

Open Access

Journal: Health Technology Assessment	Publication Date: Nov 1, 2022
Citations: 5	License type: publisher-specific-oa

Affiliation: University of Warwick, University of Exeter

Abstract

As part of our ongoing systematic review of complex interventions for the primary prevention of cardiovascular diseases, we developed and evaluated automated machine-learning classifiers for title and abstract screening. The aim was to develop a high-performing algorithm comparable to human screening. We followed a three-phase process to develop and test an automated machine learning-based classifier for screening potential studies on interventions for primary prevention of cardiovascular disease. In the first phase, we labelled a total of 16,611 articles. In the second phase, we used the labelled articles to develop a machine learning-based classifier. In the third phase, we examined the performance of the classifiers in correctly labelling the papers. We evaluated the performance of five deep-learning models [i.e. parallel convolutional neural network (CNN), stacked CNN, parallel-stacked CNN, recurrent neural network (RNN) and CNN-RNN]. The models were evaluated using recall, precision and work saved over sampling at no less than 95% recall. Of the 16,611 labelled articles, 676 (4.0%) were tagged as 'relevant' and 15,935 (96%) were tagged as 'irrelevant'. The recall ranged from 51.9% to 96.6%. The precision ranged from 64.6% to 99.1%. The work saved over sampling ranged from 8.9% to as high as 92.1%. The best-performing model was parallel CNN, yielding a 96.4% recall, as well as 99.1% precision, and a potential workload reduction of 89.9%. We used words from the title and the abstract only. Further work is needed to examine how performance changes when additional features, such as full document text, are added. The approach may also not transfer to other complex systematic reviews on different topics. Our study shows that machine learning has the potential to significantly aid the labour-intensive screening of abstracts in systematic reviews of complex interventions.
Future research should concentrate on enhancing the classifier system and determining how it can be integrated into the systematic review workflow. This project was funded by the National Institute for Health and Care Research (NIHR) Health Technology Assessment programme and will be published in Health Technology Assessment. See the NIHR Journals Library website for further project information.
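
The three evaluation measures named in the abstract can be illustrated with a short sketch. The abstract does not give formulas, so this assumes the definitions commonly used in the screening-automation literature: recall = TP / (TP + FN), precision = TP / (TP + FP), and work saved over sampling (WSS) = (TN + FN) / N - (1 - recall), where N is the total number of screened records. The class name, function names and the counts in the usage example are hypothetical and are not taken from the study.

```python
from dataclasses import dataclass


@dataclass
class ScreeningCounts:
    """Confusion-matrix counts for a title/abstract screening classifier."""
    tp: int  # relevant records the classifier kept
    fp: int  # irrelevant records the classifier kept
    tn: int  # irrelevant records the classifier excluded
    fn: int  # relevant records the classifier excluded (missed)

    @property
    def total(self) -> int:
        return self.tp + self.fp + self.tn + self.fn


def recall(c: ScreeningCounts) -> float:
    """Share of truly relevant records that the classifier kept."""
    denom = c.tp + c.fn
    return c.tp / denom if denom else 0.0


def precision(c: ScreeningCounts) -> float:
    """Share of kept records that are truly relevant."""
    denom = c.tp + c.fp
    return c.tp / denom if denom else 0.0


def wss(c: ScreeningCounts) -> float:
    """Work saved over sampling (assumed definition, see lead-in).

    (TN + FN) / N is the fraction of records a reviewer no longer reads;
    (1 - recall) is the saving that random sampling would give at the
    same recall level.
    """
    if c.total == 0:
        return 0.0
    return (c.tn + c.fn) / c.total - (1.0 - recall(c))


if __name__ == "__main__":
    # Hypothetical counts chosen for illustration only.
    counts = ScreeningCounts(tp=95, fp=20, tn=880, fn=5)
    print(f"recall:    {recall(counts):.1%}")
    print(f"precision: {precision(counts):.1%}")
    print(f"WSS:       {wss(counts):.1%}")
```

Reporting WSS "at no less than 95% recall" would, under this assumed definition, mean choosing the classifier's decision threshold so that recall is at least 0.95 and then computing WSS from the resulting counts.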
