A comparison of deep learning performance against health-care professionals in detecting diseases from medical imaging: a systematic review and meta-analysis

Xiaoxuan Liu, Livia Faes, Aditya U Kale, Siegfried K Wagner, Dun Jack Fu, Alice Bruynseels, Thushika Mahendiran, Gabriella Moraes, Mohith Shamdas, Christoph Kern, Joseph R Ledsam, Martin K Schmid, Konstantinos Balaskas, Eric J Topol, Lucas M Bachmann, Pearse A Keane, Alastair K Denniston

doi:10.1016/s2589-7500(19)30123-2

Abstract

Deep learning offers considerable promise for medical diagnostics. We aimed to evaluate the diagnostic accuracy of deep learning algorithms versus health-care professionals in classifying diseases using medical imaging. In this systematic review and meta-analysis, we searched Ovid-MEDLINE, Embase, Science Citation Index, and Conference Proceedings Citation Index for studies published from Jan 1, 2012, to June 6, 2019. Studies comparing the diagnostic performance of deep learning models and health-care professionals based on medical imaging, for any disease, were included. We excluded studies that used graphical representations of medical waveform data or that investigated the accuracy of image segmentation rather than disease classification. We extracted binary diagnostic accuracy data and constructed contingency tables to derive the outcomes of interest: sensitivity and specificity. Studies undertaking an out-of-sample external validation were included in a meta-analysis, using a unified hierarchical model. This study is registered with PROSPERO, CRD42018091176. Our search identified 31 587 studies, of which 82 (describing 147 patient cohorts) were included. 69 studies provided enough data to construct contingency tables, enabling calculation of test accuracy, with sensitivity ranging from 9·7% to 100·0% (mean 79·1%, SD 0·2) and specificity ranging from 38·9% to 100·0% (mean 88·3%, SD 0·1). An out-of-sample external validation was done in 25 studies, of which 14 made the comparison between deep learning models and health-care professionals in the same sample.
Comparison of the performance of deep learning models and health-care professionals in these 14 studies, when restricting the analysis to the contingency table reporting the highest accuracy for each study, found a pooled sensitivity of 87·0% (95% CI 83·0-90·2) for deep learning models and 86·4% (79·9-91·0) for health-care professionals, and a pooled specificity of 92·5% (95% CI 85·1-96·4) for deep learning models and 90·5% (80·6-95·7) for health-care professionals. Our review found the diagnostic performance of deep learning models to be equivalent to that of health-care professionals. However, a major finding of the review is that few studies presented externally validated results or compared the performance of deep learning models and health-care professionals using the same sample. Additionally, poor reporting is prevalent in deep learning studies, which limits reliable interpretation of the reported diagnostic accuracy. New reporting standards that address the specific challenges of deep learning could improve future studies, enabling greater confidence in evaluations of this promising technology. Funding: None.
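The two quantitative steps described in the abstract, turning each study's binary diagnostic accuracy data into a 2x2 contingency table and then pooling across studies, can be sketched in Python. This is a minimal illustration under stated assumptions: the counts are hypothetical, the names `ContingencyTable`, `logit_with_variance`, and `pool_dersimonian_laird` are invented for this example, and the pooling step swaps in a simpler univariate DerSimonian-Laird random-effects model on the logit scale. It is not the unified hierarchical model the authors used; it pools sensitivity and specificity separately and ignores the correlation between them.

```python
from __future__ import annotations

import math
from dataclasses import dataclass


@dataclass
class ContingencyTable:
    """Hypothetical 2x2 table for one study: index test vs. reference standard."""
    tp: int  # diseased, test positive
    fn: int  # diseased, test negative
    fp: int  # not diseased, test positive
    tn: int  # not diseased, test negative

    def sensitivity(self) -> float:
        # Proportion of diseased cases the test correctly flags.
        return self.tp / (self.tp + self.fn)

    def specificity(self) -> float:
        # Proportion of non-diseased cases the test correctly clears.
        return self.tn / (self.tn + self.fp)


def logit_with_variance(events: int, total: int) -> tuple[float, float]:
    """Logit of a proportion and its approximate variance.

    Assumption: a 0.5 continuity correction is added to both cells so that
    studies reporting 0% or 100% do not produce infinite logits.
    """
    a = events + 0.5
    b = total - events + 0.5
    return math.log(a / b), 1.0 / a + 1.0 / b


def pool_dersimonian_laird(estimates: list[tuple[float, float]]) -> float:
    """Univariate DerSimonian-Laird random-effects pooling of (logit, variance)
    pairs; returns the pooled proportion on the original 0-1 scale."""
    y = [est for est, _ in estimates]
    v = [var for _, var in estimates]
    w = [1.0 / vi for vi in v]  # inverse-variance (fixed-effect) weights
    y_fixed = sum(wi * yi for wi, yi in zip(w, y)) / sum(w)
    # Cochran's Q and the method-of-moments between-study variance tau^2.
    q = sum(wi * (yi - y_fixed) ** 2 for wi, yi in zip(w, y))
    c = sum(w) - sum(wi ** 2 for wi in w) / sum(w)
    tau2 = max(0.0, (q - (len(y) - 1)) / c) if c > 0 else 0.0
    w_star = [1.0 / (vi + tau2) for vi in v]  # random-effects weights
    y_pooled = sum(wi * yi for wi, yi in zip(w_star, y)) / sum(w_star)
    return 1.0 / (1.0 + math.exp(-y_pooled))  # back-transform from logit


# Hypothetical studies (not data from the review).
studies = [
    ContingencyTable(tp=90, fn=10, fp=8, tn=92),
    ContingencyTable(tp=45, fn=15, fp=5, tn=135),
    ContingencyTable(tp=170, fn=30, fp=20, tn=280),
]
for s in studies:
    print(f"sensitivity={s.sensitivity():.3f} specificity={s.specificity():.3f}")

pooled_sens = pool_dersimonian_laird(
    [logit_with_variance(s.tp, s.tp + s.fn) for s in studies])
pooled_spec = pool_dersimonian_laird(
    [logit_with_variance(s.tn, s.tn + s.fp) for s in studies])
print(f"pooled sensitivity={pooled_sens:.3f}, pooled specificity={pooled_spec:.3f}")
```

Working on the logit scale keeps pooled estimates inside (0, 1), and the continuity correction matters in practice because some included studies reported sensitivity or specificity of 100·0%. A faithful reproduction of the review's analysis would instead fit sensitivity and specificity jointly in a hierarchical model.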

Highlights

We searched Ovid-MEDLINE, Embase, Science Citation Index, and Conference Proceedings Citation Index for studies published from Jan 1, 2012, to June 6, 2019, that developed or validated a deep learning model for the diagnosis of any disease feature from medical imaging material and histopathology, with no language restrictions.
We found that an increasing number of primary studies report the diagnostic accuracy of algorithms to be equivalent or superior to that of humans; however, there are concerns about bias and generalisability.
We found no other systematic reviews comparing the performance of artificial intelligence (AI) algorithms with that of health-care professionals for all diseases.

Journal: The Lancet Digital Health
Publication date: Sep 25, 2019
Citations: 1083
License: CC BY