Abstract

Background

Whether machine-learning algorithms can diagnose all pigmented skin lesions as accurately as human experts is unclear. The aim of this study was to compare the diagnostic accuracy of state-of-the-art machine-learning algorithms with human readers for all clinically relevant types of benign and malignant pigmented skin lesions.

Methods

For this open, web-based, international, diagnostic study, human readers were asked to diagnose dermatoscopic images selected randomly in 30-image batches from a test set of 1511 images. The diagnoses from human readers were compared with those of 139 algorithms created by 77 machine-learning labs, which participated in the International Skin Imaging Collaboration (ISIC) 2018 challenge and received a training set of 10 015 images in advance. The ground truth of each lesion fell into one of seven predefined disease categories: intraepithelial carcinoma including actinic keratoses and Bowen's disease; basal cell carcinoma; benign keratinocytic lesions including solar lentigo, seborrheic keratosis, and lichen planus-like keratosis; dermatofibroma; melanoma; melanocytic nevus; and vascular lesions. The two main outcomes were the differences in the number of correct specific diagnoses per batch between all human readers and the top three algorithms, and between human experts and the top three algorithms.

Findings

Between Aug 4, 2018, and Sept 30, 2018, 511 human readers from 63 countries had at least one attempt in the reader study. 283 (55·4%) of 511 human readers were board-certified dermatologists, 118 (23·1%) were dermatology residents, and 83 (16·2%) were general practitioners. When all human readers were compared with all machine-learning algorithms, the algorithms achieved a mean of 2·01 (95% CI 1·97 to 2·04; p<0·0001) more correct diagnoses per batch (17·91 [SD 3·42] vs 19·92 [4·27]). 27 human experts with more than 10 years of experience achieved a mean of 18·78 (SD 3·15) correct answers, compared with 25·43 (1·95) correct answers for the top three machine-learning algorithms (mean difference 6·65, 95% CI 6·06–7·25; p<0·0001). The difference between human experts and the top three algorithms was significantly lower for images in the test set that were collected from sources not included in the training set (human underperformance of 11·4%, 95% CI 9·9–12·9 vs 3·6%, 0·8–6·3; p<0·0001).

Interpretation

State-of-the-art machine-learning classifiers outperformed human experts in the diagnosis of pigmented skin lesions and should have a more important role in clinical practice. However, a possible limitation of these algorithms is their decreased performance for out-of-distribution images, which should be addressed in future research.

Funding

None.
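
As a rough illustration of the primary outcome measure, the sketch below compares per-batch counts of correct specific diagnoses between readers and algorithms and derives a percentile-bootstrap 95% CI for the mean difference. The per-batch scores are simulated placeholders drawn to match the reported means and SDs, and the bootstrap is a generic choice for exposition, not the model-based analysis used in the study.

    # Illustrative sketch only: simulated per-batch scores and a generic
    # percentile bootstrap, not the study's actual analysis pipeline.
    import numpy as np

    rng = np.random.default_rng(0)

    # Hypothetical per-batch counts of correct specific diagnoses (out of 30),
    # drawn to roughly match the reported means and SDs.
    human_correct = rng.normal(17.91, 3.42, size=500).clip(0, 30)
    algo_correct = rng.normal(19.92, 4.27, size=139).clip(0, 30)

    def bootstrap_mean_diff(a, b, n_boot=10_000):
        """Percentile-bootstrap 95% CI for mean(b) - mean(a)."""
        diffs = np.empty(n_boot)
        for i in range(n_boot):
            diffs[i] = (rng.choice(b, size=b.size).mean()
                        - rng.choice(a, size=a.size).mean())
        return np.percentile(diffs, [2.5, 97.5])

    point = algo_correct.mean() - human_correct.mean()
    lo, hi = bootstrap_mean_diff(human_correct, algo_correct)
    print(f"mean difference: {point:.2f} (95% CI {lo:.2f} to {hi:.2f})")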

Highlights

  • Whether machine-learning algorithms can diagnose all pigmented skin lesions as accurately as human experts is unclear

  • Training of neural networks for automated diagnosis of pigmented skin lesions has been hampered by the insufficient diversity of available datasets and by selection and verification bias. We tackled this problem by collecting dermatoscopic images of all clinically relevant types of pigmented lesions and created a publicly available training set of 10 015 images for machine learning.[17] We provided this training set and a test set of 1511 dermatoscopic images to the participants of the International Skin Imaging Collaboration (ISIC) 2018 challenge, with the aim of attracting the best machine-learning labs worldwide to obtain reliable estimates of the accuracy of state-of-the-art machine-learning algorithms (a sketch of the seven-category dataset layout follows these highlights)

  • We observed the largest difference between human experts and algorithms in the sensitivity for intraepithelial carcinoma (51·2%, 95% CI 35·5–66·7 vs 90·7%, 77·9–97·4; table), a category that was commonly misdiagnosed by human readers, whereas the errors of algorithms were more evenly distributed across classes (see the sensitivity sketch below)

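As a companion to the first highlight, here is a minimal sketch of how the seven-category training set can be tallied from HAM10000-style metadata. The file name, the "dx" column, and the short class codes follow the public HAM10000 release as we understand it; treat them as assumptions, not code supplied by the challenge.

    # Illustrative sketch, assuming the public HAM10000-style metadata layout
    # (a CSV with a 'dx' column holding the short diagnosis code).
    import csv
    from collections import Counter

    # The seven ISIC 2018 task-3 categories and their short codes.
    CATEGORIES = {
        "akiec": "intraepithelial carcinoma / actinic keratosis",
        "bcc": "basal cell carcinoma",
        "bkl": "benign keratinocytic lesion",
        "df": "dermatofibroma",
        "mel": "melanoma",
        "nv": "melanocytic nevus",
        "vasc": "vascular lesion",
    }

    def class_counts(metadata_csv: str) -> Counter:
        """Count training images per diagnostic category."""
        counts = Counter()
        with open(metadata_csv, newline="") as fh:
            for row in csv.DictReader(fh):
                dx = row["dx"]  # assumed column name from the public metadata
                if dx in CATEGORIES:
                    counts[dx] += 1
        return counts

    # Example usage (hypothetical file name):
    # print(class_counts("HAM10000_metadata.csv"))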
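The class-wise sensitivities quoted in the last highlight are binomial proportions with confidence intervals. The sketch below computes a sensitivity and a Wilson score 95% CI; Wilson is one common choice and may differ from the interval method used in the paper, and the 22-of-43 example is hypothetical, chosen only to roughly reproduce the reported 51·2%.

    # Sketch: sensitivity with a Wilson score 95% CI (one common method;
    # the paper's interval method may differ).
    import math

    def sensitivity_with_ci(true_pos: int, total_pos: int, z: float = 1.96):
        """Sensitivity = TP / (TP + FN), with a Wilson score 95% CI."""
        p = true_pos / total_pos
        denom = 1 + z**2 / total_pos
        centre = (p + z**2 / (2 * total_pos)) / denom
        half = (z * math.sqrt(p * (1 - p) / total_pos
                              + z**2 / (4 * total_pos**2))) / denom
        return p, centre - half, centre + half

    # Hypothetical example: 22 of 43 intraepithelial carcinomas correctly
    # diagnosed gives a sensitivity of about 0.512.
    print(sensitivity_with_ci(22, 43))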

Summary

Objectives

The aim of this study was to compare the diagnostic accuracy of state-of-the-art machine-learning algorithms with human readers for all clinically relevant types of benign and malignant pigmented skin lesions, pitting the most advanced algorithms against the most experienced human experts on the basis of publicly available data.

Statistical analysis

We aimed to include 500 human readers in the study.
