Deep learning outperformed 136 of 157 dermatologists in a head-to-head dermoscopic melanoma image classification task

Titus J Brinker, Katharina Baratella, Joachim Klode, Emmanouil Gratsias, Marie Hegemann, Therezia Bokor-Billmann, Elena Tomaschewski, Monika Ettinger, Elsa Sody, Marion Mickler, Elke Sattler, Sandra Hallasch, Kirsten Morrison, Christina Drusio, Thomas Schwarz, Swantje Schlott, Sandra Falkvoll, Axel Hauschild, Ze Guo, Anja Pinczker, Anna Dith, Zdenka Hanhart, Valerie Glutsch, Maria Christolouka, Hans Wolff, Benjamin Thomas, Bastian Schilling, Simon Raub, Sarah Knispel, Natalie Scheller, Carola Berking, Johanna Matull, Anna Wilm, Wiebke Ludwig-Peitsch, Achim Hekler, Lena Bischof, Judith Sirokay, Finja Jockenhöfer, Philipp Koch, Rogina Motamedi, Laetitia Messinger, Knut Schäkel, Dora Stölzl, Christoffer Gebhardt, Martin Salzmann, Sophia Bender-Säbelkampf, Markus V Heppt, Katharina Kilian, Andrea Baczako, Nina Giese, Katrin Kahlert, Ulrike Wehkamp, Liliana Matei, Philipp Jansen, Sebastian Mastnik, Jörg Faulhaber, Christof Von Kalle, Alexander Enk, Sascha Gerdes, Katja Leonhard, Maximilian Petri, Katrin Salva, Miriam Linke, Saskia Herz, Lucie Heinzerling, Malte Metzner, Marc Horbrügger, Daniela Hartmann, Nina Booken, Sophia Deffaa, Sebastian Haferkamp, Konstantin Drexler, Timo Schank, Anne Zaremba, Verena Müller, Matthias Betke, Friederike Egberts, Nadine Steingrube, Ali Saeed M Alamri, Maria Maagk, Philipp Schrüfer, Patrick Gholam, Katja Hohaus, Carolin Haas, Dirk Schadendorf, Natalie Lidia Lapczynski, Andreas Kerstan, Oliver Wiedow, Verena Dinauer, Anna Halupczok, Katharina Drerup, Katja Kosova, Sebastian Krammer, Max Schlaak, Magarete Albrecht, Anne Rosenthal, Anna Martaki, Birgit Achatz, Luise Kraas, Astrid Bergbreiter, Federieke Thielking, Anja Gesierich, Viola Harde, Ingo Stoffels, Alexandra Olischewski, Theodora Kanaki, Cristel Ruini, Holger Hänßle, Suzan Nasifoglu, Ante Karoglan, Michael Weichenthal, Biance Philipp, Sarah Schäfer, Constanze Wittmann, Benjamin Ewald, Marion Jost, Eleftheria Chorti, Selma Ugurel, Nolwenn Maurier, Ann‐Sophie Bohne, Kristina Buder‐Bakhaya, Tim Holland‐Letz, Klaus G Griewank, Cyrill Géraud, Anna‐Sophie Erkens, Jan‐Malte Placke, Claudia Bär, Jochen Utikal

doi:10.1016/j.ejca.2019.04.001

Abstract

Background: Recent studies have demonstrated dermatologist-level classification of suspicious lesions by deep-learning algorithms, but they relied on extensive proprietary image databases and limited numbers of dermatologists. This study is the first to compare a deep-learning algorithm trained exclusively on open-source images with a large number of dermatologists covering all levels of the clinical hierarchy.

Methods: We used methods from enhanced deep learning to train a convolutional neural network (CNN) on 12,378 open-source dermoscopic images. We used 100 images to compare the performance of the CNN with that of 157 dermatologists from 12 university hospitals in Germany. The performance of the CNN relative to the dermatologists was measured in terms of sensitivity, specificity and receiver operating characteristics.

Findings: The mean sensitivity and specificity achieved by the dermatologists with dermoscopic images were 74.1% (range 40.0%–100%) and 60% (range 21.3%–91.3%), respectively. At a mean sensitivity of 74.1%, the CNN exhibited a mean specificity of 86.5% (range 70.8%–91.3%). At a mean specificity of 60%, the CNN achieved a mean sensitivity of 87.5% (range 80%–95%). Among the dermatologists, the chief physicians showed the highest mean specificity, 69.2%, at a mean sensitivity of 73.3%. At the same specificity of 69.2%, the CNN had a mean sensitivity of 84.5%.

Interpretation: A CNN trained exclusively on open-source images outperformed 136 of the 157 dermatologists and all levels of experience (from junior to chief physicians) in terms of average specificity and sensitivity.
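The comparisons above fix one metric at the dermatologists' level and read off the other for the CNN, which amounts to picking an operating point on the classifier's receiver operating characteristic curve. The following is a minimal sketch of that idea, not the authors' evaluation code: the function names are hypothetical, and the scores and labels are invented toy values, not study data.

```python
def sensitivity_specificity(scores, labels, threshold):
    """Return (sensitivity, specificity) when scores >= threshold are called melanoma."""
    pairs = list(zip(scores, labels))
    tp = sum(1 for s, y in pairs if y == 1 and s >= threshold)
    fn = sum(1 for s, y in pairs if y == 1 and s < threshold)
    tn = sum(1 for s, y in pairs if y == 0 and s < threshold)
    fp = sum(1 for s, y in pairs if y == 0 and s >= threshold)
    return tp / (tp + fn), tn / (tn + fp)


def specificity_at_sensitivity(scores, labels, target_sensitivity):
    """Scan every observed score as a threshold and keep the one with the
    highest specificity whose sensitivity is at least the target.

    Returns (threshold, specificity, sensitivity), or None if no threshold
    reaches the target sensitivity.
    """
    best = None
    for threshold in sorted(set(scores)):
        sens, spec = sensitivity_specificity(scores, labels, threshold)
        if sens >= target_sensitivity and (best is None or spec > best[1]):
            best = (threshold, spec, sens)
    return best


# Toy data (assumption, for illustration only): 1 = melanoma, 0 = nevus.
scores = [0.9, 0.8, 0.7, 0.4, 0.6, 0.3, 0.2, 0.1]
labels = [1, 1, 1, 1, 0, 0, 0, 0]

# Match a hypothetical reader sensitivity of 75% and report the
# classifier's specificity at that operating point.
print(specificity_at_sensitivity(scores, labels, 0.75))
```

Swapping the roles of the two metrics (fixing specificity and maximizing sensitivity) gives the second kind of comparison reported in the findings.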

Full Text