Abstract
Background: Convolutional neural networks (CNNs) have outperformed dermatologists in classifying pigmented skin lesions under artificial conditions. We investigated, for the first time, the performance of three-dimensional (3D) and two-dimensional (2D) CNNs and dermatologists in the early detection of melanoma in a real-world setting.

Methods: In this prospective study, 1690 melanocytic lesions in 143 patients with high-risk criteria for melanoma were evaluated by dermatologists, 2D-FotoFinder-ATBM and 3D-Vectra WB360 total body photography (TBP). Excision was based on the dermatologists' dichotomous decision, an elevated CNN risk score (study-specific malignancy cut-off: FotoFinder >0.5, Vectra >5.0) and/or the second dermatologist's assessment with CNN support. The diagnostic accuracy of the 2D and 3D CNN classification was compared with that of the dermatologists and the augmented intelligence based on histopathology and dermatologists' assessment. Secondary end-points included reproducibility of risk scores and naevus counts per patient by medical staff (gold standard) compared to automated 3D and 2D TBP CNN counts.

Results: For 3D-CNN risk-score assessments compared with histopathology, the sensitivity, specificity, and receiver operating characteristic area under the curve (ROC-AUC) were 90.0%, 64.6% and 0.92 (95% confidence interval [CI] 0.85–1.00), respectively. Dermatologists and augmented intelligence achieved the same sensitivity as the 3D-CNN (90%) and comparable ROC-AUC (0.91 [CI 0.80–1.00] and 0.88 [CI 0.77–1.00], respectively), but their specificity was superior (92.3% and 86.2%, respectively). The 2D-CNN (sensitivity: 70%, specificity: 40%, ROC-AUC: 0.68 [CI 0.46–0.90]) was outperformed by the 3D-CNN and dermatologists. The 3D-CNN showed a higher correlation coefficient for repeated measurements of 246 lesions (R = 0.89) than the 2D-CNN (R = 0.79). The mean naevus count per patient varied significantly (gold standard: 210 lesions; 3D-CNN: 469; 2D-CNN: 1324; p < 0.0001).

Conclusions: Our study emphasises the importance of validating the classification of CNNs in real life. The novel 3D-CNN device outperformed the 2D-CNN and achieved comparable sensitivity with dermatologists. The low specificity of CNNs and the lack of automated counting of TBP naevi currently limit the use of augmented intelligence in clinical practice.
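
The sensitivity and specificity figures above depend on turning a continuous risk score into a yes/no call with a cut-off. The sketch below shows that arithmetic. It is an illustration and not the study's analysis code: the function names, the scores and the labels are hypothetical, and only the rule "score > cut-off counts as suspicious" and the Vectra cut-off of 5.0 come from the abstract.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class ConfusionCounts:
    """Counts of lesions by predicted call versus histopathology."""
    tp: int  # melanoma, flagged as suspicious
    fn: int  # melanoma, not flagged
    tn: int  # benign, not flagged
    fp: int  # benign, flagged as suspicious


def count_outcomes(scores: Sequence[float],
                   is_melanoma: Sequence[bool],
                   cutoff: float) -> ConfusionCounts:
    """Apply 'score > cutoff' to each lesion and tally the four outcomes."""
    tp = fn = tn = fp = 0
    for score, truth in zip(scores, is_melanoma):
        flagged = score > cutoff  # strict inequality, as in the abstract
        if truth and flagged:
            tp += 1
        elif truth:
            fn += 1
        elif flagged:
            fp += 1
        else:
            tn += 1
    return ConfusionCounts(tp=tp, fn=fn, tn=tn, fp=fp)


def sensitivity(c: ConfusionCounts) -> float:
    """Share of melanomas that were flagged: TP / (TP + FN)."""
    return c.tp / (c.tp + c.fn)


def specificity(c: ConfusionCounts) -> float:
    """Share of benign lesions that were not flagged: TN / (TN + FP)."""
    return c.tn / (c.tn + c.fp)


# Hypothetical risk scores and histopathology labels (NOT study data).
example_scores = [7.2, 4.1, 5.0, 8.8, 1.3, 6.0]
example_truth = [True, True, False, True, False, False]
example_counts = count_outcomes(example_scores, example_truth, cutoff=5.0)
```

With these made-up inputs, two of the three melanomas and two of the three benign lesions are classified correctly, so both measures come out at 2/3. A score exactly at the cut-off (5.0 here) is not flagged, because the rule is a strict "greater than".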