Interobserver and Human–Artificial Intelligence Concordance in Differentiating Between Invasive and In Situ Melanoma

Sam Polesie, John Paoli

doi:10.2196/36895

Abstract

Background: Machine learning algorithms, including convolutional neural networks (CNNs), have recently made significant advances in research settings. Although several algorithms are now marketed directly to consumers, their implementation in clinical practice is still pending. Most melanomas are easy to recognize even without the aid of dermoscopy. Nonetheless, discriminating between invasive melanoma and melanoma in situ (MIS) in a preoperative setting is often more challenging, even with the assistance of dermoscopy. Although several dermoscopic features suggestive of MIS and invasive melanoma have been presented, their usefulness in a larger setting is limited by how well physicians agree on their presence or absence.

Objective: The overarching aims of this research project are to identify useful dermoscopic features to help dermatologists predict melanoma thickness and to develop CNNs that can assist dermatologists in the preoperative assessment of melanoma thickness. The ultimate aim is to develop algorithms that can strengthen patient care, improve clinical decision-making, and be used in routine health care.

Methods: We included dermoscopic images as well as clinical close-up images of invasive melanomas and MIS from our department from the period January 1, 2016, to December 31, 2020. Using this image material, we trained, validated, and tested two separate CNNs, one based on dermoscopic images and one based on clinical close-up images. We also invited dermatologists to review the test sets and, for a subset of the dermoscopic images, asked them to specify the presence of prespecified dermoscopic features. Subsequently, we compared the CNN outputs to the combined dermatologists' output for all lesions and assessed the interobserver agreement for several dermoscopic features.

Results: The CNN developed using dermoscopic images performed on par with the invited dermatologists, whereas the CNN using clinical close-up images was outperformed by the group of dermatologists. Two dermoscopic features (atypical blue-white structures and shiny white lines) both displayed moderate to substantial interobserver agreement and were both indicative of invasive melanomas >1.0 mm.

Conclusions: CNNs used to differentiate between invasive melanomas and MIS might be an example of a clinically relevant machine learning application, but they need further refinement and evaluation in prospective clinical trials. Only a few dermoscopic features are helpful in predicting melanoma thickness.

Conflicts of Interest: None declared.
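As an illustration of how interobserver agreement on a dermoscopic feature might be quantified, the following is a minimal sketch of Fleiss' kappa for several raters judging a feature as absent or present. The abstract does not state which agreement statistic was used, so the choice of Fleiss' kappa is an assumption, as are the function names and the ratings, which are invented purely for illustration. The labels "moderate" and "substantial" are mapped here using the commonly cited Landis and Koch bands.

```python
from typing import List, Sequence


def fleiss_kappa(counts: Sequence[Sequence[int]]) -> float:
    """Fleiss' kappa for a table where counts[i][j] is the number of
    raters who assigned item i to category j."""
    n_items = len(counts)
    n_raters = sum(counts[0])
    if n_raters < 2:
        raise ValueError("at least two raters per item are required")
    if any(sum(row) != n_raters for row in counts):
        raise ValueError("every item must have the same number of ratings")
    n_categories = len(counts[0])

    # Share of all ratings that fell into each category.
    p_cat: List[float] = [
        sum(row[j] for row in counts) / (n_items * n_raters)
        for j in range(n_categories)
    ]
    # Observed agreement per item: fraction of rater pairs that agree.
    p_item = [
        (sum(c * c for c in row) - n_raters) / (n_raters * (n_raters - 1))
        for row in counts
    ]
    p_observed = sum(p_item) / n_items
    p_expected = sum(p * p for p in p_cat)
    if p_expected == 1.0:
        raise ValueError("kappa is undefined when only one category is used")
    return (p_observed - p_expected) / (1.0 - p_expected)


def landis_koch_label(kappa: float) -> str:
    """Verbal label for a kappa value using the Landis and Koch bands."""
    if kappa < 0.0:
        return "poor"
    for upper, label in [(0.20, "slight"), (0.40, "fair"),
                         (0.60, "moderate"), (0.80, "substantial")]:
        if kappa <= upper:
            return label
    return "almost perfect"


# Hypothetical ratings (not study data): 6 lesions, 5 raters each,
# columns are [feature absent, feature present].
ratings = [[5, 0], [0, 5], [4, 1], [1, 4], [5, 0], [0, 5]]
kappa = fleiss_kappa(ratings)
label = landis_koch_label(kappa)
print(f"kappa = {kappa:.3f} ({label})")
```

With these invented ratings, the raters fully agree on four lesions and split 4 to 1 on two, which gives a kappa of about 0.73, in the "substantial" band.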

Highlights

Machine learning algorithms, including convolutional neural networks (CNNs), have recently made significant advances in research settings
Discriminating between invasive melanoma and melanoma in situ (MIS) in a preoperative setting is often challenging, even with the assistance of dermoscopy
Although several dermoscopic features suggestive of MIS and invasive melanoma have been presented, their usefulness in a larger setting is limited by how well physicians agree on their presence or absence

Summary

Introduction

Background: Machine learning algorithms, including convolutional neural networks (CNNs), have recently made significant advances in research settings. Discriminating between invasive melanoma and melanoma in situ (MIS) in a preoperative setting is often challenging, even with the assistance of dermoscopy.

Objectives

Results

Conclusion