International Skin Imaging Collaboration Research Articles

Convolutional neural networks (CNNs) are a type of artificial intelligence that shows promise as a diagnostic aid for skin cancer. However, most are trained on retrospective image data sets with varying degrees of image capture standardization. The aim of our study was to train CNN models with the same architecture on image sets acquired either with the same image capture device and technique (standardized) or with varied devices and capture techniques (nonstandardized), and to test the variability in their performance when classifying skin cancer images from different populations. In all, 3 CNNs with the same architecture were trained. CNN nonstandardized (CNN-NS) was trained on 25,331 images taken from the International Skin Imaging Collaboration (ISIC) using different image capture devices. CNN standardized (CNN-S) was trained on 177,475 MoleMap images taken with the same capture device, and CNN standardized number 2 (CNN-S2) was trained on a subset of 25,331 standardized MoleMap images (matched to CNN-NS for the number and classes of training images). These 3 models were then tested on 3 external test sets: 569 Danish images, the publicly available ISIC 2020 data set consisting of 33,126 images, and the University of Queensland (UQ) data set of 422 images. Primary outcome measures were sensitivity, specificity, and area under the receiver operating characteristic curve (AUROC). Teledermatology assessments available for the Danish data set were used to compare model performance with that of teledermatologists. When tested on the 569 Danish images, CNN-S achieved an AUROC of 0.861 (95% CI 0.830-0.889) and CNN-S2 achieved an AUROC of 0.831 (95% CI 0.798-0.861; standardized models), with both outperforming CNN-NS (nonstandardized model; P=.001 and P=.009, respectively), which achieved an AUROC of 0.759 (95% CI 0.722-0.794).
When tested on 2 additional data sets (ISIC 2020 and UQ), CNN-S (P<.001 for both) and CNN-S2 (P=.08 and P=.35, respectively) still performed better than CNN-NS, although the differences for CNN-S2 were not statistically significant. When the CNNs were matched to the mean sensitivity and specificity of the teledermatologists on the Danish data set, the teledermatologists surpassed the models' resulting sensitivities and specificities. However, compared with CNN-S, the differences were not statistically significant (sensitivity: P=.10; specificity: P=.053). Image quality influenced the performance of all CNN models as well as that of the teledermatologists. CNNs trained on standardized images had improved performance and, therefore, greater generalizability in skin cancer classification when applied to unseen data sets. This finding is an important consideration for future algorithm development, regulation, and approval.
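The abstract reports sensitivity, specificity, and AUROC with 95% CIs and matches the CNNs to the teledermatologists' mean sensitivity, but it does not show how these quantities are computed. The following is a minimal pure-Python sketch of them. It assumes binary labels (1 = malignant, 0 = benign) and one continuous model score per image. All function names are hypothetical. The percentile bootstrap is only one common way to obtain a CI and may differ from the authors' method. The P values comparing models, which require a paired test such as DeLong's, are not reproduced here.

```python
import random
from typing import Sequence, Tuple


def auroc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUROC via the Mann-Whitney formulation: the probability that a randomly
    chosen positive (label 1) scores higher than a randomly chosen negative
    (label 0), ties counted as one half. O(n_pos * n_neg), fine for a sketch."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def sens_spec(labels: Sequence[int], scores: Sequence[float],
              threshold: float) -> Tuple[float, float]:
    """Sensitivity and specificity when a score >= threshold is called malignant."""
    tp = fn = tn = fp = 0
    for y, s in zip(labels, scores):
        called_positive = s >= threshold
        if y == 1:
            if called_positive:
                tp += 1
            else:
                fn += 1
        else:
            if called_positive:
                fp += 1
            else:
                tn += 1
    return tp / (tp + fn), tn / (tn + fp)


def threshold_for_sensitivity(labels: Sequence[int], scores: Sequence[float],
                              target: float) -> float:
    """Hypothetical operating-point rule: the highest threshold whose
    sensitivity reaches `target` (e.g. a reader's mean sensitivity)."""
    for t in sorted(set(scores), reverse=True):
        if sens_spec(labels, scores, t)[0] >= target:
            return t
    return min(scores)  # lowest threshold calls every image positive


def bootstrap_auroc_ci(labels: Sequence[int], scores: Sequence[float],
                       n_boot: int = 1000, alpha: float = 0.05,
                       seed: int = 0) -> Tuple[float, float]:
    """Percentile bootstrap confidence interval for the AUROC."""
    n = len(labels)
    if not 0 < sum(labels) < n:
        raise ValueError("labels must contain both classes")
    rng = random.Random(seed)
    stats = []
    while len(stats) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        ys = [labels[i] for i in idx]
        if 0 < sum(ys) < n:  # keep only resamples that contain both classes
            stats.append(auroc(ys, [scores[i] for i in idx]))
    stats.sort()
    lower = stats[int(alpha / 2 * n_boot)]
    upper = stats[int((1 - alpha / 2) * n_boot) - 1]
    return lower, upper


if __name__ == "__main__":
    labels = [0, 0, 1, 1]
    scores = [0.1, 0.4, 0.35, 0.8]
    print(auroc(labels, scores))                           # 0.75
    print(sens_spec(labels, scores, 0.35))                 # (1.0, 0.5)
    print(threshold_for_sensitivity(labels, scores, 1.0))  # 0.35
    print(bootstrap_auroc_ci(labels, scores, n_boot=200))
```

Matching a model to a reader's sensitivity, as sketched in `threshold_for_sensitivity`, fixes one axis of the comparison. The model's specificity at that threshold can then be compared directly with the reader's specificity.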

Background: Previous studies of artificial intelligence (AI) applied to dermatology have shown AI to have higher diagnostic classification accuracy than expert dermatologists; however, these studies did not adequately assess clinically realistic scenarios, such as how AI systems behave when presented with images of disease categories that are not included in the training dataset or images drawn from statistical distributions with significant shifts from training distributions. We aimed to simulate these real-world scenarios and evaluate the effects of image source institution, diagnoses outside of the training set, and other image artifacts on classification accuracy, with the goal of informing clinicians and regulatory agencies about safety and real-world accuracy.

Methods: We designed a large dermoscopic image classification challenge to quantify the performance of machine learning algorithms for the task of skin cancer classification from dermoscopic images, and how this performance is affected by shifts in statistical distributions of data, disease categories not represented in training datasets, and imaging or lesion artifacts. Factors that might be beneficial to performance, such as clinical metadata and external training data collected by challenge participants, were also evaluated. 25 331 training images collected from two datasets (in Vienna [HAM10000] and Barcelona [BCN20000]) between Jan 1, 2000, and Dec 31, 2018, across eight skin diseases, were provided to challenge participants to design appropriate algorithms. The trained algorithms were then tested for balanced accuracy against the HAM10000 and BCN20000 test datasets and data from countries not included in the training dataset (Turkey, New Zealand, Sweden, and Argentina). Test datasets contained images of all diagnostic categories available in training plus other diagnoses not included in training data (not trained category). We compared the performance of the algorithms against that of 18 dermatologists in a simulated setting that reflected intended clinical use.

Findings: 64 teams submitted 129 state-of-the-art algorithm predictions on a test set of 8238 images. The best performing algorithm achieved 58·8% balanced accuracy on the BCN20000 data, which was designed to better reflect realistic clinical scenarios, compared with 82·0% balanced accuracy on HAM10000, which was used in a previously published benchmark. Shifted statistical distributions and disease categories not included in training data contributed to decreases in accuracy. Image artifacts, including hair, pen markings, ulceration, and imaging source institution, decreased accuracy in a complex manner that varied based on the underlying diagnosis. When comparing algorithms to expert dermatologists (2460 ratings on 1269 images), algorithms performed better than experts in most categories, except for actinic keratoses (similar accuracy on average) and images from categories not included in training data (26% correct for experts vs 6% correct for algorithms, p<0·0001). For the top 25 submitted algorithms, 47·1% of the images from categories not included in training data were misclassified as malignant diagnoses, which would lead to a substantial number of unnecessary biopsies if current state-of-the-art AI technologies were clinically deployed.

Interpretation: We have identified specific deficiencies and safety issues in AI diagnostic systems for skin cancer that should be addressed in future diagnostic evaluation protocols to improve safety and reliability in clinical practice.

Funding: Melanoma Research Alliance and La Marató de TV3.
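The Findings report balanced accuracy, which gives each diagnostic class equal weight regardless of how many test images it contains. They also report the share of not-trained-category images misclassified as malignant. The following is a minimal pure-Python sketch of both quantities, assuming string class labels. The function names, the label codes (MEL, NV, BCC, SCC, UNK), and the choice of which codes count as malignant are illustrative assumptions, not the challenge's official scoring code.

```python
from collections import defaultdict
from typing import Collection, Dict, Sequence


def balanced_accuracy(y_true: Sequence[str], y_pred: Sequence[str]) -> float:
    """Mean per-class recall, averaged over the classes that occur in y_true."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("y_true and y_pred must be non-empty and of equal length")
    totals: Dict[str, int] = defaultdict(int)
    hits: Dict[str, int] = defaultdict(int)
    for t, p in zip(y_true, y_pred):
        totals[t] += 1
        if t == p:
            hits[t] += 1
    return sum(hits[c] / totals[c] for c in totals) / len(totals)


def malignant_call_rate(y_true: Sequence[str], y_pred: Sequence[str],
                        untrained_label: str,
                        malignant: Collection[str]) -> float:
    """Share of images from the (hypothetical) untrained category that were
    predicted as one of the classes listed in `malignant`."""
    preds = [p for t, p in zip(y_true, y_pred) if t == untrained_label]
    if not preds:
        raise ValueError(f"no images labelled {untrained_label!r}")
    return sum(p in malignant for p in preds) / len(preds)


if __name__ == "__main__":
    # Hypothetical toy labels; "UNK" stands in for the not-trained category.
    y_true = ["MEL", "MEL", "NV", "NV", "NV", "UNK"]
    y_pred = ["MEL", "NV", "NV", "NV", "MEL", "BCC"]
    print(round(balanced_accuracy(y_true, y_pred), 4))  # 0.3889
    print(malignant_call_rate(y_true, y_pred, "UNK", {"MEL", "BCC", "SCC"}))  # 1.0
```

In the toy example, plain accuracy is 0.5 (3 of 6 correct). Balanced accuracy drops to about 0.389 because the single UNK image, misread as BCC, counts as a whole class with zero recall. This per-class weighting is why rare or unseen categories can pull balanced accuracy down sharply. The definition should agree with scikit-learn's `balanced_accuracy_score` for the classes present in `y_true`.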

Articles published on International Skin Imaging Collaboration

Classification of Skin Lesions Using Weighted Majority Voting Ensemble Deep Learning

Theory-Based Approaches to Support Dermoscopic Image Interpretation Education: A Review of the Literature.

Application of a parallel branches network based on Transformer for skin melanoma segmentation

Classification of Skin Cancer with Deep Transfer Learning Method

Assessing the Generalizability of Deep Learning Models Trained on Standardized and Nonstandardized Images and Their Performance Against Teledermatologists: Retrospective Comparative Study.

Skin cancers image classification using transformation and first order statistic features with artificial neural network classifier

Improving Skin Color Diversity in Cancer Detection: Deep Learning Approach.

Detection and segmentation of melanoma skin cancer in dermoscopy images using modified Alexnet convolutional neural network‐morphological methodology

Cancer-Net SCa: tailored deep neural network designs for detection of skin cancer from dermoscopy images

A Machine Vision Approach for Classification of Skin Cancer Using Hybrid Texture Features.

Dense and shuffle attention U‐Net for automatic skin lesion segmentation

Classification of Skin Cancer Images Using Convolutional Neural Networks

Automatic lesion segmentation using atrous convolutional deep neural networks in dermoscopic skin cancer images

Machine Learning Algorithm for Detection of Deadliest Forms of Skin Cancer

Validation of artificial intelligence prediction models for skin cancer diagnosis using dermoscopy images: the 2019 International Skin Imaging Collaboration Grand Challenge

From machine learning to deep learning: experimental comparison of machine learning and deep learning for skin cancer image segmentation

Dermoscopic Image Classification with Neural Style Transfer

LinkNet-B7: Noise Removal and Lesion Segmentation in Images of Skin Cancer

Multi-class skin lesion classification using prism- and segmentation-based fractal signatures

Melanoma Classification Using a Novel Deep Convolutional Neural Network with Dermoscopic Images.
