Skin cancer is the abnormal growth of cells on the surface of the skin and is one of the most serious diseases in humans. It usually appears on areas exposed to the sun but can also develop in areas that are not regularly exposed. Because benign and malignant lesions can look strikingly similar, skin cancer detection remains difficult, even for expert dermatologists. Given this difficulty in diagnosing skin cancer accurately, a convolutional neural network (CNN) approach was used for skin cancer diagnosis. However, CNN models need large image datasets to perform well, and the number of available medical images is limited; this study therefore used image augmentation and transfer learning to increase the number of training images and improve model performance. This study proposes an ensemble transfer-learning-based model that efficiently classifies skin lesions into one of seven categories to aid dermatologists in skin cancer detection: (i) actinic keratoses, (ii) basal cell carcinoma, (iii) benign keratosis, (iv) dermatofibroma, (v) melanocytic nevi, (vi) melanoma, and (vii) vascular skin lesions. Five transfer learning models formed the basis of the ensemble: MobileNet, EfficientNetV2B2, Xception, ResNeXt101, and DenseNet201. In addition to stratified 10-fold cross-validation, the outputs of the individual models were fused to achieve higher classification accuracy. An annealing learning rate scheduler and test-time augmentation (TTA) were also used to improve performance during training and testing, respectively. A total of 10,015 publicly available dermoscopy images from the HAM10000 (Human Against Machine) dataset, covering the seven common skin lesion categories, were used to train and evaluate the models. The proposed technique attained 94.49% accuracy on the dataset.
These results suggest that this strategy can improve the accuracy of skin cancer classification. The weighted-average F1-score, recall, and precision were 94.68%, 94.49%, and 95.07%, respectively.
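The abstract does not state how the five models' outputs are fused or exactly how the weighted metrics are computed. The Python sketch below assumes soft voting: each entry of `prob_sets` holds one model's softmax outputs on one test-time-augmented view of the test images, and all entries are averaged before taking the argmax. It also assumes "weighted average" means weighting each class by its support (its number of true samples). `soft_vote` and `weighted_metrics` are hypothetical helper names, and the toy probabilities are invented for illustration, not taken from the study. Under support weighting, weighted recall always equals accuracy, because the sum over classes of (n_c / N) * (TP_c / n_c) reduces to the total number of correct predictions divided by N. This is consistent with the identical 94.49% figures reported above.

```python
from collections import Counter


def soft_vote(prob_sets):
    """Fuse predictions by averaging class probabilities (assumed soft voting).

    prob_sets: one entry per (model, TTA view) pair; each entry is a list of
    per-sample probability vectors with the same number of classes.
    Returns the predicted class index for each sample.
    """
    n_sets = len(prob_sets)
    n_samples = len(prob_sets[0])
    n_classes = len(prob_sets[0][0])
    preds = []
    for i in range(n_samples):
        # Mean probability of each class across all models and TTA views.
        mean = [sum(p[i][c] for p in prob_sets) / n_sets for c in range(n_classes)]
        # Argmax of the averaged distribution (ties go to the lower index).
        preds.append(max(range(n_classes), key=mean.__getitem__))
    return preds


def weighted_metrics(y_true, y_pred):
    """Accuracy plus support-weighted precision, recall, and F1-score."""
    support = Counter(y_true)  # true samples per class
    n = len(y_true)
    precision = recall = f1 = 0.0
    for c, n_c in support.items():
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        predicted_c = sum(1 for p in y_pred if p == c)
        prec_c = tp / predicted_c if predicted_c else 0.0
        rec_c = tp / n_c
        f1_c = 2 * prec_c * rec_c / (prec_c + rec_c) if prec_c + rec_c else 0.0
        w = n_c / n  # weight each class by its share of the true labels
        precision += w * prec_c
        recall += w * rec_c
        f1 += w * f1_c
    accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / n
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


if __name__ == "__main__":
    # Toy outputs from two hypothetical models on four samples, three classes.
    model_a = [[0.7, 0.2, 0.1], [0.4, 0.5, 0.1], [0.1, 0.3, 0.6], [0.3, 0.4, 0.3]]
    model_b = [[0.6, 0.3, 0.1], [0.6, 0.3, 0.1], [0.2, 0.2, 0.6], [0.5, 0.3, 0.2]]
    preds = soft_vote([model_a, model_b])  # [0, 0, 2, 0]
    print(preds)
    print(weighted_metrics([0, 1, 2, 1], preds))
```

Averaging probabilities lets a confident model outweigh uncertain ones; hard majority voting over each model's argmax is a common alternative, and the abstract does not say which one the study used.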