Abstract
Training of neural networks for automated diagnosis of pigmented skin lesions is hampered by the small size and lack of diversity of available datasets of dermatoscopic images. We tackle this problem by releasing the HAM10000 (“Human Against Machine with 10000 training images”) dataset. We collected dermatoscopic images from different populations acquired and stored by different modalities. Given this diversity we had to apply different acquisition and cleaning methods and developed semi-automatic workflows utilizing specifically trained neural networks. The final dataset consists of 10015 dermatoscopic images which are released as a training set for academic machine learning purposes and are publicly available through the ISIC archive. This benchmark dataset can be used for machine learning and for comparisons with human experts. Cases include a representative collection of all important diagnostic categories in the realm of pigmented lesions. More than 50% of lesions have been confirmed by pathology, while the ground truth for the rest of the cases was either follow-up, expert consensus, or confirmation by in-vivo confocal microscopy.
Highlights
Background & Summary: Dermatoscopy is a widely used diagnostic technique that improves the diagnosis of benign and malignant pigmented skin lesions in comparison to examination with the unaided eye[1].
Recent advances in graphics card capabilities and machine learning techniques set new benchmarks with regard to the complexity of neural networks and raised expectations that automated diagnostic systems will soon be available that diagnose all kinds of pigmented skin lesions without the need of human expertise[3].
Training of neural-network based diagnostic algorithms requires a large number of annotated images[4], but the number of high quality dermatoscopic images with reliable diagnoses is limited or restricted to only a few classes of diseases.
Summary
Dermatoscopy is a widely used diagnostic technique that improves the diagnosis of benign and malignant pigmented skin lesions in comparison to examination with the unaided eye[1]. Accompanying the book Interactive Atlas of Dermoscopy[6], a CD-ROM is commercially available with digital versions of 1044 dermatoscopic images, including 167 images of non-melanocytic lesions and 20 images of diagnoses not covered in the HAM10000 dataset. Although this is one of the most diverse available datasets with regard to covered diagnoses, its use is probably limited because of its constrained accessibility. The ISIC archive, by contrast, is currently the standard source for dermatoscopic image analysis research because of its permissive licensing (CC-0), well-structured availability, and large size. It is, however, biased towards melanocytic lesions (12893 of 13786 images are nevi or melanomas). To provide more information to machine-learning research groups who intend to use the HAM10000 training set for research, we describe the evolution and the specifics of the dataset (Fig. 1) in detail.
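The composition figures quoted above are easier to compare as percentages. The following is a minimal sketch, not part of the dataset tooling: the helper name `share` and the variable names are illustrative assumptions, and only the four counts are taken from the text.

```python
def share(part: int, total: int) -> float:
    """Return part/total as a percentage, rounded to one decimal place."""
    if total <= 0:
        raise ValueError("total must be positive")
    return round(100.0 * part / total, 1)


# Counts quoted in the text (variable names are illustrative).
archive_total = 13786          # images in the large CC-0 collection
archive_nevi_melanoma = 12893  # of which nevi or melanomas
atlas_total = 1044             # images on the Interactive Atlas CD-ROM
atlas_non_melanocytic = 167    # of which non-melanocytic lesions

# Share of nevi and melanomas in the large collection.
print(share(archive_nevi_melanoma, archive_total))
# Share of everything else in the same collection.
print(share(archive_total - archive_nevi_melanoma, archive_total))
# Share of non-melanocytic lesions on the CD-ROM.
print(share(atlas_non_melanocytic, atlas_total))
```

Under these counts, nevi and melanomas make up roughly 93.5% of the larger collection, whereas non-melanocytic lesions account for about 16% of the much smaller CD-ROM set. That imbalance is what a classifier trained on the larger collection alone would inherit.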