Validation of a Dermatology-Focused Multimodal Large Language Model in Classification of Pigmented Skin Lesions
Background: Artificial intelligence (AI) has shown significant promise in augmenting diagnostic capabilities across medical specialties. Recent advancements in generative AI allow for synthesis and interpretation of complex clinical data including imaging and patient history to assess disease risk. Objective: To evaluate the diagnostic performance of a dermatology-trained multimodal large language model (DermFlow, Delaware, USA) in assessing malignancy risk of pigmented skin lesions. Methods: This retrospective study utilized data from 59 patients with 68 biopsy-proven pigmented skin lesions seen at Indiana University clinics from February 2023 to May 2025. De-identified patient histories and clinical images were input into DermFlow, and clinical images only were input into Claude Sonnet 4 (Claude) to generate differential diagnoses. Clinician pre-operative diagnoses were extracted from the clinical note. Assessments were compared to histopathologic diagnoses (gold standard). Results: Among 68 clinically concerning pigmented lesions, DermFlow achieved 47.1% top diagnosis accuracy and 92.6% any-diagnosis accuracy, with F1 = 0.948, sensitivity 93.9%, and specificity 89.5% (balanced accuracy 91.7%). Claude had 8.8% top diagnosis and 73.5% any-diagnosis accuracy, F1 = 0.816, sensitivity 81.6%, specificity 52.6% (balanced accuracy 67.1%). Clinicians achieved 38.2% top diagnosis and 72.1% any-diagnosis accuracy, F1 = 0.776, sensitivity 67.3%, specificity 84.2% (balanced accuracy 75.8%). DermFlow recommended biopsy in 95.6% of cases vs. 82.4% for Claude, with multiple pairwise differences favoring DermFlow (p < 0.05). Conclusions: DermFlow demonstrated comparable or superior diagnostic performance to clinicians and superior performance to Claude in evaluating pigmented skin lesions. 
Although additional data must be gathered to further validate the model in real clinical settings, these initial findings suggest potential utility for dermatology-trained AI models in clinical practice, particularly in settings with limited dermatologist availability.
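The reported sensitivity, specificity, balanced accuracy, and F1 values can be cross-checked with a short sketch. The abstract gives percentages only, so the confusion-matrix counts below are an assumption: they are inferred as the integer counts (49 malignant and 19 benign lesions out of 68) that reproduce the reported figures, and the names `Confusion`, `INFERRED`, and `report` are illustrative only.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Confusion:
    """Binary confusion matrix with malignant as the positive class."""
    tp: int  # malignant lesions called malignant
    fn: int  # malignant lesions called benign
    fp: int  # benign lesions called malignant
    tn: int  # benign lesions called benign

    @property
    def sensitivity(self) -> float:
        return self.tp / (self.tp + self.fn)

    @property
    def specificity(self) -> float:
        return self.tn / (self.tn + self.fp)

    @property
    def balanced_accuracy(self) -> float:
        # Mean of sensitivity and specificity
        return (self.sensitivity + self.specificity) / 2

    @property
    def f1(self) -> float:
        # Harmonic mean of precision and recall, written in count form
        return 2 * self.tp / (2 * self.tp + self.fp + self.fn)


# ASSUMPTION: counts inferred from the reported percentages, not stated in the abstract.
INFERRED = {
    "DermFlow": Confusion(tp=46, fn=3, fp=2, tn=17),
    "Claude": Confusion(tp=40, fn=9, fp=9, tn=10),
    "Clinicians": Confusion(tp=33, fn=16, fp=3, tn=16),
}


def report(c: Confusion) -> tuple:
    """Return (sensitivity %, specificity %, balanced accuracy %, F1), rounded."""
    return (
        round(100 * c.sensitivity, 1),
        round(100 * c.specificity, 1),
        round(100 * c.balanced_accuracy, 1),
        round(c.f1, 3),
    )


if __name__ == "__main__":
    for name, matrix in INFERRED.items():
        print(name, report(matrix))
```

Under these inferred counts, every rater's four metrics match the values quoted in the abstract, which suggests the reported figures are internally consistent.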
- Research Article
9
- 10.1111/j.1468-3083.2012.04651.x
- Jul 23, 2012
- Journal of the European Academy of Dermatology and Venereology
Many research groups have recently developed equipment and statistical methods enabling pattern classification of pigmented skin lesions. Differentiating benign from malignant lesions through the mathematical extraction of digital patterns, together with the use of appropriate statistical approaches, is a challenging task. The objective was to design a simple scoring model that provides accurate classification of benign and malignant palmo-plantar pigmented skin lesions by evaluating parameters obtained by digital dermoscopy analysis (DDA). In the present study we used a digital dermoscopy analyser to evaluate a series of 445 palmo-plantar melanocytic skin lesion images (25 melanomas, 420 nevi). Area under the receiver operating characteristic curve, sensitivity and specificity were calculated to evaluate the diagnostic performance of our scoring model for the differentiation of benign and malignant palmo-plantar melanocytic lesions. Model performance reached a very high value (0.983). The DDA parameters selected by the model that proved statistically significant were: area, peripheral dark regions, total imbalance of colours, entropy, dark area, and red and blue multicomponent. When all seven model variables were used in a multivariate mode, setting sensitivity at 100% to avoid false negatives, we estimated a minimum specificity of about 80%. Simplicity of use and effectiveness of implementation are important requirements for the success of quantitative methods in routine clinical practice. Scoring systems meet these requirements. Their outcomes are accessible in real time without the use of any data processing system, thus allowing decisions to be made quickly and effectively.
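The idea of fixing sensitivity at 100% and then reading off the resulting specificity can be illustrated with a minimal sketch. The scores and labels below are toy values invented for illustration, not data from the study, and the function names are hypothetical; the sketch assumes a lesion is called malignant when its score is at or above the cut-off.

```python
def threshold_for_full_sensitivity(scores, labels):
    """Highest cut-off that still flags every malignant lesion (label 1)."""
    return min(s for s, y in zip(scores, labels) if y == 1)


def specificity_at(scores, labels, threshold):
    """Fraction of benign lesions (label 0) scoring below the cut-off."""
    benign = [s for s, y in zip(scores, labels) if y == 0]
    return sum(s < threshold for s in benign) / len(benign)


# Toy data (ASSUMPTION): five benign and three malignant lesions.
toy_scores = [0.10, 0.20, 0.30, 0.40, 0.60, 0.55, 0.70, 0.90]
toy_labels = [0, 0, 0, 0, 0, 1, 1, 1]

cutoff = threshold_for_full_sensitivity(toy_scores, toy_labels)
print(cutoff, specificity_at(toy_scores, toy_labels, cutoff))
```

Lowering the cut-off until no malignant lesion is missed usually costs some specificity, which is the trade-off the abstract describes.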
- Research Article
28
- 10.1159/000530225
- Mar 21, 2023
- Dermatology
Background: While skin cancers are less prevalent in people with skin of color, they are more often diagnosed at later stages and have a poorer prognosis. The use of artificial intelligence (AI) models can potentially improve early detection of skin cancers; however, the lack of skin color diversity in training datasets may only widen the pre-existing racial discrepancies in dermatology. Objective: The aim of this study was to systematically review the technique, quality, accuracy, and implications of studies using AI models trained or tested in populations with skin of color for classification of pigmented skin lesions. Methods: PubMed was used to identify any studies describing AI models for classification of pigmented skin lesions. Only studies that used training datasets with at least 10% of images from people with skin of color were eligible. Outcomes on study population, design of AI model, accuracy, and quality of the studies were reviewed. Results: Twenty-two eligible articles were identified. The majority of studies were trained on datasets obtained from Chinese (7/22), Korean (5/22), and Japanese populations (3/22). Seven studies used diverse datasets containing Fitzpatrick skin type I–III in combination with at least 10% from black Americans, Native Americans, Pacific Islanders, or Fitzpatrick IV–VI. AI models producing binary outcomes (e.g., benign vs. malignant) reported an accuracy ranging from 70% to 99.7%. Accuracy of AI models reporting multiclass outcomes (e.g., specific lesion diagnosis) was lower, ranging from 43% to 93%. Reader studies, where dermatologists’ classification is compared with AI model outcomes, reported similar accuracy in one study, higher AI accuracy in three studies, and higher clinician accuracy in two studies. A quality review revealed that dataset description and variety, benchmarking, public evaluation, and healthcare application were frequently not addressed. 
Conclusions: While this review provides promising evidence of accurate AI models in populations with skin of color, the majority of the studies reviewed were obtained from East Asian populations and therefore provide insufficient evidence to comment on the overall accuracy of AI models for darker skin types. Large discrepancies remain in the number of AI models developed in populations with skin of color (particularly Fitzpatrick type IV–VI) compared with those of largely European ancestry. A lack of publicly available datasets from diverse populations is likely a contributing factor, as is the inadequate reporting of patient-level metadata relating to skin color in training datasets.
- Research Article
17
- 10.1016/j.jaad.2010.08.019
- May 11, 2011
- Journal of the American Academy of Dermatology
Dermatoscopy versus Tzanck smear test: A comparison of the value of two tests in the diagnosis of pigmented skin lesions
- Conference Article
5
- 10.1109/meco55406.2022.9797111
- Jun 7, 2022
Currently, one of the most common types of malignant neoplasms in humans is skin cancer. There has been a need for automated and reliable approaches for accurate and rapid clinical detection and skin cancer diagnosis. The development of artificial intelligence-based automated assistive diagnostic tools for early detection of skin cancer on dermatoscopic images can help to reduce melanoma-induced mortality. Image segmentation is a key step in automated diagnostic systems for pigmented skin lesions. This paper presents a neural network system of semantic segmentation for pigmented skin lesions on dermatoscopic images based on the U-Net convolutional neural network. The simulation results showed that the proposed system allows detecting and segmenting pigmented lesions with an accuracy of 93.32%. The use of neural network segmentation as a stage of pre-processing of dermatoscopy images allows minimizing the influence of the patient's skin color type, the level of illumination, and the resulting occlusions in the presence of hair structures. The proposed system prepares dermatoscopic images for further analysis for automated classification of pigmented skin lesions.
- Research Article
2
- 10.1364/boe.483828
- Apr 21, 2023
- Biomedical Optics Express
Because pigmented skin lesion image classification based on manually designed convolutional neural networks (CNNs) requires abundant experience in neural network design and considerable parameter tuning, we proposed the macro operation mutation-based neural architecture search (OM-NAS) approach in order to automatically build a CNN for image classification of pigmented skin lesions. We first used an improved search space that was oriented toward cells and contained micro and macro operations. The macro operations include InceptionV1, Fire and other well-designed neural network modules. During the search process, an evolutionary algorithm based on macro operation mutation was employed to iteratively change the operation type and connection mode of parent cells so that the macro operation was inserted into the child cell similar to the injection of virus into host DNA. Ultimately, the searched best cells were stacked to build a CNN for the image classification of pigmented skin lesions, which was then assessed on the HAM10000 and ISIC2017 datasets. The test results showed that the CNN built with this approach was more accurate than or almost as accurate as state-of-the-art (SOTA) approaches such as AmoebaNet, InceptionV3 + Attention and ARL-CNN in terms of image classification. The average sensitivity of this method on the HAM10000 and ISIC2017 datasets was 72.4% and 58.5%, respectively.
- Research Article
125
- 10.1097/00008390-199806000-00009
- Jun 1, 1998
- Melanoma Research
Epiluminescence microscopy (ELM) is a non-invasive technique for in vivo examination which can provide additional criteria for the clinical diagnosis of pigmented skin lesions (PSLs). In the present study we attempt to determine whether PSLs can be automatically diagnosed by an integrated computerized system. This system should recognize the PSL, automatically extract features and use these features in training an artificial neural network, which should--if sufficiently trained--be capable of recognizing and classifying a new PSL without human aid. One hundred and twenty images of randomly selected histologically proven PSLs (33 common naevi, 48 dysplastic naevi and 39 malignant melanomas) were used in this study. The images were digitally obtained and the morphological features of the PSLs were extracted electronically without human assistance. The numerical data were then divided into learning and testing cases and linked to an artificial neural network for training and for further classification of lesions that the system had not been trained on. Our results show that the computerized system was able to automatically identify 95% of the PSLs presented. The sensitivity and specificity of the computerized system were 90% and 74% respectively. In contrast, when differentiating between individual types of lesions, the system performed at true positive rates of only 38% for malignant melanoma, 62% for dysplastic naevi and 33% for common naevi. Our data indicate that (1) ELM images of PSLs provide an excellent source for digital image analysis; (2) the vast majority of PSLs can be correctly identified by a relatively simple (and thus not "intelligent") application of digital image analysis; (3) automatic feature extraction based mainly on ABCD rules provides reliable data on the distinction between benign and malignant PSLs; and (4) there is evidence that artificial neural networks can be trained to adequately discriminate between benign and malignant PSLs.
- Research Article
9
- 10.1016/j.compbiomed.2024.108742
- Jun 14, 2024
- Computers in Biology and Medicine
Systematic review of approaches to detection and classification of skin cancer using artificial intelligence: Development and prospects
- Research Article
- 10.52225/narra.v5i2.1852
- Apr 21, 2025
- Narra J
Skin cancer is one of the most prevalent cancers worldwide, with early diagnosis being critical for improving survival rates. Dermoscopy, a non-invasive imaging tool, is widely used for identifying pigmented skin lesions. However, its accuracy is heavily dependent on expert interpretation, which introduces variability and limits accessibility in resource-constrained settings. This highlighted the need for automated solutions to enhance diagnostic consistency and aid in early detection. The aim of this study was to develop a refined machine-learning framework for classifying pigmented skin lesions using dermoscopy images. We employed an enhanced Inception-V3 model, a state-of-the-art convolutional neural network, integrated with a simplified soft-attention mechanism, advanced data augmentation techniques, and Bayesian hyperparameter tuning. These innovations improved the model’s ability to accurately focus on and identify relevant lesion features, marking a significant advancement in the field. Using the ISIC-2019 dataset, a publicly available resource containing dermoscopy images classified into eight diagnostic categories, we implemented preprocessing steps such as resizing, cleaning, and data balancing. Additionally, ImageNet transfer learning and Bayesian optimization were applied to refine the model. The inclusion of a soft-attention mechanism further enhanced the model’s capacity to identify patterns within lesion images. Our model exhibited outstanding performance on the ISIC-2019 dataset, achieving a sensitivity of 98.5%, specificity of 99.62%, precision of 97.42%, accuracy of 97.38%, an F1 score of 97.34%, and an area under the curve (AUC) of 0.99. These metrics underscored the model’s superior capability in accurate and reliable classification of pigmented skin lesions, surpassing current benchmarks and demonstrating significant advancements over existing methodologies.
- Conference Article
2
- 10.1109/bibe50027.2020.00125
- Oct 1, 2020
This study reports results of a pilot study in which pigmented skin lesions are automatically classified into four classes: benign, dysplastic nevus with mild atypia, dysplastic nevus with severe atypia, and melanoma. The pilot study enrolled subjects from the dermatology clinic at Baylor University Medical Center at Dallas from June 2016 to August 2017. Thirty high-quality dermoscopic images were randomly selected from an image bank of 96 to obtain a statistically balanced dataset. Melanoma samples were histologically verified. A dermoscopy-based automated image analyzer with quaternary classification of pigmented skin lesions was proposed. The image analyzer automatically extracts the five lesion features most used in clinical practice, applying an active contour, and performs pairwise classification employing six Support Vector Machines. The pairwise classification accuracies are reported to be between 92% and 94% and are used to determine the corresponding confidence intervals. From the pairwise classification results, maximum hits and acyclic tree decisions were used to reach the final classification of a lesion. Using leave-one-out validation, the accuracy of the quaternary dermoscopy-based image analyzer was determined to be 90%, using the histopathologic diagnoses as the ground truth. This novel, dermoscopy-based image classifier accurately classifies pigmented skin lesions in small datasets into benign, two types of dysplastic nevi, and malignant lesions. To the best of our knowledge, there is no other automated skin lesion classification framework in the literature that distinguishes between different nevus lesions.
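Four classes give six unordered pairs, which is consistent with the six pairwise Support Vector Machines mentioned above. The following is a minimal sketch of a "maximum hits" vote over such pairwise decisions. It is an illustration under stated assumptions, not the authors' implementation: the class names, the `max_hits` function, and the example decisions are hypothetical, and the acyclic tree decision step is not implemented (ties simply return `None`).

```python
from collections import Counter
from itertools import combinations

CLASSES = ("benign", "mild_atypia", "severe_atypia", "melanoma")
PAIRS = list(combinations(CLASSES, 2))  # one binary classifier per pair


def max_hits(pairwise_winner):
    """Pick the class that wins the most pairwise contests.

    pairwise_winner maps each (a, b) pair in PAIRS to the class chosen
    by that pair's classifier. Returns None when the top count is tied.
    """
    votes = Counter(pairwise_winner[pair] for pair in PAIRS)
    top = max(votes.values())
    leaders = [c for c in CLASSES if votes[c] == top]
    return leaders[0] if len(leaders) == 1 else None


# Hypothetical pairwise outputs for a single lesion
decisions = {
    ("benign", "mild_atypia"): "mild_atypia",
    ("benign", "severe_atypia"): "severe_atypia",
    ("benign", "melanoma"): "melanoma",
    ("mild_atypia", "severe_atypia"): "severe_atypia",
    ("mild_atypia", "melanoma"): "melanoma",
    ("severe_atypia", "melanoma"): "melanoma",
}
print(len(PAIRS), max_hits(decisions))
```

When pairwise decisions form a cycle, several classes can share the top vote count, which is presumably why a second decision rule is needed in addition to the vote.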
- Research Article
36
- 10.1016/s2589-7500(23)00130-9
- Sep 27, 2023
- The Lancet Digital Health
Diagnosis of skin cancer requires medical expertise, which is scarce. Mobile phone-powered artificial intelligence (AI) could aid diagnosis, but it is unclear how this technology performs in a clinical scenario. Our primary aim was to test in the clinic whether there was equivalence between AI algorithms and clinicians for the diagnosis and management of pigmented skin lesions. In this multicentre, prospective, diagnostic, clinical trial, we included specialist and novice clinicians and patients from two tertiary referral centres in Australia and Austria. Specialists had a specialist medical qualification related to diagnosing and managing pigmented skin lesions, whereas novices were dermatology junior doctors or registrars in trainee positions who had experience in examining and managing these lesions. Eligible patients were aged 18-99 years and had a modified Fitzpatrick I-III skin type; those in the diagnostic trial were undergoing routine excision or biopsy of one or more suspicious pigmented skin lesions bigger than 3 mm in the longest diameter, and those in the management trial had baseline total-body photographs taken within 1-4 years. We used two mobile phone-powered AI instruments incorporating a simple optical attachment: a new 7-class AI algorithm and the International Skin Imaging Collaboration (ISIC) AI algorithm, which was previously tested in a large online reader study. The reference standard for excised lesions in the diagnostic trial was histopathological examination; in the management trial, the reference standard was a descending hierarchy based on histopathological examination, comparison of baseline total-body photographs, digital monitoring, and telediagnosis. The main outcome of this study was to compare the accuracy of expert and novice diagnostic and management decisions with the two AI instruments. Possible decisions in the management trial were dismissal, biopsy, or 3-month monitoring. 
Decisions to monitor were considered equivalent to dismissal (scenario A) or biopsy of malignant lesions (scenario B). The trial was registered at the Australian New Zealand Clinical Trials Registry ACTRN12620000695909 (Universal trial number U1111-1251-8995). The diagnostic study included 172 suspicious pigmented lesions (84 malignant) from 124 patients and the management study included 5696 pigmented lesions (18 malignant) from the whole body of 66 high-risk patients. The diagnoses of the 7-class AI algorithm were equivalent to the specialists' diagnoses (absolute accuracy difference 1·2% [95% CI -6·9 to 9·2]) and significantly superior to the novices' ones (21·5% [13·1 to 30·0]). The diagnoses of the ISIC AI algorithm were significantly inferior to the specialists' diagnoses (-11·6% [-20·3 to -3·0]) but significantly superior to the novices' ones (8·7% [-0·5 to 18·0]). The best 7-class management AI was significantly inferior to specialists' management (absolute accuracy difference in correct management decision -0·5% [95% CI -0·7 to -0·2] in scenario A and -0·4% [-0·8 to -0·05] in scenario B). Compared with the novices' management, the 7-class management AI was significantly inferior (-0·4% [-0·6 to -0·2]) in scenario A but significantly superior (0·4% [0·0 to 0·9]) in scenario B. The mobile phone-powered AI technology is simple, practical, and accurate for the diagnosis of suspicious pigmented skin cancer in patients presenting to a specialist setting, although its usage for management decisions requires more careful execution. An AI algorithm that was superior in experimental studies was significantly inferior to specialists in a real-world scenario, suggesting that caution is needed when extrapolating results of experimental studies to clinical practice. Funding: MetaOptima Technology.
- Front Matter
26
- 10.1046/j.0926-9959.2001.00313.x
- Sep 1, 2001
- Journal of the European Academy of Dermatology and Venereology
Laser tissue interaction in epidermal pigmented lesions.
- Research Article
33
- 10.1046/j.1468-3083.2002.00470.x
- Jul 1, 2002
- Journal of the European Academy of Dermatology and Venereology
Epiluminescence microscopy (ELM) (dermoscopy, dermatoscopy) is a technique for non-invasive diagnosis of pigmented skin lesions that improves the diagnostic performance of dermatologists. Little is known about the possible influence of associated clinical features on the reliability of dermoscopic diagnosis during in vivo examination. To compare diagnostic performance of in vivo dermoscopy (combined clinical and dermoscopic examination) with that of dermoscopy performed on photographic slides (pure dermoscopy). This case series comprised 256 pigmented skin lesions consecutively identified as suspicious or equivocal during examination in a general dermatological clinic. Clinical examination and in vivo dermoscopy were performed before excision by two trained dermatologists. The same observers carried out dermoscopy on photographic slides at a later time, and these three diagnostic classifications were reviewed together with the histological findings for the individual lesions. This was carried out in a university hospital. In vivo dermoscopy performed better than dermoscopy on photographic slides for classification of pigmented skin lesions compared with histological diagnosis, and both performed better than general clinical diagnosis. In vivo dermoscopic diagnosis of melanoma showed 98.1% sensitivity, 95.5% specificity and 96.1% diagnostic accuracy while dermoscopic diagnosis of melanoma on photographic slides was less reliable with 81.5% sensitivity, 86.7% specificity and 85.2% diagnostic accuracy. In particular, diagnosis of melanoma based on photographic slides led to nine false negative cases (three in situ, six invasive; thickness ranges 0.2-1.5 mm). In vivo dermoscopy, i.e. combined clinical and dermoscopic examination, is more reliable than dermoscopy on photographic slides. 
In clinical practice, therefore, in vivo dermoscopy cannot be considered independent from associated clinical characteristics of the lesions, which help the trained observer to reach a more precise classification. This may have implications on the reliability of ELM diagnosis made by an observer not fully trained in the clinical diagnosis of pigmented skin lesions or by a remote observer during digital ELM teleconsultation.
- Research Article
48
- 10.1002/(sici)1097-0142(19960715)78:2<252::aid-cncr10>3.0.co;2-v
- Jul 15, 1996
- Cancer
The diagnosis of melanomas at an early stage is associated with improved survival, so the recognition of changes in pigmented skin lesions over time is important. We have developed a computer imaging system with the aim of assisting clinicians in differentiating early melanomas from benign pigmented skin lesions. The objective of this study was to investigate the system's reliability over time in measuring diagnostic characteristics of pigmented skin lesions, including their color, size, shape, and distinctness of boundary. We captured video images of 5 lesions, all larger than 2 mm in greatest dimension, on each of 66 Australian adolescents on 2 occasions approximately 1 month apart. Features extracted by computer image analysis included area, perimeter, and regularity of outline of the lesions, the mean and standard deviation of reflectance at red, green, and blue wavelengths, and the mean and standard deviation of the gradients of red, green, and blue reflectance at the lesion boundary. All measurements showed moderate to high reliability (intraclass correlation coefficients 0.66-0.94), except for the standard deviations of the color gradients, whose reliability improved to moderate levels (0.68-0.71) when the mean of 5 lesions was considered. For most outcomes, reasonable within-subject reliability was achieved when five lesions per subject were measured. These results, in combination with previous work demonstrating the reasonable ability of this computer imaging system to discriminate between malignant melanomas and other pigmented lesions, indicate that the system has the potential to become a useful tool for clinicians in following people with pigmented lesions over time to detect early malignant changes.
- Conference Article
- 10.5753/wvc.2021.18909
- Nov 22, 2021
Skin cancer is one of the most common types of cancer in Brazil and its incidence rate has increased in recent years. Melanoma cases are more aggressive than non-melanoma skin cancer. Machine learning-based classification algorithms can help dermatologists to diagnose whether a skin lesion is melanoma or non-melanoma cancer. We compared four convolutional neural network architectures (ResNet-50, VGG16, Inception-v3, and DenseNet-121) using different training strategies and validation methods to classify seven classes of skin lesions. The experiments were executed using the HAM10000 dataset, which contains 10,015 images of pigmented skin lesions. We considered the test accuracy to determine the best model for each strategy. DenseNet-121 was the best model when trained with fine-tuning and data augmentation, reaching 90% (k-fold cross-validation). Our results can help to improve the use of machine learning algorithms for classifying pigmented skin lesions.
- Research Article
86
- 10.3390/jcm9061662
- Jun 1, 2020
- Journal of Clinical Medicine
Skin cancer is one of the most common forms of cancer worldwide and its early detection is key to achieving effective treatment of the lesion. Commonly, skin cancer diagnosis is based on dermatologist expertise and pathological assessment of biopsies. Although there are diagnosis aid systems based on morphological processing algorithms using conventional imaging, these systems have currently reached their limit and are not able to outperform dermatologists. In this sense, hyperspectral (HS) imaging (HSI) arises as a new non-invasive technology able to facilitate the detection and classification of pigmented skin lesions (PSLs), employing the spectral properties of the captured sample within and beyond the human eye capabilities. This paper presents research carried out to develop a dermatological acquisition system based on HSI, employing 125 spectral bands captured between 450 and 950 nm. A database composed of 76 HS PSL images from 61 patients was obtained, labeled, and classified into benign and malignant classes. A processing framework is proposed for the automatic identification and classification of PSLs based on a combination of unsupervised and supervised algorithms. Sensitivity and specificity results of 87.5% and 100%, respectively, were obtained in the discrimination of malignant and benign PSLs. This preliminary study demonstrates, as a proof-of-concept, the potential of HSI technology to assist dermatologists in the discrimination of benign and malignant PSLs during routine clinical practice using a real-time and non-invasive hand-held device.