Claude 3 Opus and ChatGPT With GPT-4 in Dermoscopic Image Analysis for Melanoma Diagnosis: Comparative Performance Analysis.

Xu Liu, Chaoli Duan, Min-Kyu Kim, Lu Zhang, Eunjin Jee, Beenu Maharjan, Yuwei Huang, Dan Du, Xian Jiang

doi:10.2196/59273


Open Access

https://doi.org/10.2196/59273


Journal: JMIR Medical Informatics
Publication Date: Aug 6, 2024
Citations: 2
License type: CC BY


Abstract


Recent advancements in artificial intelligence (AI) and large language models (LLMs) have shown potential in medical fields, including dermatology. With the introduction of image analysis capabilities in LLMs, their application in dermatological diagnostics has garnered significant interest. These capabilities are enabled by the integration of computer vision techniques into the underlying architecture of LLMs.

This study aimed to compare the diagnostic performance of Claude 3 Opus and ChatGPT with GPT-4 in analyzing dermoscopic images for melanoma detection, providing insights into their strengths and limitations.

We randomly selected 100 histopathology-confirmed dermoscopic images (50 malignant, 50 benign) from the International Skin Imaging Collaboration (ISIC) archive using a computer-generated randomization process. The ISIC archive was chosen due to its comprehensive and well-annotated collection of dermoscopic images, ensuring a diverse and representative sample. Images were included if they were dermoscopic images of melanocytic lesions with histopathologically confirmed diagnoses. Each model was given the same prompt, instructing it to provide the top 3 differential diagnoses for each image, ranked by likelihood. Primary diagnosis accuracy, accuracy of the top 3 differential diagnoses, and malignancy discrimination ability were assessed. The McNemar test was chosen to compare the diagnostic performance of the 2 models, as it is suitable for analyzing paired nominal data.

In the primary diagnosis, Claude 3 Opus achieved 54.9% sensitivity (95% CI 44.08%-65.37%), 57.14% specificity (95% CI 46.31%-67.46%), and 56% accuracy (95% CI 46.22%-65.42%), while ChatGPT demonstrated 56.86% sensitivity (95% CI 45.99%-67.21%), 38.78% specificity (95% CI 28.77%-49.59%), and 48% accuracy (95% CI 38.37%-57.75%). The McNemar test showed no significant difference between the 2 models (P=.17).
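The primary-diagnosis metrics and the paired comparison described above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the authors' analysis code. The confusion-matrix counts are back-calculated from the reported percentages and are not given in the abstract; the counts that reproduce the percentages sum to 51 malignant and 49 benign images rather than the stated 50 and 50. The abstract does not report the discordant-pair counts that the McNemar test needs, so the values passed to the hypothetical `mcnemar_exact` helper are placeholders, and the exact binomial form of the test is itself an assumption.

```python
from math import comb


def diagnostic_metrics(tp, fn, tn, fp):
    """Return (sensitivity, specificity, accuracy) as percentages, rounded to 2 places."""
    sensitivity = 100 * tp / (tp + fn)
    specificity = 100 * tn / (tn + fp)
    accuracy = 100 * (tp + tn) / (tp + fn + tn + fp)
    return round(sensitivity, 2), round(specificity, 2), round(accuracy, 2)


def mcnemar_exact(b, c):
    """Two-sided exact McNemar p-value.

    b: cases model A classified correctly and model B did not.
    c: cases model B classified correctly and model A did not.
    Under the null hypothesis, b follows Binomial(b + c, 0.5).
    """
    n = b + c
    if n == 0:
        return 1.0
    k = min(b, c)
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2 * tail)


# ASSUMPTION: counts back-calculated from the reported percentages.
claude_primary = diagnostic_metrics(tp=28, fn=23, tn=28, fp=21)
chatgpt_primary = diagnostic_metrics(tp=29, fn=22, tn=19, fp=30)
print("Claude 3 Opus:", claude_primary)
print("ChatGPT:", chatgpt_primary)

# ASSUMPTION: placeholder discordant counts, not taken from the study.
print("McNemar p (b=0, c=10):", mcnemar_exact(0, 10))
```

The McNemar test looks only at the images on which the two models disagree, which is why it suits a design where both models read the same 100 images.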
For the top 3 differential diagnoses, Claude 3 Opus and ChatGPT included the correct diagnosis in 76% (95% CI 66.33%-83.77%) and 78% (95% CI 68.46%-85.45%) of cases, respectively. The McNemar test showed no significant difference (P=.56). In malignancy discrimination, Claude 3 Opus outperformed ChatGPT with 47.06% sensitivity, 81.63% specificity, and 64% accuracy, compared to 45.1%, 42.86%, and 44%, respectively. The McNemar test showed a significant difference (P<.001). Claude 3 Opus had an odds ratio of 3.951 (95% CI 1.685-9.263) in discriminating malignancy, while ChatGPT had an odds ratio of 0.616 (95% CI 0.297-1.278).

Our study highlights the potential of LLMs in assisting dermatologists but also reveals their limitations. Both models made errors in diagnosing melanoma and benign lesions. These findings underscore the need for developing robust, transparent, and clinically validated AI models through collaborative efforts between AI researchers, dermatologists, and other health care professionals. While AI can provide valuable insights, it cannot yet replace the expertise of trained clinicians.
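The malignancy-discrimination odds ratios can be reproduced from a 2x2 table, as in the minimal sketch below. The counts are an assumption: they are back-calculated from the reported sensitivity and specificity, not quoted from the paper. The Wald interval on the log odds ratio is also an assumption; the abstract does not state how its intervals were computed, so the bounds printed here need not match the reported ones.

```python
import math


def odds_ratio(tp, fn, tn, fp):
    """Diagnostic odds ratio: (TP * TN) / (FN * FP)."""
    return (tp * tn) / (fn * fp)


def wald_ci(tp, fn, tn, fp, z=1.96):
    """Approximate 95% Wald interval for the odds ratio (assumed method)."""
    log_or = math.log(odds_ratio(tp, fn, tn, fp))
    se = math.sqrt(1 / tp + 1 / fn + 1 / tn + 1 / fp)
    return math.exp(log_or - z * se), math.exp(log_or + z * se)


# ASSUMPTION: counts back-calculated from 47.06% / 81.63% (Claude 3 Opus)
# and 45.1% / 42.86% (ChatGPT); they are not reported in the abstract.
claude = dict(tp=24, fn=27, tn=40, fp=9)
chatgpt = dict(tp=23, fn=28, tn=21, fp=28)

for name, counts in (("Claude 3 Opus", claude), ("ChatGPT", chatgpt)):
    low, high = wald_ci(**counts)
    print(f"{name}: OR={odds_ratio(**counts):.3f}, Wald 95% CI {low:.3f}-{high:.3f}")
```

An odds ratio above 1 means a "malignant" call is more likely for truly malignant lesions than for benign ones; a confidence interval that spans 1, as reported for ChatGPT, is consistent with no discrimination.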


