Decoding the NCCN Guidelines With AI: A Comparative Evaluation of ChatGPT-4.0 and Llama 2 in the Management of Thyroid Carcinoma.

Shivam Pandya, Tyler Wilson, Zin Htway, Tamir E Bresler, Manabu Fujita

doi:10.1177/00031348241269430

Abstract

Artificial Intelligence (AI) has emerged as a promising tool in the delivery of health care. ChatGPT-4.0 (OpenAI, San Francisco, California) and Llama 2 (Meta, Menlo Park, California) have each gained attention for their use in various medical applications. This study aims to evaluate and compare the effectiveness of ChatGPT-4.0 and Llama 2 in assisting with complex clinical decision making in the diagnosis and treatment of thyroid carcinoma. We reviewed the National Comprehensive Cancer Network® (NCCN) Clinical Practice Guidelines for the management of thyroid carcinoma and formulated up to 3 complex clinical questions for each decision-making page. ChatGPT-4.0 and Llama 2 were queried in a reproducible manner. The answers were scored on a 5-point Likert scale: 5) correct; 4) correct, with missing information requiring clarification; 3) correct, but unable to complete answer; 2) partially incorrect; 1) absolutely incorrect. Score frequencies were compared, and subgroup analysis was conducted on Correctness (defined as scores 1-2 vs 3-5) and Accuracy (defined as scores 1-3 vs 4-5). In total, 58 pages of the NCCN Guidelines® were analyzed, generating 167 unique questions. There was no statistically significant difference between ChatGPT-4.0 and Llama 2 in terms of overall score (Mann-Whitney U-test; Mean Rank = 160.53 vs 174.47, P = 0.123), Correctness (P = 0.177), or Accuracy (P = 0.891). ChatGPT-4.0 and Llama 2 demonstrate a limited but substantial capacity to assist with complex clinical decision making relating to the management of thyroid carcinoma, with no significant difference in their effectiveness.
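The scoring analysis described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' code: the function names (`average_ranks`, `mean_ranks_and_u`, `dichotomize`) and the two short score lists are hypothetical, made up purely for illustration. The sketch shows how tie-averaged mean ranks and the Mann-Whitney U statistic are derived from Likert scores, and how the Correctness (1-2 vs 3-5) and Accuracy (1-3 vs 4-5) splits are formed. It does not compute P values; in practice those would come from a statistics package.

```python
from collections import Counter


def average_ranks(values):
    """Return 1-based ranks in input order, averaging ranks across ties."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        tied_rank = (i + j) / 2 + 1  # mean of 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = tied_rank
        i = j + 1
    return ranks


def mean_ranks_and_u(group_a, group_b):
    """Return (mean rank of A, mean rank of B, U statistic for A)."""
    ranks = average_ranks(list(group_a) + list(group_b))
    n_a, n_b = len(group_a), len(group_b)
    rank_sum_a = sum(ranks[:n_a])
    rank_sum_b = sum(ranks[n_a:])
    u_a = rank_sum_a - n_a * (n_a + 1) / 2
    return rank_sum_a / n_a, rank_sum_b / n_b, u_a


def dichotomize(scores, lowest_passing):
    """Count scores at or above (True) and below (False) a cut-off."""
    return Counter(score >= lowest_passing for score in scores)


# Hypothetical Likert scores (1-5); NOT data from the study.
model_a = [5, 4, 2, 5, 3, 1]
model_b = [5, 5, 3, 4, 2, 4]

mean_a, mean_b, u_a = mean_ranks_and_u(model_a, model_b)
correctness_a = dichotomize(model_a, 3)  # scores 3-5 vs 1-2
accuracy_a = dichotomize(model_a, 4)     # scores 4-5 vs 1-3

print(mean_a, mean_b, u_a)
print(correctness_a, accuracy_a)
```

When both groups are the same size, the two mean ranks always average to (N + 1) / 2, where N is the combined number of answers. The reported mean ranks of 160.53 and 174.47 average to 167.5, which is consistent with 167 scored answers per model.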


More From: The American surgeon

