Harnessing artificial intelligence in bariatric surgery: comparative analysis of ChatGPT-4, Bing, and Bard in generating clinician-level bariatric surgery recommendations

Yung Lee, Matthew Kroh, David Jin, Arshia Javidan, Dennis Hong, James Jung, Tyler Mckechnie, Andrew T Strong, Jerry T Dang, Léa Tessier, Sarah Malone, Thomas Shin

doi:10.1016/j.soard.2024.03.011

Abstract

Background: The formulation of clinical recommendations pertaining to bariatric surgery is essential in guiding healthcare professionals. However, the extensive and continuously evolving body of literature in bariatric surgery presents a considerable challenge for staying abreast of the latest developments and acquiring information efficiently. Artificial intelligence (AI) has the potential to streamline access to the salient points of clinical recommendations in bariatric surgery.

Objectives: The study aims to appraise the quality and readability of AI-chat-generated answers to frequently asked clinical inquiries in the field of bariatric and metabolic surgery.

Setting: Remote.

Methods: Question prompts were created based on pre-existing clinical practice guidelines regarding bariatric and metabolic surgery and entered into 3 AI large language models (LLMs): OpenAI ChatGPT-4, Microsoft Bing, and Google Bard. The responses from each LLM were entered into a spreadsheet for randomized and blinded duplicate review. Accredited bariatric surgeons in North America independently assessed the appropriateness of each recommendation using a 5-point Likert scale. Scores of 4 and 5 were deemed appropriate, while scores of 1–3 indicated a lack of appropriateness. A Flesch Reading Ease (FRE) score was calculated to assess the readability of the responses generated by each LLM.

Results: There was a significant difference between the 3 LLMs in their 5-point Likert scores, with mean values of 4.46 (SD .82), 3.89 (.80), and 3.11 (.72) for ChatGPT-4, Bard, and Bing, respectively (P < .001). There was also a significant difference between the 3 LLMs in the proportion of appropriate answers, with ChatGPT-4 at 85.7%, Bard at 74.3%, and Bing at 25.7% (P < .001). The mean FRE scores for ChatGPT-4, Bard, and Bing were 21.68 (SD 2.78), 42.89 (4.03), and 14.64 (5.09), respectively, with higher scores representing easier readability.

Conclusions: LLM-based AI chat models can effectively generate appropriate responses to clinical questions related to bariatric surgery, though the performance of different models can vary greatly. Therefore, caution should be taken when interpreting clinical information provided by LLMs, and clinician oversight is necessary to ensure accuracy. Future investigation is warranted to explore how LLMs might enhance healthcare provision and clinical decision-making in bariatric surgery.
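
The two outcome measures described in the Methods can be illustrated with a minimal Python sketch. It is not the authors' analysis code: the abstract does not say which tool computed the FRE scores, and the function names, the vowel-group syllable heuristic, and the sample inputs are assumptions made for illustration. The standard Flesch Reading Ease formula is 206.835 − 1.015 × (words / sentences) − 84.6 × (syllables / words), and a Likert score of 4 or 5 counts as appropriate.

```python
import re


def count_syllables(word: str) -> int:
    """Estimate syllables by counting vowel groups (a rough heuristic, assumed here)."""
    word = word.lower()
    count = len(re.findall(r"[aeiouy]+", word))
    # Treat a trailing 'e' as silent unless the word ends in 'le' (e.g. "table").
    if word.endswith("e") and not word.endswith("le") and count > 1:
        count -= 1
    return max(count, 1)


def flesch_reading_ease(text: str) -> float:
    """Standard Flesch Reading Ease formula; higher scores mean easier reading."""
    # Naive sentence split: decimals such as "4.46" would be split incorrectly.
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    if not sentences or not words:
        raise ValueError("text must contain at least one sentence and one word")
    syllables = sum(count_syllables(w) for w in words)
    return (
        206.835
        - 1.015 * (len(words) / len(sentences))
        - 84.6 * (syllables / len(words))
    )


def proportion_appropriate(scores: list[int]) -> float:
    """Share of 5-point Likert scores rated appropriate (4 or 5)."""
    if not scores:
        raise ValueError("scores must not be empty")
    return sum(1 for s in scores if s >= 4) / len(scores)


if __name__ == "__main__":
    # Hypothetical inputs, not data from the study.
    print(round(flesch_reading_ease("The cat sat on the mat."), 3))
    print(proportion_appropriate([5, 4, 3, 2, 4]))
```

Because the syllable counter is only a heuristic, scores from this sketch will differ somewhat from those of dedicated readability tools, so it should be read as an illustration of the formula and not as a way to reproduce the reported values.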


Journal: Surgery for obesity and related diseases : official journal of the American Society for Bariatric Surgery
Publication Date: Mar 24, 2024
License type: cc-by-nc-nd

