Prompt matters: evaluation of large language model chatbot responses related to Peyronie's disease.

Christopher J Warren, Victoria S Edmonds, Nicolette G Payne, Sandeep Voletti, Sarah Y Wu, Jennakay Colquitt, Hossein Sadeghi-Nejad, Nahid Punjani

doi:10.1093/sexmed/qfae055

Abstract

Despite direct access to clinicians through the electronic health record, patients are increasingly turning to the internet for information related to their health, especially with sensitive urologic conditions such as Peyronie's disease (PD). Large language model (LLM) chatbots are a form of artificial intelligence that rely on user prompts to mimic conversation, and they have shown remarkable capabilities. The conversational nature of these chatbots has the potential to answer patient questions related to PD; however, the accuracy, comprehensiveness, and readability of LLM responses related to PD remain unknown.

This study aimed to assess the quality and readability of information generated by 4 LLMs in response to searches related to PD; to determine whether users could improve responses through prompting; and to assess the accuracy, completeness, and readability of responses to artificial preoperative patient questions sent through the electronic health record prior to undergoing PD surgery.

The National Institutes of Health's frequently asked questions related to PD were entered into 4 LLMs, unprompted and prompted. The responses were evaluated for overall quality by the previously validated DISCERN questionnaire. Accuracy and completeness of LLM responses to 11 presurgical patient messages were evaluated with previously accepted Likert scales. All evaluations were performed by 3 independent reviewers in October 2023, and all reviews were repeated in April 2024. Descriptive statistics and analysis were performed.

Without prompting, the quality of information was moderate across all LLMs but improved to high quality with prompting. LLM responses were accurate and complete, with average scores of 5.5 of 6.0 (SD, 0.8) for accuracy and 2.8 of 3.0 (SD, 0.4) for completeness. The average Flesch-Kincaid reading level was grade 12.9 (SD, 2.1). Chatbots were unable to communicate at a grade 8 reading level when prompted, and their citations were appropriate only 42.5% of the time.

LLMs may become a valuable tool for patient education for PD, but they currently rely on clinical context and appropriate prompting by humans to be useful. Unfortunately, their prerequisite reading level remains higher than that of the average patient, and their citations cannot be trusted. However, given their increasing uptake and accessibility, patients and physicians should be educated on how to interact with these LLMs to elicit the most appropriate responses. In the future, LLMs may reduce burnout by helping physicians respond to patient messages.
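
The Flesch-Kincaid reading level reported in the abstract is a grade-level estimate derived from average sentence length and average syllables per word. The abstract does not say which tool the authors used to compute it, so the following is only a minimal sketch of the standard Flesch-Kincaid grade-level formula. The function names and the vowel-group syllable counter are assumptions made for illustration; the counter is a rough heuristic that can miscount some words, so its scores may differ from those of dedicated readability software.

```python
import re


def count_syllables(word: str) -> int:
    """Rough heuristic (an assumption, not the authors' method):
    count runs of consecutive vowels, with a minimum of one."""
    groups = re.findall(r"[aeiouy]+", word.lower())
    return max(1, len(groups))


def flesch_kincaid_grade(text: str) -> float:
    """Standard Flesch-Kincaid grade-level formula:
    0.39 * (words / sentences) + 11.8 * (syllables / words) - 15.59
    """
    # Split on sentence-ending punctuation and drop empty pieces.
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    # Treat runs of letters (and apostrophes) as words.
    words = re.findall(r"[A-Za-z']+", text)
    if not sentences or not words:
        raise ValueError("text must contain at least one sentence and one word")
    syllables = sum(count_syllables(w) for w in words)
    return (
        0.39 * len(words) / len(sentences)
        + 11.8 * syllables / len(words)
        - 15.59
    )


if __name__ == "__main__":
    sample = "The cat sat. The dog ran."
    print(round(flesch_kincaid_grade(sample), 2))
```

Longer sentences and words with more syllables both raise the score, which is why a response heavy in clinical terminology can land well above a grade 8 target even when its sentences are short.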

Journal: Sexual Medicine
Publication Date: Aug 13, 2024
License type: CC BY 4.0
