Reliability and readability analysis of ChatGPT-4 and Google Bard as a patient information source for the most commonly applied radionuclide treatments in cancer patients

Hüseyin Şan, Özkan Bayrakcı, Berkay Çağdaş, Mustafa Serdengeçti, Engin Alagöz

doi:10.1016/j.remnie.2024.500021


https://doi.org/10.1016/j.remnie.2024.500021

Abstract


Purpose
Searching for online health information is a popular way for patients to learn more about their diseases, and recently developed AI chatbots are probably the easiest tool for this purpose. This study analyzes the reliability and readability of AI chatbot responses about the most commonly applied radionuclide treatments in cancer patients: radioactive iodine (RAI), prostate-specific membrane antigen-targeted radionuclide therapy (PSMA-TRT), peptide receptor radionuclide therapy (PRRT), and transarterial radioembolization (TARE).

Methods
Basic patient questions (thirty about RAI, PRRT, and TARE treatments and twenty-nine about PSMA-TRT) were posed one by one to GPT-4 and Google Bard in January 2024. The reliability and readability of the responses were assessed using the DISCERN scale, Flesch Reading Ease (FRE), and Flesch-Kincaid Reading Grade Level (FKRGL).

Results
The mean (SD) FKRGL scores of the GPT-4 responses about RAI, PSMA-TRT, PRRT, and TARE were 14.57 (1.19), 14.65 (1.38), 14.25 (1.10), and 14.38 (1.2), respectively; those of the Google Bard responses were 11.49 (1.59), 12.42 (1.71), 11.35 (1.80), and 13.01 (1.97). In terms of readability, the FKRGL scores of both chatbots' responses for all four treatments were above the reading grade level of the general public. The mean (SD) DISCERN scores assigned by a nuclear medicine physician to the GPT-4 responses about RAI, PSMA-TRT, PRRT, and TARE were 47.86 (5.09), 48.48 (4.22), 46.76 (4.09), and 48.33 (5.15), respectively; those for the Bard responses were 51.50 (5.64), 53.44 (5.42), 53 (6.36), and 49.43 (5.32). Based on the mean DISCERN scores, the reliability of both chatbots' responses ranged from fair to good.

The inter-rater reliability correlation coefficients of the DISCERN scores assigned by GPT-4, Bard, and the nuclear medicine physician to the GPT-4 responses about RAI, PSMA-TRT, PRRT, and TARE were 0.512 (95% CI 0.296–0.704), 0.695 (95% CI 0.518–0.829), 0.687 (95% CI 0.511–0.823), and 0.649 (95% CI 0.462–0.798), respectively (p < 0.01). For the Bard responses, the corresponding coefficients were 0.753 (95% CI 0.602–0.863), 0.812 (95% CI 0.686–0.899), 0.804 (95% CI 0.677–0.894), and 0.671 (95% CI 0.489–0.812), respectively (p < 0.01). Inter-rater reliability for both chatbots' responses therefore ranged from moderate to good. In addition, both GPT-4 and Google Bard rarely advised consulting a nuclear medicine physician, and some Bard responses included references, whereas no GPT-4 response did.

Conclusion
Although the information provided by AI chatbots may be medically acceptable, it may not be easy for the general public to read, which can hinder understanding. Effective prompts designed through "prompt engineering" may make the responses more comprehensible. Because radionuclide treatments fall within nuclear medicine expertise, responses should name the nuclear medicine physician as the consultant so that patients and caregivers are guided toward accurate medical advice. Referencing matters for the confidence and satisfaction of patients and caregivers seeking information.
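The two readability formulas named in the Methods are standard and depend only on word, sentence, and syllable counts. The Python sketch below implements them together with two helpers that are assumptions rather than the authors' code: `discern_band`, which uses a commonly cited banding of the 16-item DISCERN total (16–26 very poor, 27–38 poor, 39–50 fair, 51–62 good, 63–80 excellent) that this abstract does not state, and `icc_2_1`, one common form of intraclass correlation (two-way random effects, absolute agreement, single rater); the abstract does not specify which coefficient was used. All function names are hypothetical, and syllable counting is left to the caller.

```python
"""Sketch of the readability and agreement metrics named in the abstract.

Assumptions (not taken from the paper): counts are supplied by the caller,
the DISCERN bands are a commonly cited convention, and ICC(2,1) is only one
possible choice of intraclass correlation.
"""


def flesch_reading_ease(words: int, sentences: int, syllables: int) -> float:
    """Flesch Reading Ease: higher scores mean easier text."""
    return 206.835 - 1.015 * (words / sentences) - 84.6 * (syllables / words)


def flesch_kincaid_grade(words: int, sentences: int, syllables: int) -> float:
    """Flesch-Kincaid Grade Level: approximate US school grade required."""
    return 0.39 * (words / sentences) + 11.8 * (syllables / words) - 15.59


# Hypothetical cut-offs for the 16-item DISCERN total (each item 1-5, total 16-80).
DISCERN_BANDS = [(63, "excellent"), (51, "good"), (39, "fair"), (27, "poor")]


def discern_band(total: float) -> str:
    """Map a DISCERN total (or a mean total) to a qualitative band."""
    if not 16 <= total <= 80:
        raise ValueError("DISCERN totals range from 16 to 80")
    for lower_bound, label in DISCERN_BANDS:
        if total >= lower_bound:
            return label
    return "very poor"


def icc_2_1(ratings: list[list[float]]) -> float:
    """ICC(2,1): two-way random effects, absolute agreement, single rater.

    ratings[i][j] is the score that rater j gave to response i.
    """
    n = len(ratings)
    if n < 2:
        raise ValueError("need at least two rated responses")
    k = len(ratings[0])
    if k < 2 or any(len(row) != k for row in ratings):
        raise ValueError("need a rectangular table with at least two raters")

    grand_mean = sum(sum(row) for row in ratings) / (n * k)
    row_means = [sum(row) / k for row in ratings]
    col_means = [sum(row[j] for row in ratings) / n for j in range(k)]

    # Partition the total sum of squares into responses, raters, and residual.
    ss_rows = k * sum((m - grand_mean) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand_mean) ** 2 for m in col_means)
    ss_total = sum((x - grand_mean) ** 2 for row in ratings for x in row)
    ss_error = ss_total - ss_rows - ss_cols

    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))

    return (ms_rows - ms_error) / (
        ms_rows + (k - 1) * ms_error + k * (ms_cols - ms_error) / n
    )
```

Under these assumed cut-offs, `discern_band(47.86)` returns `"fair"` and `discern_band(53.44)` returns `"good"`, in line with the fair-to-good range reported above. For the three raters in this study (GPT-4, Bard, and the physician), `icc_2_1` would take one row per chatbot response and one column per rater. Because ICC(2,1) measures absolute agreement, a rater who scores every response one point higher than another lowers the coefficient even though the two rank the responses identically.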

Journal: Revista Española de Medicina Nuclear e Imagen Molecular (English Edition)
Publication Date: May 29, 2024
Citations: 4

Similar Papers

Evaluation of the reliability and readability of chatbot responses as a patient information resource for the most common PET-CT scans
N Aydinbelge-Dizdar ... K Dizdar
Revista Española de Medicina Nuclear e Imagen Molecular (English Edition)
01 Sep 2024

Hidradenitis Suppurativa: A Cross-sectional Study of Content Quality on TikTok
Raquel Wescott ... Lingchen Wang
SKIN The Journal of Cutaneous Medicine | VOL. 7
20 May 2023

Online Health Information for Penile Prosthesis Implants Lacks Quality and Is Unreadable to the Average US Patient.
Benjamin Plambeck ... Brittany E Wordekemper
Cureus | VOL. 15
26 Jan 2023

Quality and Readability of Accessible Facial Feminization Literature: Where Can We Improve?
David P Alper ... Joshua Z Glahn
Annals of Plastic Surgery | VOL. 90
01 Jun 2023
