Abstract
Background
Amidst ongoing staff shortages in healthcare systems, patients and their families are increasingly turning to chatbots powered by Large Language Models (LLMs) for information about their medical conditions. These AI-driven chatbots, capable of generating human-like responses across a broad range of topics, have become a prevalent tool in the healthcare landscape. Given their proliferation, it is crucial to evaluate the quality and accuracy of the responses they provide.

Methods
We selected five freely accessible chatbots (Bard, Microsoft Copilot, PiAI, ChatGPT, and ChatSpot) for our study. These chatbots were posed questions spanning three medical fields: cardiology, cardio-oncology, and cardio-rheumatology. The responses generated by the chatbots were then compared against established guidelines from the European Society of Cardiology, the American Academy of Dermatology, and the American Society of Clinical Oncology. In addition to content, the readability of the responses was evaluated using four readability scales: the Flesch Reading Ease Scale, the Gunning Fog Scale Level, the Flesch-Kincaid Grade Level, and the Dale-Chall Score. To assess the accuracy of the responses and their compliance with the medical guidelines, two independent medical professionals rated them on a 3-point Likert scale (0 - incorrect, 1 - partially correct or incomplete, 2 - correct).

Results
We posed a total of 45 questions to each chatbot. Of the five chatbots, Microsoft Copilot, PiAI, and ChatGPT were able to respond to all of the questions. The length of the responses varied, with PiAI providing the shortest average response length (7.26 words) and Bard the longest (18.9 words). Flesch Reading Ease Scale scores ranged from 17.67 (ChatGPT) to 39.34 (Bard), indicating the relative complexity of the responses. The Flesch-Kincaid Grade Level, which reflects the academic grade level required to comprehend the text, ranged from 14.02 (PiAI) to 15.97 (ChatGPT). The Gunning Fog Scale Level varied from 15.77 (Bard) to 19.73 (ChatGPT). The Dale-Chall Score, which assesses the understandability of the text, ranged from 10.24 (Bard) to 11.87 (ChatGPT). These results highlight the variability in the readability and complexity of responses generated by different chatbots. The readability analysis is presented in Table 1.

Conclusion
This study indicates that chatbot responses vary in length, quality, and readability. Each chatbot answers questions in its own way, based on the data it has drawn from the internet. Our findings suggest that people seeking medical information from a chatbot should be cautious and verify the answers they receive.
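To illustrate the readability metrics named in the Methods above, the sketch below computes all four scores for a sample chatbot response using the open-source Python package textstat. This is an assumption for illustration only; the abstract does not state which tool or implementation the authors used, and the sample text is hypothetical.

```python
# Illustrative sketch (not necessarily the authors' pipeline): scoring one
# chatbot response with the four readability metrics used in the study,
# via the open-source `textstat` package (pip install textstat).
import textstat

# Hypothetical chatbot response used only for demonstration.
response = (
    "Anthracycline chemotherapy can be cardiotoxic; baseline echocardiography "
    "and periodic troponin monitoring are recommended in high-risk patients."
)

scores = {
    "Flesch Reading Ease": textstat.flesch_reading_ease(response),
    "Flesch-Kincaid Grade Level": textstat.flesch_kincaid_grade(response),
    "Gunning Fog Scale Level": textstat.gunning_fog(response),
    "Dale-Chall Score": textstat.dale_chall_readability_score(response),
}

for name, value in scores.items():
    print(f"{name}: {value:.2f}")
```

Higher Flesch Reading Ease values indicate easier text, whereas the other three scales rise with difficulty, which is why the ranges reported in the Results move in opposite directions across chatbots.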