Assessing the Responses of Large Language Models (ChatGPT-4, Claude 3, Gemini, and Microsoft Copilot) to Frequently Asked Questions in Retinopathy of Prematurity: A Study on Readability and Appropriateness.

Serhat Ermis, Ece Özal, Murat Karapapak, Ebrar Kumantaş, Sadık Altan Özal

doi:10.3928/01913913-20240911-05

Abstract

This study assessed the appropriateness and readability of responses provided by four large language models (LLMs) (ChatGPT-4, Claude 3, Gemini, and Microsoft Copilot) to parents' queries about retinopathy of prematurity (ROP). A total of 60 frequently asked questions were collated and categorized into six sections. Three experienced ROP specialists evaluated the responses generated by the LLMs for appropriateness and comprehensiveness. Readability was assessed using five metrics: the Flesch-Kincaid Grade Level (FKGL), Gunning Fog (GF) Index, Coleman-Liau (CL) Index, Simple Measure of Gobbledygook (SMOG) Index, and Flesch Reading Ease (FRE) score. ChatGPT-4 demonstrated the highest level of appropriateness (100%) and performed exceptionally well in the Likert analysis, scoring 5 points on 96% of questions. The CL Index and FRE scores identified Gemini as the most readable LLM, whereas the GF Index and SMOG Index rated Microsoft Copilot as the most readable. ChatGPT-4, by contrast, exhibited the most intricate text structure, with scores of 18.56 on the GF Index, 18.56 on the CL Index, 17.2 on the SMOG Index, and 9.45 on the FRE score. These scores suggest that its responses demand college-level comprehension. ChatGPT-4 outperformed the other LLMs in responding to questions related to ROP; however, its texts were more complex. In terms of readability, Gemini and Microsoft Copilot were more successful. [J Pediatr Ophthalmol Strabismus. 20XX;XX(X):XXX-XXX.]
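For readers unfamiliar with the five readability metrics named above, the following is a minimal sketch of their commonly published formulas. The abstract does not state which software the authors used, so this is not their method; the function names and the example counts are hypothetical, and the functions take pre-computed counts (words, sentences, syllables, and so on) rather than raw text, because syllable counting itself is tool-dependent.

```python
import math


def flesch_reading_ease(words, sentences, syllables):
    """FRE: higher scores mean easier text."""
    return 206.835 - 1.015 * (words / sentences) - 84.6 * (syllables / words)


def flesch_kincaid_grade(words, sentences, syllables):
    """FKGL: approximate US school grade level."""
    return 0.39 * (words / sentences) + 11.8 * (syllables / words) - 15.59


def gunning_fog(words, sentences, complex_words):
    """GF Index: complex_words are words of three or more syllables."""
    return 0.4 * ((words / sentences) + 100 * (complex_words / words))


def coleman_liau(letters, words, sentences):
    """CL Index: based on letters and sentences per 100 words."""
    letters_per_100 = letters / words * 100
    sentences_per_100 = sentences / words * 100
    return 0.0588 * letters_per_100 - 0.296 * sentences_per_100 - 15.8


def smog(polysyllables, sentences):
    """SMOG Index: polysyllables are words of three or more syllables."""
    return 1.0430 * math.sqrt(polysyllables * 30 / sentences) + 3.1291


if __name__ == "__main__":
    # Hypothetical counts for a single response; not data from the study.
    words, sentences, syllables = 100, 5, 150
    letters, complex_words = 450, 10
    print("FRE :", round(flesch_reading_ease(words, sentences, syllables), 2))
    print("FKGL:", round(flesch_kincaid_grade(words, sentences, syllables), 2))
    print("GF  :", round(gunning_fog(words, sentences, complex_words), 2))
    print("CL  :", round(coleman_liau(letters, words, sentences), 2))
    print("SMOG:", round(smog(complex_words, sentences), 2))
```

On the FRE scale a lower score means harder text, whereas the other four indices approximate a school grade level, so a higher value means harder text. This is why a low FRE score and high GF, CL, and SMOG scores point in the same direction.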
