Assessment of Large Language Models in Cataract Care Information Provision: A Quantitative Comparison.

Zichang Su, Kai Jin, Hongkang Wu, Ziyao Luo, Andrzej Grzybowski, Juan Ye

doi:10.1007/s40123-024-01066-y

Abstract

Cataracts are a significant cause of blindness. Although people frequently turn to the Internet for medical advice, distinguishing reliable information can be challenging. Large language models (LLMs) have attracted attention for generating accurate, human-like responses that may be used for medical consultation. However, a comprehensive assessment of LLMs' accuracy within specific medical domains is still lacking. We compiled 46 commonly asked questions related to cataract care, categorized into six domains. Each question was presented to the LLMs, and three consultant-level ophthalmologists independently assessed the accuracy of each response on a three-point scale (poor, borderline, good) and its comprehensiveness on a five-point scale. A majority consensus approach established the final rating for each response. Responses rated 'Poor' were prompted for self-correction and reassessed. For accuracy, ChatGPT-4o and Google Bard both achieved average sum scores of 8.7 (out of 9), followed by ChatGPT-3.5, Bing Chat, Llama 2, and Wenxin Yiyan. In consensus-based ratings, ChatGPT-4o outperformed Google Bard in 'Good' ratings. For comprehensiveness, ChatGPT-4o had the highest average sum score of 13.22 (out of 15), followed by Google Bard, ChatGPT-3.5, Llama 2, Bing Chat, and Wenxin Yiyan. Detailed performance data reveal nuanced differences in model capabilities. In the 'Prevention' domain, all models except Wenxin Yiyan were rated 'Good'. All models improved after self-correction: Google Bard and Bing Chat each improved 1 of 1 'Poor' responses to a better rating, Llama 2 improved 3 of 4, and Wenxin Yiyan improved 4 of 5. Our findings emphasize the potential of LLMs, particularly ChatGPT-4o, to deliver accurate and comprehensive responses to cataract-related queries, especially in prevention, indicating potential for medical consultation. Continued evaluation and refinement remain essential to improve LLMs' accuracy.
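
The scoring arithmetic described in the abstract can be illustrated with a minimal sketch. It assumes the three accuracy labels map to 1, 2, and 3 points, which is consistent with a maximum sum score of 9 across three graders but is not stated in the abstract. The function names are hypothetical, and the abstract does not say how a three-way disagreement was resolved, so the sketch returns None in that case.

```python
from collections import Counter

# Assumed point mapping: three graders x 3 points gives the stated maximum of 9.
ACCURACY_POINTS = {"poor": 1, "borderline": 2, "good": 3}


def sum_score(ratings):
    """Add up the points given by all graders for one response."""
    return sum(ACCURACY_POINTS[r] for r in ratings)


def majority_rating(ratings):
    """Return the label chosen by more than half of the graders.

    Returns None when no label has a majority; the abstract does not
    specify a tie-breaking rule, so none is invented here.
    """
    label, count = Counter(ratings).most_common(1)[0]
    return label if count > len(ratings) / 2 else None


# Illustrative ratings only, not data from the study.
example = ["good", "good", "borderline"]
print(sum_score(example))        # 3 + 3 + 2 = 8
print(majority_rating(example))  # good
```

Averaging the per-response sum scores over all 46 questions would give figures of the kind reported above, such as 8.7 out of 9.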

Journal: Ophthalmology and Therapy
Publication Date: Nov 8, 2024
License type: CC BY-NC 4.0

