Abstract

Background: The aim of this study was to evaluate whether ChatGPT can be recommended as a resource for informing patients planning rotator cuff repair, and to assess differences between ChatGPT 3.5 and 4.0 in information content and readability.

Methods: In August 2023, three surgeons experienced in rotator cuff surgery posed 13 questions commonly asked by patients with rotator cuff disease to ChatGPT-3.5 and ChatGPT-4, using computers with different IP addresses. The answers from both versions were converted to text and examined for quality and readability.

Results: The mean JAMA score for both versions was 0, and the mean DISCERN score was 61.6. A statistically significant, strong correlation was found between the ChatGPT 3.5 and 4.0 DISCERN scores, and there was excellent agreement in DISCERN scores among the three evaluators for both versions. ChatGPT 3.5 was found to be less readable than 4.0.

Conclusions: The information provided by the ChatGPT conversational system was evaluated as high quality, but it had significant shortcomings in reliability owing to the absence of citations. Although ChatGPT 4.0 achieved higher readability scores, both versions were considered difficult to read.
