Abstract

Background: Conversational artificial intelligence (AI) systems such as ChatGPT have emerged as valuable assets for providing accessible information across various domains, including healthcare. The use of ChatGPT may contribute to improved patient education and general healthcare knowledge. However, there is a paucity of data on the reliability of responses generated by ChatGPT in the context of specific medical diagnoses.

Methods: We identified 12 questions frequently asked by patients about glenohumeral osteoarthritis. These questions were formulated in both English and German, using either the common or the medical term for the condition, thus creating four groups for evaluation. The questions were then presented to ChatGPT 3.5, and the generated responses were rated for accuracy by four independent orthopedic and trauma surgery consultants on a Likert scale (0 = fully inaccurate to 4 = fully accurate).

Results: Apart from two questions in two groups, all questions across all versions were answered with good accuracy by ChatGPT 3.5. The highest mean Likert score was 3.9, achieved by the group in which questions were posed in English using the medical term “glenohumeral osteoarthritis.” The lowest score of 3.2 was recorded for the group in which questions were posed in English using the common term “shoulder arthrosis.” On average, questions in English received a Likert score of 3.5, slightly higher than those in German, which received 3.4.

Conclusion: ChatGPT 3.5 can already provide accurate responses to patients’ frequently asked questions about glenohumeral osteoarthritis and can therefore be a valuable tool for patient communication and education in orthopedics. Further studies, however, are required to fully understand the mechanisms and impact of ChatGPT in this field.