Abstract

The purposes of this study were to describe the degree of agreement of geriatricians with the answers given by an AI tool (ChatGPT) to questions related to different areas of geriatrics, to study the differences between specialists and residents in geriatrics in terms of the degree of agreement with ChatGPT, and to analyse the mean scores obtained by areas of knowledge/domains. An observational study was conducted involving 126 doctors from 41 geriatric medicine departments in Spain. Ten questions about geriatric medicine were posed to ChatGPT, and doctors evaluated the AI's answers using a Likert scale. Sociodemographic variables were included. Questions were categorized into five knowledge domains, and means and standard deviations were calculated for each. A total of 130 doctors answered the questionnaire; 126 doctors (69.8% women, mean age 41.4 [9.8]) were included in the final analysis. The mean score obtained by ChatGPT was 3.1/5 [0.67]. Specialists rated ChatGPT lower than residents (3.0/5 vs. 3.3/5 points, respectively, P < 0.05). By domain, ChatGPT scored best on general/theoretical questions (M: 3.96; SD: 0.71), lower on complex decisions/end-of-life situations (M: 2.50; SD: 0.76), and lowest on answers related to diagnosis/performance of complementary tests (M: 2.48; SD: 0.77). Scores varied considerably depending on the area of knowledge. Questions related to theoretical aspects of the challenges and future of geriatrics obtained better scores, whereas for complex decision-making, appropriateness of therapeutic effort, and decisions about diagnostic tests, professionals indicated poorer performance. AI is likely to be incorporated into some areas of medicine, but it still presents important limitations, mainly in complex medical decision-making.
