Abstract
Artificial Intelligence (AI) is transforming healthcare, with Large Language Models (LLMs) such as ChatGPT offering novel capabilities. This study evaluates ChatGPT's performance on the UK Royal College of Obstetricians and Gynaecologists MRCOG Part One and Part Two examinations, international benchmarks for assessing knowledge and clinical reasoning in Obstetrics and Gynaecology. We analysed ChatGPT's domain-specific accuracy, the impact of linguistic complexity, and the accuracy of its self-assessed confidence. A dataset of 1824 MRCOG questions was curated to minimise ChatGPT's prior exposure to them. ChatGPT's responses were compared with the known correct answers, and linguistic complexity was assessed using token counts and Type-Token Ratios. Confidence scores were assigned by ChatGPT and analysed for self-assessment accuracy. ChatGPT achieved 72.2% accuracy on Part One and 50.4% on Part Two, performing better on Single Best Answer (SBA) questions than on Extended Matching Questions (EMQs). The findings highlight both the potential and the significant limitations of ChatGPT in clinical decision-making in women's health.
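The linguistic-complexity measures mentioned above can be sketched as follows. This is an illustrative example, not the authors' code: the study does not specify its tokeniser, so a simple lowercase word split is assumed here.

```python
# Sketch of the two lexical measures named in the abstract:
# token count and Type-Token Ratio (TTR = unique word forms / total words).
# Assumes a simple regex word tokeniser; the study's exact method is unspecified.
import re

def lexical_stats(text: str) -> tuple[int, float]:
    """Return (token_count, type_token_ratio) for a question's text."""
    tokens = re.findall(r"[a-z']+", text.lower())  # lowercase word tokens
    if not tokens:
        return 0, 0.0
    types = set(tokens)  # distinct word forms
    return len(tokens), len(types) / len(tokens)

# Hypothetical MRCOG-style question stem, for illustration only.
question = ("A 32-year-old woman presents at 32 weeks' gestation with "
            "painless vaginal bleeding. What is the most likely diagnosis?")
n_tokens, ttr = lexical_stats(question)
```

A lower TTR indicates more repeated vocabulary; longer, more repetitive question stems can thus be distinguished from short, lexically dense ones when relating complexity to model accuracy.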