ChatGPT, a sophisticated large language model developed by OpenAI, has the potential to offer professional, patient-friendly support. We aimed to assess the accuracy and reproducibility of ChatGPT-4 in answering questions related to knowledge, management, and support in the field of reproductive medicine. ChatGPT-4 was used to respond to queries sourced from a domestic attending-physician examination database and to address both local and international treatment guidelines in reproductive medicine. Each response generated by ChatGPT-4 was independently evaluated by three experts in reproductive medicine using four qualitative measures: relevance, accuracy, completeness, and understandability. We found that ChatGPT-4 demonstrated extensive knowledge of reproductive medicine, with median scores for the relevance, accuracy, completeness, and understandability of its answers to objective questions of 4, 3.5, 3, and 3, respectively. However, its composite accuracy rate on multiple-choice questions was only 63.38%. Significant discrepancies were observed among the three experts' scores across all four measures: Expert 1 generally awarded higher and more consistent scores, whereas Expert 3 awarded lower scores for accuracy. ChatGPT-4's responses to domestic and international guidelines showed varying levels of understanding and a lack of knowledge of regional guideline variations, although it offered practical, multifaceted advice on next steps and on adapting to new guidelines. We analyzed the strengths and limitations of ChatGPT-4's responses concerning the management of reproductive medicine and related support. ChatGPT-4 might serve as a supplementary informational tool for patients and physicians, with the potential to improve outcomes in reproductive medicine.