Abstract

INTRODUCTION: Pregnancy loss is a sensitive topic, and clear, reliable information is crucial. This study compares the readability and accuracy of ChatGPT's responses to ACOG's frequently asked questions (FAQ) sheets on the subject.

METHODS: Within the five FAQ sheets under “Pregnancy Loss,” 66 questions were assessed using ChatGPT-3.5. Readability scores, ie, Flesch–Kincaid Reading Ease, Flesch–Kincaid Grade Level, Gunning Fog Score, Smog Index, Coleman–Liau Index, and Automated Readability Index, were computed for each response. The quality of responses was also graded by two maternal–fetal medicine specialists using a 1–4 scale, where 1 represents a comprehensive response and 4 indicates an incorrect response. Statistical analysis utilized a two-tailed t-test. A weighted Cohen’s kappa coefficient evaluated interrater reliability.

RESULTS: ACOG attained better readability scores than ChatGPT across all six metrics, with statistical significance (P<.001). In the quality grading, ACOG also outperformed ChatGPT, with a summative mean of 1.53 (SD 0.60) versus 1.85 (SD 0.90), with statistical significance (P<.001). Grading for ACOG had a Cohen's kappa coefficient of 0.493, implying moderate agreement between graders. Grading for ChatGPT had a Cohen's kappa coefficient of 0.744, implying substantial agreement between graders.

CONCLUSION: The responses by ACOG to frequently asked questions on pregnancy loss were both clearer and more comprehensive than those of ChatGPT, with consistent agreement between graders. Although ChatGPT can be a convenient source of answers to medical queries, further improvements ought to be made to these large language models to provide clear and accurate information regarding pregnancy loss to patients.
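The analysis described in the methods can be illustrated with a minimal sketch. The abstract does not name the software used, so the libraries below (textstat for the six readability metrics, SciPy for the t-test, scikit-learn for the weighted Cohen's kappa) and all sample values are assumptions for illustration only, not the authors' actual pipeline.

```python
# Minimal sketch of the analysis described in the abstract. Library choices
# (textstat, SciPy, scikit-learn) and all data below are illustrative
# assumptions; the study does not specify its software or raw scores.
import textstat
from scipy import stats
from sklearn.metrics import cohen_kappa_score

def readability_profile(text):
    """The six readability metrics reported in the study, computed via textstat."""
    return {
        "flesch_reading_ease": textstat.flesch_reading_ease(text),
        "flesch_kincaid_grade": textstat.flesch_kincaid_grade(text),
        "gunning_fog": textstat.gunning_fog(text),
        "smog_index": textstat.smog_index(text),
        "coleman_liau_index": textstat.coleman_liau_index(text),
        "automated_readability_index": textstat.automated_readability_index(text),
    }

# Example: readability of one hypothetical response.
print(readability_profile(
    "Pregnancy loss, also called miscarriage, is the loss of a pregnancy before 20 weeks."
))

# Placeholder per-question Flesch-Kincaid Grade Level scores for each source.
acog_fkgl = [8.1, 7.4, 9.0, 8.6, 7.9]
gpt_fkgl = [11.2, 10.8, 12.1, 11.5, 10.9]

# Two-tailed t-test comparing the two sources (the abstract does not state
# whether a paired or independent test was used; a paired test is shown here).
t_stat, p_value = stats.ttest_rel(acog_fkgl, gpt_fkgl)

# Interrater reliability: weighted Cohen's kappa on the 1-4 quality grades
# assigned by the two specialists (placeholder grades).
grader1 = [1, 2, 1, 3, 2]
grader2 = [1, 2, 2, 3, 1]
kappa = cohen_kappa_score(grader1, grader2, weights="linear")
```

Under this sketch, lower grade-level scores and higher Flesch Reading Ease indicate easier text, and a kappa between 0.41 and 0.60 is conventionally read as moderate agreement, 0.61 to 0.80 as substantial agreement.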
