Background: As artificial intelligence (AI)-supported applications become integral to web-based information seeking, assessing their impact on healthy nutrition and weight management during the antenatal period is crucial.

Objective: This study evaluated both the quality and the semantic similarity of AI-generated responses to the most frequently asked questions about healthy nutrition and weight management during the antenatal period, judged against existing clinical knowledge.

Methods: A cross-sectional assessment design was used to evaluate responses from three AI models (GPT-4, MedicalGPT, and Med-PaLM). The most frequently asked questions about nutrition during pregnancy, obtained from the American College of Obstetricians and Gynecologists (ACOG), were submitted to each model in a single new session on October 21, 2023, with no prior conversation, and the models were then instructed to generate responses to these questions. Response quality was rated using the Grading of Recommendations Assessment, Development, and Evaluation (GRADE) scale. In addition, to assess the semantic similarity between the ACOG answers to 31 pregnancy nutrition-related frequently asked questions and the corresponding AI responses, cosine similarity was computed using both WORD2VEC and BioLORD-2023.

Results: Med-PaLM achieved the highest response quality (mean score = 3.93), with better clinical accuracy than both GPT-4 (p = 0.016) and MedicalGPT (p = 0.001). GPT-4 produced higher-quality responses than MedicalGPT (p = 0.027). Semantic similarity between ACOG and Med-PaLM was higher with WORD2VEC (0.92) than with BioLORD-2023 (0.81), a difference of +0.11. For ACOG–MedicalGPT and ACOG–GPT-4, similarity scores were nearly identical under the two embedding methods, differing by only −0.01. Overall, WORD2VEC yielded a slightly higher average similarity (0.82) than BioLORD-2023 (0.79), a difference of +0.03.
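The cosine-similarity comparison described in the Methods can be illustrated with a minimal sketch. The abstract does not state how whole answers were turned into vectors, so this example assumes one common approach for WORD2VEC: averaging the word vectors of each answer (mean pooling) and then taking the cosine of the angle between the two averaged vectors, cos(a, b) = (a · b) / (‖a‖ ‖b‖). A sentence-embedding model such as BioLORD-2023 would instead supply one vector per answer directly, and the same cosine function would apply. The helper names (`cosine_similarity`, `mean_pooled_embedding`), the toy vocabulary, and the example sentences are hypothetical and are not the authors' pipeline.

```python
import math
from typing import Dict, List, Sequence

Vector = List[float]


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two equal-length vectors: (a . b) / (|a| |b|)."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


def mean_pooled_embedding(text: str, word_vectors: Dict[str, Vector]) -> Vector:
    """Assumed sentence embedding: average the vectors of in-vocabulary tokens."""
    tokens = [t for t in text.lower().split() if t in word_vectors]
    if not tokens:
        raise ValueError("no in-vocabulary tokens in text")
    dim = len(next(iter(word_vectors.values())))
    summed = [0.0] * dim
    for token in tokens:
        for i, value in enumerate(word_vectors[token]):
            summed[i] += value
    return [s / len(tokens) for s in summed]


# Hypothetical 3-dimensional word vectors, for illustration only
# (real Word2Vec vectors typically have hundreds of dimensions).
TOY_VECTORS: Dict[str, Vector] = {
    "eat": [0.9, 0.1, 0.0],
    "take": [0.8, 0.2, 0.1],
    "folic": [0.1, 0.8, 0.2],
    "acid": [0.2, 0.7, 0.1],
    "daily": [0.3, 0.2, 0.9],
}

reference_answer = "take folic acid daily"  # stands in for an ACOG answer
model_answer = "eat folic acid daily"        # stands in for an AI model's answer

score = cosine_similarity(
    mean_pooled_embedding(reference_answer, TOY_VECTORS),
    mean_pooled_embedding(model_answer, TOY_VECTORS),
)
print(round(score, 3))
```

Scores close to 1 indicate vectors pointing in nearly the same direction. Because the score depends on the embedding space, the same pair of answers can receive different values under WORD2VEC and BioLORD-2023, which is the kind of difference the Results report.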
Conclusions: Despite Med-PaLM's superior performance, results varied across AI models, so further evidence-based research and improvement are needed in integrating AI into healthcare.