Abstract

Study question
Are algorithms more effective than clinicians and embryologists at predicting pregnancy outcomes in Assisted Reproductive Technology (ART) after the transfer of a single blastocyst?

Summary answer
A pregnancy prediction algorithm trained on local data, compared against predictions made by two ART professionals, achieved an accuracy comparable to a coin toss.

What is known already
The ability of Artificial Intelligence (AI) to synthesize multifaceted embryological and clinical data and to process large datasets holds promise for raising IVF success rates through data-driven, individualized treatment. Among the early promises of this technology are refining embryo selection through deep learning and computer vision, individualizing treatment protocols, predicting oocyte retrieval outcomes after Controlled Ovarian Hyperstimulation (COH), and, ultimately, predicting Live Birth (LB) after embryo transfer.

Study design, size, duration
A retrospective analysis was conducted on 475 stimulation cycles that resulted in a single day-5 blastocyst transfer (fresh or frozen; conventional IVF or ICSI) between January 2015 and January 2022 at the Poissy/Saint-Germain-En-Laye hospital, France. The study covered 32 clinical and biological parameters, ranging from general patient characteristics to COH details and embryo grading. Only embryos obtained from autologous oocytes and sperm were included; embryos from donor gametes were excluded.

Participants/materials, setting, methods
Five machine learning algorithms (XGBoost, LightGBM, Random Forest (RF), Support Vector Machine (SVM), and Multi-layer Perceptron (MLP)) were tested in preliminary experiments before XGBoost was selected for this study. The data were split into a training set (n = 331) and a testing set (n = 144). Binary pregnancy outcome predictions were compared between the model, a clinician (gynecologist), and an embryologist.
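If, as described, the professionals gave binary predictions (pregnancy or no pregnancy), their AUC reduces to the mean of sensitivity and specificity, so 0.5 corresponds to chance and a value below 0.5 to slightly worse than chance. The following is a minimal sketch in plain Python, assuming outcomes coded 1 = pregnancy and 0 = no pregnancy; the function names and toy data are hypothetical and are not the study's code or data.

```python
"""Rank-based AUC and balanced accuracy for binary pregnancy predictions.

Hypothetical sketch: outcomes are coded 1 = pregnancy, 0 = no pregnancy.
Toy data only; this is not the study's code or data.
"""
from itertools import product


def auc(labels, scores):
    """Mann-Whitney AUC: probability that a randomly chosen positive case
    receives a higher score than a randomly chosen negative case (ties = 1/2)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    total = 0.0
    for p, n in product(pos, neg):
        if p > n:
            total += 1.0
        elif p == n:
            total += 0.5
    return total / (len(pos) * len(neg))


def balanced_accuracy(labels, preds):
    """Mean of sensitivity and specificity for 0/1 predictions."""
    tp = sum(1 for y, p in zip(labels, preds) if y == 1 and p == 1)
    fn = sum(1 for y, p in zip(labels, preds) if y == 1 and p == 0)
    tn = sum(1 for y, p in zip(labels, preds) if y == 0 and p == 0)
    fp = sum(1 for y, p in zip(labels, preds) if y == 0 and p == 1)
    return (tp / (tp + fn) + tn / (tn + fp)) / 2


if __name__ == "__main__":
    # Toy rater: predicts pregnancy for 1 of 4 pregnancies
    # and for 2 of 4 non-pregnancies.
    outcomes = [1, 1, 1, 1, 0, 0, 0, 0]
    rater = [1, 0, 0, 0, 1, 1, 0, 0]
    print(auc(outcomes, rater))                # 0.375
    print(balanced_accuracy(outcomes, rater))  # 0.375
```

For any binary rater the two functions agree, so an AUC below 0.5 means the rater's combined sensitivity and specificity fell short of random guessing. A model that outputs a continuous score, by contrast, has an AUC that reflects its ranking of cases across all thresholds.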
Feature importance was then extracted from the model, and both professionals were asked to name the five parameters they considered most relevant to their pregnancy predictions.

Main results and the role of chance
The study analyzed 475 stimulation cycles, including 362 frozen and 114 fresh transfers. Participants' mean age was 33 years and their mean Body Mass Index (BMI) was 24.9 kg/m². The mean Antral Follicle Count (AFC) was 16.8, and the mean starting gonadotropin dose was 225 IU, yielding 8.3 metaphase II (MII) oocytes on average. Thirty-six patients were active smokers. In preliminary testing, XGBoost achieved the highest area under the ROC curve for predicting pregnancy outcome (AUC = 0.57), ahead of LightGBM (AUC = 0.56), Random Forest (AUC = 0.56), SVM (AUC = 0.49), and MLP (AUC = 0.47). After hyperparameter optimization by grid search, the XGBoost model's AUC rose to 0.61, outperforming the embryologist (AUC = 0.58) and the gynecologist (AUC = 0.44). The model's top predictive features were endometrial thickness at or before ovulation trigger, AMH level at admission, AFC, BMI, and age. The gynecologist prioritized age, BMI, embryo grading (trophectoderm (TE) and inner cell mass (ICM)), and the number of mature oocytes retrieved. The embryologist valued embryo grading (TE and ICM) foremost, followed by age, smoking status, BMI, and COH duration.

Limitations, reasons for caution
The dataset is limited because clinical practices evolved over the study period and curated data were difficult to maintain over the years, so the results should be interpreted with caution. The 32 parameters selected from the 762 available reflect the incomplete datasets typical of such settings; this calls for careful interpretation and possibly broader data integration to improve representativeness and reliability.

Wider implications of the findings
The findings support developing ART-specific AI that integrates multimodal data and novel parameters to improve live birth prediction.
This approach underscores the complex synergy between algorithmic precision and clinical expertise in ART and highlights both the stochastic nature of pregnancy prediction and the potential of tailored, data-driven strategies to improve outcomes.

Trial registration number
Not applicable
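The feature rankings reported for the model and the two professionals can be compared mechanically. The following minimal sketch in plain Python uses shortened labels taken from the abstract; collapsing "embryo grading (TE and ICM)" into a single label is an assumption, and it does not change which features are shared. It shows that the model's ranking overlaps with each professional's only on age and BMI, whereas the two professionals also agree on embryo grading.

```python
"""Compare the model's top features with those named by each professional.

Labels are shortened from the abstract. "Embryo grading (TE and ICM)" is
collapsed into a single hypothetical label "embryo grading" (assumption).
"""

MODEL = ["endometrial thickness", "AMH", "AFC", "BMI", "age"]
GYNECOLOGIST = ["age", "BMI", "embryo grading", "mature oocytes"]
EMBRYOLOGIST = ["embryo grading", "age", "smoking status", "BMI", "COH duration"]


def shared(a, b):
    """Features present in both lists, kept in the order they appear in `a`."""
    b_set = set(b)
    return [feature for feature in a if feature in b_set]


if __name__ == "__main__":
    print(shared(MODEL, GYNECOLOGIST))         # ['BMI', 'age']
    print(shared(MODEL, EMBRYOLOGIST))         # ['BMI', 'age']
    print(shared(GYNECOLOGIST, EMBRYOLOGIST))  # ['age', 'BMI', 'embryo grading']
```

Because the comparison uses sets of labels rather than ranks, it ignores the order in which each rater listed the features; a rank-correlation measure would be needed to account for order.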