Abstract

Study question
How do three artificial intelligence-based embryo selection models, which have evolved over the years, perform?

Summary answer
The three automatic scores were positively associated with clinical outcomes, and iDAScore v2 showed the best performance in conventional treatments with patient oocytes.

What is known already
Artificial intelligence (AI) models have been introduced in in vitro fertilization laboratories in recent years as an adjunct to clinical decision-making. The earliest function of AI was to guide embryologists in annotating embryonic events. This was followed by the development of embryo selection models based on machine learning that require annotations (e.g., KIDScore). The latest models in this field are based on deep learning and analyze raw time-lapse video (e.g., iDAScore). In this study, the evolution of three AI-based models has been analyzed on the same large set of embryos.

Study design, size, duration
This single-center study includes 6,737 patients who underwent in vitro fertilization treatments over 6 consecutive years. A total of 7,722 cycles were analyzed, resulting in 70,456 embryos cultured in EmbryoScope® time-lapse systems. Embryos were routinely evaluated and selected according to conventional morphology (ASEBIR criteria) by senior embryologists. Retrospectively, the embryos were scored from 1 to 9.9 by three AI models: KIDScore D5 v3 (n = 32,784), iDAScore v1 (n = 68,440) and iDAScore v2 (n = 68,471).

Participants/materials, setting, methods
Automatic embryo scores were compared with conventional morphology, ploidy, and clinical outcomes for single blastocyst transfers.
Then, we performed multivariate logistic regression analysis (confounding factors: oocyte origin, donated or autologous; type of embryo transfer, fresh or frozen; oocyte age; patient body mass index; culture strategy, individual or group; day of embryo transfer, fifth or sixth day of embryo development) in different patient populations (PGT-A, oocyte donation program, and conventional treatments with patient oocytes). Finally, the performance (AUC) was calculated for comparison.

Main results and the role of chance
The mean of the three scores increased with better morphological grade*. Regarding ploidy (euploid vs. aneuploid): 5.62 ± 1.78 vs. 5.00 ± 1.72 for KIDScore* (n = 6,580); 7.59 ± 1.61 vs. 6.92 ± 1.75 for iDAScore v1* (n = 7,089); and 6.31 ± 2.53 vs. 5.07 ± 2.55 for iDAScore v2* (n = 7,082). Regarding implantation (implanted vs. non-implanted): 6.24 ± 2.01 vs. 5.42 ± 2 for KIDScore* (n = 9,681); 8.36 ± 1.37 vs. 7.76 ± 1.70 for iDAScore v1* (n = 10,079); and 6.83 ± 2.18 vs. 5.75 ± 2.54 for iDAScore v2* (n = 10,068). Regarding live birth (positive vs. negative): 6.29 ± 2.01 vs. 5.51 ± 2.01 for KIDScore* (n = 9,668); 8.39 ± 1.34 vs. 7.83 ± 1.68 for iDAScore v1* (n = 10,065); and 6.88 ± 2.15 vs. 5.87 ± 2.52 for iDAScore v2* (n = 10,054). In general, the multivariate analysis showed statistically significant odds ratios for the three models in predicting implantation and live birth (all patient subpopulations)*. Regarding the oocyte donation program: the AUCs for predicting implantation were 0.636 [0.621–0.650] for KIDScore, 0.638 [0.624–0.652] for iDAScore v1, and 0.636 [0.622–0.650] for iDAScore v2; and the AUCs for live birth prediction were 0.630 [0.615–0.644] for KIDScore, 0.629 [0.615–0.643] for iDAScore v1, and 0.635 [0.621–0.649] for iDAScore v2.
Regarding treatments with patient oocytes: the AUCs for predicting implantation were 0.667 [0.644–0.689] for KIDScore, 0.674 [0.652–0.696] for iDAScore v1, and 0.686 [0.664–0.708] for iDAScore v2; and the AUCs for live birth prediction were 0.664 [0.642–0.687] for KIDScore, 0.669 [0.647–0.692] for iDAScore v1, and 0.686 [0.664–0.708] for iDAScore v2.
*p < 0.001

Limitations, reasons for caution
A major limitation of our study is its retrospective nature. Although ours is the largest external validation ever performed with an unselected ICSI population, its single-center design should be considered before the models are applied universally. Specific culture conditions should also be taken into account when considering generalized application.

Wider implications of the findings
Our results showed a positive association between automatic scores and the success of IVF. Although the three embryo selection models had similar AUCs, the use of the most advanced automatic algorithms should improve workflow, standardize the process between laboratories, and allow embryologists to spend their time on other tasks.

Trial registration number
PI21/00283
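
For readers unfamiliar with the AUC figures reported in the results, the following is a minimal sketch of how an AUC can be computed from an embryo score and a binary outcome such as implantation. The abstract does not state how the AUCs or their intervals were calculated, so this is an illustration and not the authors' method. The function name `auc` and the toy values are hypothetical and are not taken from the study data.

```python
from typing import Sequence


def auc(scores: Sequence[float], outcomes: Sequence[int]) -> float:
    """Area under the ROC curve via the pairwise (Mann-Whitney) formulation.

    The result is the probability that a randomly chosen positive case
    (e.g. an implanted embryo) has a higher score than a randomly chosen
    negative case, with ties counted as one half.
    """
    positives = [s for s, y in zip(scores, outcomes) if y == 1]
    negatives = [s for s, y in zip(scores, outcomes) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative outcome")

    wins = 0.0
    # Quadratic in the number of embryos: fine for a sketch, slow at scale.
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Hypothetical toy data, NOT from the study: scores on the 1 to 9.9 scale
# and a binary outcome (1 = implanted, 0 = not implanted).
toy_scores = [8.4, 6.1, 7.9, 5.0, 9.2, 4.3, 6.1, 7.0]
toy_outcomes = [1, 0, 1, 0, 1, 0, 1, 0]

toy_auc = auc(toy_scores, toy_outcomes)
print(f"toy AUC = {toy_auc:.3f}")
```

An AUC of 0.5 corresponds to a score that ranks outcomes no better than chance, and 1.0 to perfect ranking. The values of roughly 0.63 to 0.69 reported above sit between these two reference points.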