Abstract
Study question
Is there a significant difference among iDAScore v1.0, KIDScoreD5 and the Gardner score in predicting pregnancy outcomes after elective blastocyst transfer?
Summary answer
Contrary to previous studies, iDAScore v1.0 does not show a significant advantage over the traditional scores in the area under the Receiver Operating Characteristic (ROC) curve for predicting pregnancy outcomes.
What is known already
iDAScore v1.0, a deep neural network integrated into the EmbryoScope system, and KIDScoreD5 have shown potential for machine learning (ML)-driven embryo assessment and may ease the workload of embryologists. While recent studies have reported superior performance of these tools in predicting pregnancy outcomes, our study offers a contrasting perspective.
Study design, size, duration
This retrospective analysis included 1762 embryos from 1112 cycles, transferred in 1424 fresh and 338 frozen single embryo transfers performed between November 2018 and September 2023 at three IVF clinics in New Zealand. No PGT cases were included. All transferred embryos that met the defined study criteria were included in the analysis. Our clinics use an extended Gardner grading scale (A, B, C and X); embryos graded X, the poorest grade, are discarded.
Participants/materials, setting, methods
KIDScore and iDAScore values were computed retrospectively, and their pregnancy prediction accuracy was compared with that of conventional grading. Gardner scores were converted to a ranked ordinal value: ‘AA’ = 5; ‘AB’ and ‘BA’ = 4; ‘BB’ = 3; ‘AC’ and ‘BC’ = 2; ‘CC’ = 1. Analyses were stratified by age group: <35, 35-37, 38-40, 41-42 and >42 years. Within each age group, overall predictive capacity was compared using the area under the ROC curve (AUC) and paired DeLong tests. Reported P-values are nominal (not adjusted for multiple comparisons).
Main results and the role of chance
No statistically significant differences in AUC were found between any of the three scoring methods, even before adjustment for multiple comparisons.
Notably, in patients aged 35-37 years (n = 495), the AUCs were 0.63 for iDAScore, 0.62 for KIDScore and 0.61 for the Gardner criteria, showing comparable predictive ability across all methods. Similar patterns were observed in all other age groups, with no significant differences between the prediction methods (P > 0.05).
Limitations, reasons for caution
Given its retrospective design, this study has inherent limitations; a prospective, controlled study would provide more definitive insights.
Wider implications of the findings
ML-based models are innovative, but they do not significantly exceed the predictive ability of traditional embryo assessment across a range of patient ages. Such models are not yet a panacea; they are data tools that increase reproducibility but may have over-promised and under-delivered on clinical impact.
Trial registration number
Not applicable
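The ordinal Gardner conversion and the paired DeLong comparison described in the methods can be sketched in Python as follows. This is a minimal, hypothetical illustration, not the authors' analysis code: the names `GARDNER_RANK` and `paired_delong` are invented here, pregnancy is assumed to be coded 1 (pregnant) and 0 (not pregnant), and the variance uses the standard structural-component estimator of the DeLong test.

```python
import math
from typing import List, Sequence, Tuple

# Hypothetical lookup following the ranking stated in the abstract.
# Grades not listed there (e.g. 'CA', 'CB') are deliberately not mapped.
GARDNER_RANK = {"AA": 5, "AB": 4, "BA": 4, "BB": 3, "AC": 2, "BC": 2, "CC": 1}


def gardner_to_ordinal(grade: str) -> int:
    """Convert a two-letter Gardner grade to its ordinal rank (KeyError if unmapped)."""
    return GARDNER_RANK[grade.upper()]


def _psi(a: float, b: float) -> float:
    """Mann-Whitney kernel: 1 if positive outranks negative, 0.5 on ties, else 0."""
    if a > b:
        return 1.0
    if a == b:
        return 0.5
    return 0.0


def _components(scores: Sequence[float], labels: Sequence[int]) -> Tuple[List[float], List[float]]:
    """DeLong structural components V10 (per positive) and V01 (per negative)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if len(pos) < 2 or len(neg) < 2:
        raise ValueError("need at least two positives and two negatives")
    v10 = [sum(_psi(p, q) for q in neg) / len(neg) for p in pos]
    v01 = [sum(_psi(p, q) for p in pos) / len(pos) for q in neg]
    return v10, v01


def _cov(a: Sequence[float], b: Sequence[float]) -> float:
    """Sample covariance with an (n - 1) denominator."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    return sum((x - ma) * (y - mb) for x, y in zip(a, b)) / (len(a) - 1)


def paired_delong(scores_a, scores_b, labels) -> Tuple[float, float, float]:
    """Return (AUC_a, AUC_b, two-sided nominal P) for two scores on the same embryos."""
    v10a, v01a = _components(scores_a, labels)
    v10b, v01b = _components(scores_b, labels)
    m, n = len(v10a), len(v01a)  # number of positives, negatives
    auc_a, auc_b = sum(v10a) / m, sum(v10b) / m
    var_a = _cov(v10a, v10a) / m + _cov(v01a, v01a) / n
    var_b = _cov(v10b, v10b) / m + _cov(v01b, v01b) / n
    cov_ab = _cov(v10a, v10b) / m + _cov(v01a, v01b) / n
    var_diff = var_a + var_b - 2.0 * cov_ab
    if var_diff <= 0.0:
        # Degenerate case (e.g. identical scores): no evidence of a difference.
        return auc_a, auc_b, 1.0 if auc_a == auc_b else 0.0
    z = (auc_a - auc_b) / math.sqrt(var_diff)
    p = math.erfc(abs(z) / math.sqrt(2.0))  # equals 2 * (1 - Phi(|z|))
    return auc_a, auc_b, p


if __name__ == "__main__":
    # Toy data for illustration only; these are not study data.
    labels = [1, 1, 0, 0]
    ida = [0.9, 0.8, 0.2, 0.1]
    gardner = [gardner_to_ordinal(g) for g in ["AA", "CC", "BB", "BC"]]
    print(paired_delong(ida, gardner, labels))
```

In practice, one would call `paired_delong` once per pair of scores (iDAScore vs KIDScore, iDAScore vs Gardner, KIDScore vs Gardner) within each age group; the returned P-value is nominal, consistent with the reporting in the abstract. The pairwise kernel is O(m·n), which is adequate for cohorts of this size.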