A comparison of 12 machine learning models developed to predict ploidy, using a morphokinetic meta-dataset of 8147 embryos.

Thomas Bamford, Arri Coomarasamy, Sue Montgomery, Alison Campbell, Rima K Dhillon-Smith, Christina Easter, Rachel Smith, Amy Barrie

doi:10.1093/humrep/dead034

Abstract

Are machine learning methods superior to traditional statistics in predicting blastocyst ploidy status using morphokinetic and clinical biodata? Mixed effects logistic regression performed better than all machine learning methods for ploidy prediction using our dataset of 8147 embryos.

Morphokinetic timings have been demonstrated to be delayed in aneuploid embryos. Machine learning and statistical models are increasingly being built; however, until now they have been limited by insufficient data.

This is a multicentre cohort study. Data were obtained from 8147 biopsied blastocysts from 1725 patients, treated from 2012 to 2020. All embryos were cultured in a time-lapse system at nine IVF clinics in the UK. A total of 3004 euploid embryos and 5023 aneuploid embryos were included in the final verified dataset.

We developed a total of 12 models using four different approaches: mixed effects multivariable logistic regression, random forest classifiers, extreme gradient boosting, and deep learning. For each of the four algorithms, two models were created: the first used 22 covariates and 8027 embryos (Dataset 1), and the second used 26 covariates and 2373 embryos (Dataset 2). Four final models were created by switching the target outcome from euploid to aneuploid for each algorithm (Dataset 1). Models were validated using internal-external cross-validation and external validation.

All morphokinetic variables were significantly delayed in aneuploid embryos. The likelihood of euploidy increased significantly the more expanded the blastocyst (P < 0.001) and the better the trophectoderm grade (P < 0.01). Univariable analysis showed no association with ploidy status for morula or cleavage stage fragmentation, morula grade, fertilization method, sperm concentration, or progressive motility. Male age did not correlate with the percentage of euploid embryos when stratified for female age.
Multinucleation at the two-cell or four-cell stage was not associated with ploidy status. The best-performing model was logistic regression built using the larger dataset with 22 predictors (F1 score 0.59 for predicting euploidy; F1 score 0.77 for predicting aneuploidy; AUC 0.71; 95% CI 0.67-0.73). The best-performing random forest, extreme gradient boosting, and deep learning models achieved AUCs of 0.68, 0.63, and 0.63, respectively. When using only morphokinetic predictors, the AUC was 0.61 for predicting ploidy status, whereas a model incorporating only embryo grading was unable to discriminate aneuploid embryos (AUC = 0.52). The ploidy prediction model's performance improved with increasing age of the egg provider.

The models have not yet been validated in a prospective study, nor have they been used to determine whether they improve clinical outcomes.

This model may aid decision-making, particularly where pre-implantation genetic testing for aneuploidy is not permitted or for prioritizing embryos for biopsy.

No specific funding was sought for this study; university funds supported the first author. A.Ca. is a minor shareholder of participating centres. Trial registration number: N/A.
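
The validation scheme and metrics named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' code. It assumes that internal-external cross-validation holds out one clinic per fold (the abstract does not say how folds were formed), and all function names, the toy data, and the 0.5 threshold are hypothetical. It also shows why the abstract gives two F1 scores: F1 depends on which class (euploid or aneuploid) is treated as the positive target, whereas AUC is computed from the ranking of predicted scores.

```python
from collections import defaultdict


def internal_external_folds(clinic_ids):
    """Yield (held_out_clinic, train_idx, test_idx), one fold per clinic.

    Assumption: each fold trains on all clinics but one and tests on the
    clinic that was held out.
    """
    by_clinic = defaultdict(list)
    for i, clinic in enumerate(clinic_ids):
        by_clinic[clinic].append(i)
    for clinic in sorted(by_clinic):
        test_idx = by_clinic[clinic]
        train_idx = sorted(
            i for c, idx in by_clinic.items() if c != clinic for i in idx
        )
        yield clinic, train_idx, test_idx


def f1_score(y_true, y_pred, positive=1):
    """F1 for the class chosen as the positive target."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def auc(y_true, scores, positive=1):
    """AUC as the probability that a positive outranks a negative (ties count 0.5)."""
    pos = [s for t, s in zip(y_true, scores) if t == positive]
    neg = [s for t, s in zip(y_true, scores) if t != positive]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical toy data: 1 = euploid, 0 = aneuploid; scores = predicted P(euploid).
clinics = ["A", "A", "B", "B", "C", "C"]
y_true = [1, 0, 1, 0, 1, 0]
scores = [0.9, 0.2, 0.4, 0.6, 0.7, 0.3]
y_pred = [1 if s >= 0.5 else 0 for s in scores]  # illustrative 0.5 threshold

for clinic, train_idx, test_idx in internal_external_folds(clinics):
    print(f"hold out clinic {clinic}: train on {train_idx}, test on {test_idx}")

print("F1 (euploid as target):  ", round(f1_score(y_true, y_pred, positive=1), 3))
print("F1 (aneuploid as target):", round(f1_score(y_true, y_pred, positive=0), 3))
print("AUC:", round(auc(y_true, scores), 3))
```

In a real analysis, a model would be refitted on `train_idx` within each fold and scored on the held-out clinic; here the scores are fixed only to keep the example short.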

Full Text