Abstract

Is the overall performance of an AI-based embryo evaluation model improved by including discarded embryos during training, compared to using only transferred embryos with known implantation data (KID)? A data set of 14,643 KID embryos with known fetal heart (FH) outcome (4,337 FH+ and 10,307 FH-), incubated for at least 4 days, was obtained from 18 different clinics. The data set also included 23,674 embryos that were either cryopreserved or transferred with unknown FH outcome, and 101,153 embryos that were discarded and thus neither cryopreserved nor transferred. Two 3D convolutional deep learning models were trained on 85% of the data set. The first model (KID-o) was trained only on KID embryos. The second model (KID-d) was trained on both KID embryos and discarded embryos, with discarded embryos pseudo-labelled as FH-. Training of the KID-d model used oversampling of FH+ embryos to ensure an equal distribution of FH+ and FH- during training. The models were evaluated on an internal validation data set consisting of the remaining 15% of the data. In addition, an external validation data set with 1,125 KID embryos, 6,327 embryos either cryopreserved or transferred with unknown implantation data, and 9,728 discarded embryos was obtained from American Hospital, Turkey. The main goal of embryo evaluation is to rank embryos to determine the order of transfer. This task was evaluated by calculating the area under the curve (AUC) for KID embryos, so this evaluation was based solely on clinical outcome. A second goal is to categorize embryos as useable (i.e. transferred or cryopreserved) or discarded. This task was evaluated by finding the threshold at which at least 95% of the useable embryos were classified as useable, and calculating the specificity at this 95% sensitivity threshold. This evaluation therefore concerned only whether embryos were used or discarded and was not related to clinical outcome. The ranking AUCs of KID embryos for the two models were nearly identical: the AUCs for the KID-o and KID-d models, respectively, were 0.667 and 0.670 for the internal validation, and 0.715 and 0.714 for the external validation. However, the specificities for categorization of useable embryos were significantly higher (p < 0.001, McNemar’s test) for the KID-d model than for the KID-o model: the specificities for the KID-o and KID-d models, respectively, were 0.48 and 0.82 for the internal validation, and 0.50 and 0.84 for the external validation. The results showed that the performance for ranking pre-selected, transferred KID embryos was not negatively affected by including discarded embryos in the training. For categorizing embryos as useable or discarded, however, the model performed significantly better when discarded embryos were included. Overall, the best performance was therefore observed when discarded embryos were included in the training.
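The two evaluation measures described above can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the convention that an embryo is classified as useable when its score is at or above the threshold, and the toy scores are all assumptions made for illustration. The AUC is computed in its rank form (the probability that a randomly chosen FH+ embryo scores higher than a randomly chosen FH- embryo), and the specificity is taken at the highest threshold that still classifies at least 95% of useable embryos as useable.

```python
import math
from typing import Sequence, Tuple


def auc(scores_pos: Sequence[float], scores_neg: Sequence[float]) -> float:
    """Rank-based AUC: fraction of (FH+, FH-) pairs in which the FH+ embryo
    has the higher score, with ties counted as half a win."""
    wins = 0.0
    for p in scores_pos:
        for n in scores_neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(scores_pos) * len(scores_neg))


def specificity_at_sensitivity(
    useable_scores: Sequence[float],
    discarded_scores: Sequence[float],
    min_sensitivity: float = 0.95,
) -> Tuple[float, float]:
    """Return (threshold, specificity).

    Assumed convention: an embryo is classified as useable when its score
    is >= threshold. The threshold is the highest observed useable score
    that keeps at least `min_sensitivity` of useable embryos above it.
    """
    ranked = sorted(useable_scores, reverse=True)
    # Smallest number of useable embryos that satisfies the sensitivity target.
    k = max(1, math.ceil(min_sensitivity * len(ranked)))
    threshold = ranked[k - 1]
    # Specificity: share of discarded embryos scored below the threshold.
    true_negatives = sum(1 for s in discarded_scores if s < threshold)
    return threshold, true_negatives / len(discarded_scores)


if __name__ == "__main__":
    # Toy scores, invented purely for illustration (not study data).
    fh_pos = [0.9, 0.4]
    fh_neg = [0.5, 0.3, 0.4]
    print("AUC:", auc(fh_pos, fh_neg))

    useable = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.15, 0.1]
    discarded = [0.05, 0.08, 0.12, 0.3]
    print("threshold, specificity:", specificity_at_sensitivity(useable, discarded))
```

The pairwise AUC loop is quadratic in the number of embryos and is chosen here only for readability; a sort-based rank computation would be preferable at the data set sizes reported above.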
