Abstract

To the Editor: With infertility increasing worldwide, this decade has witnessed a sharp rise in the use of assisted reproductive technology (ART).[1] Nevertheless, the success rate of in vitro fertilization (IVF)/intracytoplasmic sperm injection (ICSI) depends on many factors. Moreover, patients receiving ART not only incur high expenses but also face an increased risk of severe side effects, such as ovarian hyperstimulation syndrome, infection, and multiple pregnancies.[2] Consequently, the accurate prediction of ART outcomes has attracted tremendous interest. A considerable number of logistic regression models have been developed to predict ovarian stimulation,[3] pregnancy outcomes,[4] and adverse obstetric outcomes.[5] Although prediction models based on traditional statistical methods have been broadly applied, their clinical utility is hindered by their low predictive efficacy. Therefore, more accurate models to predict IVF/ICSI outcomes are needed.

The accelerating development of computer technology has promoted the adoption of artificial intelligence in medicine. Machine learning methods are distinguished by their enhanced performance compared with conventional methods. One well-received machine learning technique, eXtreme Gradient Boosting (XGBoost), has gradually been put into medical use and is recognized for its remarkable capacity to mine data.[6–8] XGBoost, a decision-tree-based algorithm, proved to be the best-performing machine learning algorithm in a prediction competition hosted by Kaggle.com.[9] To better predict pregnancy outcomes and provide patient-tailored counseling and management for IVF/ICSI, we designed this study and selected XGBoost to build the prediction model. The aim of this study was twofold. First, we sought to develop a new prediction model based on a machine learning method to predict IVF/ICSI outcomes.
Second, by comparing the performance of the machine learning model with that of a conventional logistic regression model, we sought to determine the efficacy of the machine learning model.

We retrospectively studied data from consecutive patients at the Peking Union Medical College Hospital, China, from July 2014 to March 2018. The patients had been referred to the hospital for IVF/ICSI treatment with tubal or male infertility. All data were de-identified and relabeled with unique patient identifier codes. Exclusion criteria included donor oocyte or sperm use; endometriosis or endocrine diseases, such as hyperandrogenism, diabetes, or thyroid diseases; and missing data. The study was approved by the Institutional Review Board of Peking Union Medical College Hospital (S-K829).

The clinical characteristics, sex hormone levels, and controlled ovarian hyperstimulation (COH) features of the IVF/ICSI cycles were used as variables for model construction. Each patient's age, body mass index (BMI), infertility type, infertility duration, and COH protocol (gonadotropin-releasing hormone agonist [GnRH-a] long protocol, GnRH-a ultra-long protocol, GnRH-a short protocol, GnRH antagonist protocol, and mini-stimulation protocol) were extracted from the medical records. The serum levels of the sex hormones (human follicle-stimulating hormone [FSH], estrogen [E2], luteinizing hormone [LH], prolactin [PRL], and testosterone [T]) were measured at two time points during the IVF/ICSI cycle, each denoted by a suffix (basal: 0; the second day after trigger: 1). Live birth, defined as the delivery of a live newborn at >28 weeks of gestation, was the primary study outcome. We evaluated the cumulative outcome of each patient, including the first fresh cycle and all subsequent freeze-thaw cycles from the same ovarian stimulation, and classified each as live birth or no live birth.
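
The cumulative outcome definition above can be illustrated with a minimal Python sketch. It is not the authors' code: the record layout (`TransferCycle` and its field names) is hypothetical, and the only rules taken from the text are the >28-week live birth criterion and the grouping of a fresh cycle with the freeze-thaw cycles from the same ovarian stimulation.

```python
from collections import defaultdict
from typing import Dict, Iterable, NamedTuple, Optional, Tuple


class TransferCycle(NamedTuple):
    """One embryo transfer cycle (hypothetical record layout, for illustration)."""
    patient_id: str
    stimulation_id: str  # ovarian stimulation that produced the embryos
    cycle_type: str      # "fresh" or "frozen"
    weeks_at_live_delivery: Optional[float]  # None if no live newborn was delivered


LIVE_BIRTH_MIN_WEEKS = 28  # live birth requires delivery at >28 weeks of gestation


def is_live_birth(cycle: TransferCycle) -> bool:
    """Apply the >28-week criterion to a single transfer cycle."""
    weeks = cycle.weeks_at_live_delivery
    return weeks is not None and weeks > LIVE_BIRTH_MIN_WEEKS


def cumulative_live_birth(cycles: Iterable[TransferCycle]) -> Dict[Tuple[str, str], int]:
    """Label each (patient, stimulation) pair 1 if any of its cycles ended in live birth."""
    outcome: Dict[Tuple[str, str], int] = defaultdict(int)
    for cycle in cycles:
        key = (cycle.patient_id, cycle.stimulation_id)
        outcome[key] = max(outcome[key], int(is_live_birth(cycle)))
    return dict(outcome)
```

Under these assumptions, a patient whose fresh transfer failed but whose later freeze-thaw transfer from the same stimulation led to a delivery at 39 weeks would be labeled 1 (live birth).
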
Statistical analyses were performed using R (http://www.R-project.org) and EmpowerStats software 2.2 (http://www.empowerstats.com, X&Y Solutions, Inc., Boston, MA). We built a conventional logistic regression model based on multivariate logistic regression analysis. A backward stepwise variable selection procedure with bootstrap resampling was applied to select the variables. The open-source XGBoost package was then applied to analyze feature importance and to obtain the probability threshold for live birth. The predictive efficacy of the two models was evaluated by measuring their sensitivity, specificity, positive predictive value, and negative predictive value. Receiver operating characteristic (ROC) curves and the corresponding area under the curve (AUC) values of the two models were compared. The calibration curves were then assessed. Decision curve analysis (DCA) was performed to compare the clinical utility of the models.

A total of 3012 patients were included in the model construction, comprising 2101 IVF and 911 ICSI cases. The patients’ clinical characteristics, sex hormone levels, and COH features are listed by group (live birth and no live birth) in Supplementary Table 1, https://links.lww.com/CM9/A835. The top features selected by the XGBoost model were age, estrogen levels on the second day after trigger (E21), PRL levels on the second day after trigger (PRL1), basal LH levels (LH0), LH levels on the second day after trigger (LH1), basal estrogen levels (E20), basal PRL levels (PRL0), and total consumption of FSH. The contribution of each feature to the model construction is illustrated in Supplementary Table 2, https://links.lww.com/CM9/A835 and Supplementary Figure 2, https://links.lww.com/CM9/A835. The features of the conventional model, selected by backward stepwise analysis, were age, secondary infertility, ICSI, No. of previous IVF, total consumption of FSH, FSH0, T0, PRL1, LH1, E21, P1, and T1 [Supplementary Table 3, https://links.lww.com/CM9/A835].
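
The evaluation measures named in the statistical analysis can be sketched in a few lines of plain Python. This is an illustrative sketch only, not the authors' R/EmpowerStats workflow: the function names are hypothetical, AUC is computed with the standard pairwise (Mann-Whitney) formulation, and net benefit uses the usual decision curve definition, TP/n − FP/n × pt/(1 − pt), where pt is the threshold probability.

```python
from typing import Dict, Sequence, Tuple


def confusion_counts(y_true: Sequence[int], y_prob: Sequence[float],
                     threshold: float) -> Tuple[int, int, int, int]:
    """Return (TP, FP, TN, FN), predicting live birth when probability >= threshold."""
    tp = fp = tn = fn = 0
    for y, p in zip(y_true, y_prob):
        predicted_positive = p >= threshold
        if predicted_positive and y == 1:
            tp += 1
        elif predicted_positive and y == 0:
            fp += 1
        elif not predicted_positive and y == 0:
            tn += 1
        else:
            fn += 1
    return tp, fp, tn, fn


def classification_metrics(y_true: Sequence[int], y_prob: Sequence[float],
                           threshold: float) -> Dict[str, float]:
    """Sensitivity, specificity, PPV, and NPV at one probability threshold."""
    tp, fp, tn, fn = confusion_counts(y_true, y_prob, threshold)

    def ratio(num: int, den: int) -> float:
        return num / den if den else float("nan")

    return {
        "sensitivity": ratio(tp, tp + fn),
        "specificity": ratio(tn, tn + fp),
        "ppv": ratio(tp, tp + fp),
        "npv": ratio(tn, tn + fn),
    }


def auc(y_true: Sequence[int], y_prob: Sequence[float]) -> float:
    """Probability that a random positive case outscores a random negative one."""
    pos = [p for y, p in zip(y_true, y_prob) if y == 1]
    neg = [p for y, p in zip(y_true, y_prob) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5  # ties count as half a win
    return wins / (len(pos) * len(neg))


def net_benefit(y_true: Sequence[int], y_prob: Sequence[float],
                threshold: float) -> float:
    """Decision-curve net benefit at a threshold probability in (0, 1)."""
    tp, fp, _, _ = confusion_counts(y_true, y_prob, threshold)
    n = len(y_true)
    return tp / n - fp / n * threshold / (1 - threshold)
```

A decision curve is obtained by evaluating `net_benefit` over a grid of thresholds for each model; the model whose curve lies higher offers the larger net benefit at that threshold. The pairwise AUC loop is quadratic in the number of cases, which is acceptable for a sketch but would normally be replaced by a rank-based computation on cohorts of this size.
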
The predictive performance of the two models is presented in Supplementary Table 4, https://links.lww.com/CM9/A835 and Figure 1. Compared with the conventional logistic regression model, the XGBoost model had a higher AUC value, indicating better discriminatory power (AUC [conventional]: 0.724, 95% confidence interval [CI] 0.708–0.741; AUC [XGBoost]: 0.901, 95% CI 0.890–0.912; P<0.001). Good calibration was observed for the probability of live birth in both models. The DCA curve of the XGBoost model lay above that of the conventional model, indicating a larger net benefit for the XGBoost model.

Figure 1: (A) The ROC curves of the two models. The XGBoost model presented a higher discriminatory power than the conventional one (AUC [conventional]: 0.724, 95% CI 0.708–0.741; AUC [XGBoost]: 0.901, 95% CI 0.890–0.912; P<0.001). (B) The DCA of the two models. The DCA curve of the XGBoost model was above that of the conventional model with a greater range on the axes, indicating that the net benefit of the XGBoost model was larger than that of the conventional model. (C) The calibration curves of the two models. Both models presented a good calibration of the probability of live birth. AUC: Area under the curve; CI: Confidence interval; DCA: Decision curve analysis; ROC: Receiver operating characteristic; XGBoost: eXtreme Gradient Boosting.

In this study, to predict the IVF/ICSI outcomes for patients with tubal or male infertility, we built an XGBoost model that showed higher performance and better discriminative capacity than a conventional logistic regression model. According to the DCA results, the XGBoost model also had a larger net benefit than the conventional model, indicating its better clinical potential. For patients receiving ART, an accurate prediction of the success rate and subsequent individualized treatment strategies can be beneficial.[10] Numerous studies have been conducted to estimate the chance of live birth.
The most popular model is the McLernon model, based on UK national data of 184,269 complete cycles since 1991, which provides a personalized estimate of the cumulative chance of live birth before treatment and after the first fresh embryo transfer (C-index of 0.72–0.73). However, some important factors that might be potential predictors of live birth, such as anti-Müllerian hormone and BMI, were not included.[11,12] In addition, compared with previous studies using traditional statistical methods, the XGBoost model in our study showed better performance, with a high AUC value of 0.901.

Among various machine learning algorithms, XGBoost is one of the most clinically recognized because of its remarkable predictive ability. XGBoost is a decision-tree-based algorithm that combines multiple decision trees to improve its classification capability.[13] The decision trees are developed by selecting the most discriminative features from the feature candidate pool, allowing the classifier to interact directly with the features. Hence, XGBoost can combine models with weak predictive ability, tolerate missing data, and show excellent predictive competence on complicated problems.[14] Similar to our study, others have constructed prediction models with XGBoost and have confirmed its higher clinical value compared with traditional methods.[15] Qiu et al[16] used the XGBoost algorithm for personalized prediction of live births for IVF/ICSI patients, with an AUC of 0.73. Amini et al[17] used different machine learning approaches to predict the probability of successful delivery and suggested that random forests had the best performance (AUC = 0.81). Unlike our study, none of the aforementioned studies verified the superiority of the machine learning models over conventional prediction models.

Although the better predictive performance of XGBoost has been demonstrated, some limitations of this study must be highlighted.
This study's retrospective design was a major limitation, preventing the exclusion of all potential biases. In addition, patients were recruited from a single center, which reduces the generalizability of the findings. Finally, the lack of external validation limits the model's clinical application. Hence, further studies should be conducted in larger cohorts with external validation.

To conclude, we developed a prediction model using XGBoost, a machine learning algorithm, to predict the live birth rate of IVF/ICSI patients based on their clinical characteristics, sex hormone levels, and COH features. Compared with the model constructed using conventional multivariate logistic regression, the XGBoost model showed a higher discriminative ability and net benefit, indicating its potential clinical value.

Conflicts of interest

None.
