Abstract

Study question
Can machine learning algorithms provide an individualized starting dose of gonadotropins that maximizes the number of oocytes retrieved in patients undergoing ovarian stimulation?

Summary answer
Policy Learning methods based on Causal Inference and Machine Learning optimize starting doses of gonadotropins, with a substantial gain of oocytes under appropriate treatment.

What is known already
Despite attempts at standardization, there is no consensus on the optimal starting dose of gonadotropins for ovarian stimulation. In the real world, medical practice differs considerably across countries and even among fertility clinics within the same country. The machine learning community has proposed a variety of models for dose selection, but with limited comparisons between them. Causal inference provides the statistical tools to choose the best model and to make it explainable.

Study design, size, duration
In a retrospective single-center study, we reviewed data from 11,436 cycles of ovarian stimulation for IVF performed between 2012 and 2023. To estimate the causal effect of the starting dose of gonadotropins on the number of oocytes retrieved, we selected a relevant subset of confounding covariates, including age, Antral Follicle Count (AFC), Anti-Müllerian Hormone (AMH) and oestradiol (E2) levels.

Participants/materials, setting, methods
We used five different Machine Learning models: Linear Regression, Regression Forest, Super Learner (SL), Multi-Arm Causal Forest (MACF) and a model based on Targeted Maximum Likelihood Estimation (TMLE). We employed these models for Policy Learning through an Outcome Regression Modeling approach. We compared the gain of oocytes obtained under the optimal policy of each model and analyzed the variables that played a crucial role in determining the optimal dose.
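To make the Outcome Regression approach to Policy Learning concrete, the following is a minimal sketch and not the study's implementation. It assumes a single covariate (AFC), two hypothetical dose arms (150 and 300 IU), and one simple linear regression per arm; the function names and the toy data are illustrative assumptions only. The learned policy recommends the dose with the highest predicted oocyte count, and the expected gain compares that prediction with the prediction for the dose actually given.

```python
from collections import defaultdict


def fit_line(xs, ys):
    """Ordinary least squares for y = intercept + slope * x (one covariate)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def fit_outcome_models(rows):
    """rows: (covariate, dose, oocytes) triples. Fits one regression per dose arm."""
    by_dose = defaultdict(lambda: ([], []))
    for x, dose, y in rows:
        by_dose[dose][0].append(x)
        by_dose[dose][1].append(y)
    return {dose: fit_line(xs, ys) for dose, (xs, ys) in by_dose.items()}


def predict(models, x, dose):
    """Predicted oocyte count for covariate x under the given dose."""
    intercept, slope = models[dose]
    return intercept + slope * x


def optimal_dose(models, x):
    """Plug-in policy: the dose with the highest predicted outcome.

    Doses are scanned in ascending order, so exact ties go to the lower dose.
    """
    return max(sorted(models), key=lambda dose: predict(models, x, dose))


def expected_gain(models, rows):
    """Mean predicted gain of the recommended dose over the observed dose."""
    gains = [
        predict(models, x, optimal_dose(models, x)) - predict(models, x, dose)
        for x, dose, _ in rows
    ]
    return sum(gains) / len(gains)


# Made-up toy data, NOT from the study: (AFC, starting dose in IU, oocytes).
toy_rows = [(afc, 150, 2 + 0.8 * afc) for afc in (5, 10, 15, 20, 25)]
toy_rows += [(afc, 300, 8 + 0.4 * afc) for afc in (5, 10, 15, 20, 25)]

models = fit_outcome_models(toy_rows)
print(optimal_dose(models, 8), optimal_dose(models, 22))
print(round(expected_gain(models, toy_rows), 2))
```

In this sketch the outcome model could be swapped for any regressor (a forest or an ensemble, for example) without changing the policy step, which is what makes the models comparable on the same gain measure.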
Main results and the role of chance
We used two approaches to compare our models. The first gives the expected average gain of oocytes under the optimal policy: in this case, the Linear Regression is the best with an average gain of 0.90 oocytes, followed by the MACF (0.80), the SL (0.58), the TMLE (0.44) and the Regression Forest (0.18). Given the lack of heterogeneity in our database, we used a second approach, a value based on the Augmented Inverse Propensity Score, to compare the optimal policy of each model more robustly. In this case the MACF (an explainable model specifically designed for causal inference) is the best with a gain of 0.50 oocytes, followed by the SL (0.42), the Linear Regression (0.37), the TMLE (0.23) and the Regression Forest (0.18). In general, our models recommend lower doses: for instance, the MACF recommends that 20% of patients reduce their initial dose of gonadotropins, by 100 IU on average. The most significant covariates in the models are the AFC, the BMI and the AMH level, with p-values < 0.001 in the Linear Regression, which has an R-squared of 0.418.

Limitations, reasons for caution
For patients with previous IVF attempts, only data from the most recent attempt was considered, not the full IVF history. In addition, the study is single-center, and more heterogeneity is expected when other centers are included.

Wider implications of the findings
This study needs to be complemented with data from other centers in France and around the world, to bring diversity to the patients' data and to the treatment prescriptions and thereby improve external validity. The algorithm's recommendations, and adherence to them, need to be tested prospectively.

Trial registration number
Not applicable
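The second comparison above relies on an Augmented Inverse Propensity Score value. As an illustration only, the sketch below uses the standard doubly robust form of that estimator, which may differ in detail from the one used in the study: for each cycle it takes the outcome model's prediction under the recommended dose and, when the observed dose matches the recommendation, adds the residual weighted by the inverse propensity. The functions `policy`, `outcome_model` and `propensity`, as well as the toy records, are hypothetical placeholders.

```python
def aipw_policy_value(records, policy, outcome_model, propensity):
    """Doubly robust (AIPW) estimate of the mean outcome under `policy`.

    records:        (covariate, observed dose, observed oocytes) triples
    policy:         x -> recommended dose
    outcome_model:  (x, dose) -> predicted oocytes
    propensity:     (x, dose) -> probability of receiving `dose` given x,
                    assumed strictly positive (overlap)
    """
    total = 0.0
    for x, dose, y in records:
        recommended = policy(x)
        value = outcome_model(x, recommended)
        if dose == recommended:
            # Residual correction, only where observed care matches the policy.
            value += (y - outcome_model(x, dose)) / propensity(x, dose)
        total += value
    return total / len(records)


def aipw_gain(records, policy, outcome_model, propensity):
    """Estimated policy value minus the mean outcome actually observed."""
    observed_mean = sum(y for _, _, y in records) / len(records)
    return aipw_policy_value(records, policy, outcome_model, propensity) - observed_mean


# Made-up toy inputs, NOT from the study.
toy_records = [(10, 150, 9.0), (10, 300, 13.0), (20, 150, 19.0), (20, 300, 15.0)]


def toy_policy(afc):
    return 300 if afc < 15 else 150


def toy_outcome_model(afc, dose):
    return 2 + 0.8 * afc if dose == 150 else 8 + 0.4 * afc


def toy_propensity(afc, dose):
    return 0.5  # both doses assumed equally likely for every patient


print(aipw_gain(toy_records, toy_policy, toy_outcome_model, toy_propensity))
```

The correction term is what makes this value more robust than the plug-in gain: it remains consistent if either the outcome model or the propensity model is well specified, whereas the plug-in gain depends entirely on the outcome model.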