Abstract

Study question
Can an interpretable deep learning model successfully predict the number of mature oocytes retrieved after ovarian hyperstimulation and provide the rationale for these predictions?

Summary answer
An interpretable deep learning model can successfully predict the number of mature oocytes retrieved and provide a rationale for these predictions.

What is known already
Several recent studies have predicted the number of mature oocytes retrieved in IVF cycles using classical machine learning models, including XGBoost and Random Forests, and even basic deep learning models such as feedforward networks. Although these studies predicted the number of mature oocytes with relatively high performance, the methods employed offer no insight into the rationale behind their predictions. This lack of interpretability is an important limitation when applying these models in a clinical setting.

Study design, size, duration
This study was a retrospective analysis of a dataset of 6,430 ovarian stimulation cycles performed at a single center in the USA, aimed at building an interpretable, deep learning-based model to predict the number of mature oocytes retrieved after ovarian hyperstimulation.

Participants/materials, setting, methods
An interpretable deep learning model was developed to predict the number of mature oocytes retrieved. The model was evaluated on an out-of-sample test set and compared to baseline models used in prior studies, such as XGBoost and Linear Regression.

Main results and the role of chance
The model successfully predicted the retrieved oocyte count, with a mean absolute error (MAE) of 3.14 oocytes relative to the ground truth. It outperformed models used in previous studies, such as XGBoost and Linear Regression, which achieved MAEs of 3.16 and 3.2 oocytes, respectively, on the test dataset. Parameter significance was measured by calculating how much each parameter contributed to the final prediction.
The parameters found to be most significant in predicting the number of retrieved mature oocytes were (in order): total number of follicles, number of follicles measuring 12-15 mm, and number of follicles measuring 16-17 mm. Parameters found to be of lower significance but still indicative were the number of follicles larger than 18 mm and estradiol on trigger day. Surprisingly, age and basal FSH were of little significance. Most notably, the number of follicles in the 12-15 mm bin was much more indicative than the number of larger follicles measuring 16-17 mm, and far more indicative than the number of follicles larger than 18 mm. Furthermore, the model generates a transparent breakdown of the prediction for each patient, including a heat map and bar graph highlighting the factors influencing the decision-making process at every step, creating a completely transparent prediction process.

Limitations, reasons for caution
Because the model was developed on training data from a single clinic, further data collection from diverse global locations is required to confirm the model’s ability to generalize.

Wider implications of the findings
This study demonstrates improved accuracy in oocyte prediction together with transparency in the prediction process, surpassing previous models while also building trust. These advancements not only improve the safety of artificial intelligence tools for optimizing stimulation protocols but also help reduce regulatory concerns surrounding their deployment in clinical settings.

Trial registration number
Not applicable.
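
For readers unfamiliar with the metric reported in the main results, the mean absolute error is the average of the absolute differences between the observed and predicted oocyte counts. The following is a minimal sketch of that calculation, not the evaluation code used in the study; the function name and the sample values are hypothetical and chosen only for illustration.

```python
from typing import Sequence


def mean_absolute_error(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Return the average absolute difference between observed and predicted values."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if len(y_true) == 0:
        raise ValueError("inputs must be non-empty")
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


# Hypothetical values for illustration only (not data from the study).
observed = [8, 12, 5, 20]             # mature oocytes actually retrieved
predicted = [10.0, 11.0, 9.0, 16.0]   # model predictions for the same cycles

mae = mean_absolute_error(observed, predicted)
print(f"MAE: {mae:.2f} oocytes")
```

Under this definition, an MAE of 3.14 means that predictions on the test set differed from the retrieved count by 3.14 oocytes on average, in either direction.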