Abstract

Study question
Can advanced machine learning applied to the preoperative assessment predict the outcome of testicular sperm extraction in azoospermic men, and how many patients are required?

Summary answer
Despite encouraging results (AUC = 92.0%, sensitivity = 83.9% and specificity = 84.2%), integrating new biomarkers would probably be more useful than enrolling additional patients.

What is known already
Testicular sperm extraction (TESE) is an essential therapeutic tool in the management of male infertility and is often these patients' "last hope" before gamete donation. However, it is an invasive procedure, and sperm retrieval succeeds in at most about 50% of cases. To date, no model predicts the success of sperm retrieval by TESE with sufficient accuracy. Among the few models developed so far, findings are highly disparate despite relying on common input data (the preoperative assessment). Moreover, only a few types of machine learning models and procedures have been investigated, and performance has generally plateaued, sometimes despite the inclusion of more than 1000 patients.

Study design, size, duration
Data from 175 patients who underwent TESE between 2012 and 2021 were retrospectively analyzed. The performance of a wide range of preprocessing methods and state-of-the-art machine learning models was explored, evaluated, and compared. The objective was to predict the presence or absence of spermatozoa from 17 parameters of the preoperative assessment (clinical, hormonal, genetic, and medical history). The study protocol was approved by a local ethics committee (IRB CER-2021-041).

Participants/materials, setting, methods
After data preprocessing (e.g., standardization), machine learning models (naive Bayes classifier, logistic regression, k-nearest neighbors classifier, support vector machine, random forest, gradient boosting and XGBoost) and deep learning models were tested. The validation procedure consisted of splitting the dataset into a training set and a test set. Beyond the standard metrics (sensitivity, specificity, AUC-ROC), the most relevant variables were identified, and a learning curve was computed to determine the optimal number of patients to include.

Main results and the role of chance
At least one live spermatozoon was found in the testicular tissue of 104 of the 175 patients (59.4%; positive TESE). The best-performing model (random forest with appropriate preprocessing) obtained the following results on the test set: AUC = 92.0%, sensitivity = 83.9% and specificity = 84.2%. It therefore provides an efficient tool that yields more relevant information than the individual variables taken separately. Inhibin B, FSH and history of cryptorchidism were the most discriminating variables. However, model performance plateaued beyond 110 patients, whatever the approach or preprocessing used; the trend curve shows no further improvement beyond this point, which casts doubt on the predictive power of the traditional parameters assessed before TESE. The classic preoperative assessment probably cannot fully predict TESE outcomes. Further work should build on new hypotheses and integrate new biomarkers into the models.

Limitations, reasons for caution
The main limitations were the single-center design and the use of retrospective data.

Wider implications of the findings
Machine learning models can provide the basis for an enhanced decision support tool in the context of azoospermia. Indefinitely increasing the number of participants is unlikely to be the solution: new hypotheses and the integration of additional biomarkers into the models will probably be necessary to improve performance.

Trial registration number
Not applicable.
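The abstract does not state how the test-set metrics were computed or how the learning-curve plateau was identified. The following minimal sketch (standard-library Python only) shows one common way to do both, assuming label 1 = positive TESE and 0 = negative TESE. The function names, the 0.5 decision threshold used in the example, and the plateau tolerance `tol` are illustrative assumptions, not the authors' choices; in practice, scikit-learn's `confusion_matrix`, `roc_auc_score` and `learning_curve` cover the same ground.

```python
from typing import Sequence


def sensitivity_specificity(y_true: Sequence[int], y_pred: Sequence[int]) -> tuple[float, float]:
    """Return (sensitivity, specificity) from binary labels and predictions.

    Assumed coding: 1 = positive TESE (sperm found), 0 = negative TESE.
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    # Sensitivity: share of positive-TESE patients the model flags as positive.
    # Specificity: share of negative-TESE patients the model flags as negative.
    return tp / (tp + fn), tn / (tn + fp)


def roc_auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """AUC-ROC as the probability that a randomly chosen positive patient
    receives a higher score than a randomly chosen negative one (ties = 1/2)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = 0.0
    for sp in pos:
        for sn in neg:
            if sp > sn:
                wins += 1.0
            elif sp == sn:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def plateau_size(sizes: Sequence[int], scores: Sequence[float], tol: float = 0.01) -> int:
    """Hypothetical plateau rule: return the smallest training-set size after
    which no larger size improves the validation score by more than `tol`."""
    if not sizes or len(sizes) != len(scores):
        raise ValueError("sizes and scores must be non-empty and of equal length")
    for i, size in enumerate(sizes):
        if max(scores[i:]) - scores[i] <= tol:
            return size
    return sizes[-1]  # not reached for non-empty input; kept for clarity


if __name__ == "__main__":
    # Toy values for illustration only; these are NOT data from the study.
    y_true = [1, 1, 1, 0, 0, 0]
    scores = [0.9, 0.8, 0.4, 0.6, 0.3, 0.1]
    y_pred = [1 if s >= 0.5 else 0 for s in scores]  # assumed 0.5 threshold
    print(sensitivity_specificity(y_true, y_pred))
    print(roc_auc(y_true, scores))
    print(plateau_size([10, 20, 30, 40, 50], [0.60, 0.70, 0.75, 0.755, 0.758]))
```

On the toy values, two of three positives and two of three negatives are classified correctly at the 0.5 threshold, the AUC is 8/9, and the plateau rule returns 30. The rank-based AUC is threshold-free, which is why it is usually reported alongside sensitivity and specificity, both of which depend on the chosen cut-off.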