Abstract

Study question
Can automatic measurement of follicles predict the number of oocytes retrieved as accurately as physicians' manual reports?

Summary answer
Automated follicular annotations generated by an artificial intelligence platform predict the number of retrieved and mature oocytes with accuracy similar to that of physician reports.

What is known already
Ovarian follicle counting is a time-consuming, frequently performed assessment with significant inter-observer variability, which makes it a strong candidate for automation using artificial intelligence. The main purpose of this monitoring is to determine the optimal day for trigger administration, one that maximises the number of retrieved oocytes and increases the chance of pregnancy. This decision typically depends on follicle sizes and hormone levels, while also taking into account each patient's individual circumstances. Research in this area is ongoing, but it primarily focuses on follicle sizes as measured by physicians rather than by automated systems.

Study design, size, duration
This study included 589 in vitro cycles of 545 patients, performed in 5 centers in Poland between 2019 and 2021. The 719 examinations performed at most 3 days before trigger administration were analyzed. Manual reports of follicle measurements were collected, together with 2D cine loop scans (videos) of the ovaries and the final counts of retrieved oocytes and mature (stage MII) oocytes. The Folliscan artificial intelligence platform provided the automatic measurements.

Participants/materials, setting, methods
Follicles in each ovary were grouped by size: starting from 8 mm, in 2 mm steps up to 24 mm, plus one group for follicles larger than 24 mm. The number of follicles in each group, together with the number of days until the trigger shot, served as features in a linear regression model predicting the number of retrieved oocytes or of MII oocytes. Two datasets were created: one based on physician reports and the other on automatic measurements.

Main results and the role of chance
Ten-fold cross-validation was performed. The model based on manual measurements achieved an R² of 0.62 (confidence interval [CI]: 0.56–0.68) for the number of retrieved oocytes and 0.51 (CI: 0.45–0.57) for MII oocytes. The model based on automatic annotations achieved an R² of 0.63 (CI: 0.57–0.69) for oocytes and 0.50 (CI: 0.43–0.56) for MII oocytes. For manual measurements, the mean absolute errors (MAE) in the predicted numbers of oocytes and MII oocytes were 3.12 (CI: 2.88–3.36) and 2.54 (CI: 2.36–2.72), respectively; for automatic annotations, they were 3.09 (CI: 2.85–3.33) and 2.56 (CI: 2.38–2.73). Two one-sided tests (TOST) for equivalence were conducted on the MAE with a margin of 0.2, yielding p-values of 0.026 for oocyte errors and 0.0023 for MII errors.

Limitations, reasons for caution
The study had a limited scope, covering data from 5 in vitro clinics. Only a linear regression model was used, and only to compare the predictive potential of automatic and manual measurements. Further development of the model is needed to achieve better accuracy.

Wider implications of the findings
Manual and automated follicular measurements have similar value for predicting the number of retrieved and mature oocytes. Automated measurements can be used during stimulation monitoring, offering a reliable and efficient alternative to manual methods. However, clinical studies focusing on interpretable metrics are needed to further validate these findings.

Trial registration number
Polish National Center for Research and Development no. POIR.01.01.01-00-1634/20-00 and ERC Consolidator Grant TUgbOAT no. 772346.
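
Illustrative sketch (not the study's code)
The size grouping described in the methods and the equivalence test reported in the results could be sketched roughly as follows. This is a minimal illustration under several assumptions that the abstract does not confirm: the names `bin_counts`, `feature_vector` and `tost_paired` are hypothetical; follicles of exactly 24 mm are placed in the open-ended group; the two ovaries contribute separate feature groups; and the two one-sided tests are taken as paired tests on per-examination absolute errors, using a normal approximation in place of the t distribution.

```python
from math import sqrt
from statistics import NormalDist, mean, stdev

BIN_START_MM = 8.0    # smallest follicle size that is counted
BIN_WIDTH_MM = 2.0    # width of each closed size group
LAST_EDGE_MM = 24.0   # sizes from here upward share one open-ended group
# Eight 2 mm groups (8-10, ..., 22-24) plus one open-ended group.
N_BINS = int((LAST_EDGE_MM - BIN_START_MM) / BIN_WIDTH_MM) + 1


def bin_counts(diameters_mm):
    """Count the follicles of one ovary in each size group."""
    counts = [0] * N_BINS
    for d in diameters_mm:
        if d < BIN_START_MM:
            continue  # smaller than the first group: not counted
        # Assumption: lower edges are inclusive, so 24.0 mm lands in the last group.
        idx = min(int((d - BIN_START_MM) // BIN_WIDTH_MM), N_BINS - 1)
        counts[idx] += 1
    return counts


def feature_vector(left_mm, right_mm, days_to_trigger):
    """Per-ovary group counts followed by the days until the trigger shot."""
    return bin_counts(left_mm) + bin_counts(right_mm) + [days_to_trigger]


def tost_paired(errors_a, errors_b, margin):
    """Two one-sided tests for equivalence of mean paired errors.

    Returns the larger of the two one-sided p-values; equivalence within
    +/- margin is supported when that value is below the chosen alpha.
    Assumption: normal approximation, which needs a reasonably large sample.
    """
    diffs = [a - b for a, b in zip(errors_a, errors_b)]
    se = stdev(diffs) / sqrt(len(diffs))
    z = NormalDist()
    p_lower = 1.0 - z.cdf((mean(diffs) + margin) / se)  # H0: mean diff <= -margin
    p_upper = z.cdf((mean(diffs) - margin) / se)        # H0: mean diff >= +margin
    return max(p_lower, p_upper)


# Hypothetical examination: three follicles measured, two days before trigger.
example = feature_vector([12.0], [18.5, 19.0], 2)
print(len(example))  # 9 groups per ovary * 2 ovaries + 1 = 19 features
```

A feature vector of this kind would be built once from the physician report and once from the automatic measurements, then passed to any linear regression routine under ten-fold cross-validation. The per-examination absolute errors of the two resulting models are what `tost_paired` would compare against the 0.2 margin.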