Abstract

Study question
How often are fully automated follicle counts and measurements modified in expert review?

Summary answer
In routine clinical operation, automatic follicle annotations provided by an artificial intelligence platform were edited 3.27% of the time.

What is known already
Ovarian follicle counting is frequently performed, time-consuming, and subject to noticeable inter-observer variability; as such, it is well suited to automation with artificial intelligence. FOLLISCAN (MIM Fertility) is a software platform that automatically annotates follicles on 2D or 3D ultrasound cine videos, with exact outlines and measurements, without any manual pre-processing. It has undergone several retrospective tests of its precision and recall in detecting ovarian follicles, the accuracy of its measurements, and the time savings compared to manual counting.

Study design, size, duration
The platform was integrated with a clinic’s existing picture archiving and medical record systems. Ultrasound scans of ovaries were performed as part of regular infertility treatment in two IVF centers. Scans were made by three experts between November and December 2023, automatically sent to the platform, analyzed, and then immediately visualized for review by the person performing the ultrasound, in place of manual annotation.

Participants/materials, setting, methods
The study included 294 cine videos from 147 examinations of 101 patients, with 4347 follicles in total (14.7 per ovary on average, counting follicles from 2 mm). The platform allows users to modify proposed measurements and introduce new follicle annotations. After the review was confirmed, results were automatically sent back to the clinic’s system to be used in medical decisions just as manual annotations would be.

Main results and the role of chance
Across all videos, the average number of edits per video (follicle additions, modifications, or deletions) was 0.48 (CI: 0.37-0.61).
In total, 142 follicles out of 4347 (3.27%, CI: 2.74-3.80) were edited: 66 follicles were added (1.52%, CI: 1.15-1.88), 26 were modified (0.60%, CI: 0.37-0.83), and 50 were deleted (1.15%, CI: 0.83-1.47). Out of the 142 edited follicles, 13 (9.2%) measured 2-5 mm, 29 (20.4%) measured 5-10 mm, 51 (35.9%) measured 10-15 mm, 42 (29.6%) measured 15-20 mm, and 7 (4.9%) measured 20 mm or more in diameter. Among follicles of 15 mm or more in size, 49 were edited. Out of 294 videos, 87 (29.6%, 95% CI: 24.4-34.6) were edited in review in any way.

Limitations, reasons for caution
The study was limited in scope and did not track patients until success rates in IVF treatments could be observed. A multi-center randomized controlled trial could compare pregnancy rates for treatments performed with and without the platform.

Wider implications of the findings
An integrated platform allows for easy review, while significantly reducing the time spent on follicle counting in a real-world setting. Automated annotations result in consistent, reliable, and quick measurements, with the rate of expert modifications lower than the inter-observer variability reported in previous studies.

Trial registration number
Project support was provided by the Polish National Center for Research and Development no. POIR.01.01.01-00-1634/20-00 and ERC Consolidator Grant TUgbOAT no. 772346. This study was conducted following the approval of the research protocol by the review board of the Regional Medical Chamber in Gdańsk (approval no. KB – 51/22).
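
The follicle-level edit rates reported under the main results can be re-derived from the raw counts. The sketch below is illustrative only: the abstract does not state how its confidence intervals were computed, so the normal-approximation (Wald) interval with z = 1.96 is an assumption, and the function name `wald_ci` is hypothetical. Under that assumption the output agrees with the follicle-level figures quoted above; the per-video intervals are not covered here.

```python
from math import sqrt


def wald_ci(k, n, z=1.96):
    """Return (percent, lower, upper) for k events out of n, all in percent.

    Uses the normal-approximation (Wald) interval; this choice is an
    assumption, since the abstract does not name its interval method.
    """
    p = k / n
    half_width = z * sqrt(p * (1 - p) / n)
    return 100 * p, 100 * (p - half_width), 100 * (p + half_width)


TOTAL_FOLLICLES = 4347  # total follicles reported in the study

# Edit counts as reported in the abstract.
edits = {"added": 66, "modified": 26, "deleted": 50}
edits["any edit"] = sum(edits.values())  # 66 + 26 + 50 = 142

for label, count in edits.items():
    pct, lo, hi = wald_ci(count, TOTAL_FOLLICLES)
    print(f"{label}: {count}/{TOTAL_FOLLICLES} = {pct:.2f}% (CI: {lo:.2f}-{hi:.2f})")
```

The three edit types sum to 142, which is also the total of the size-bin counts (13 + 29 + 51 + 42 + 7).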