Abstract

Study question
Can ovarian follicle counting and volumetric measurement in ultrasound cine-loops be automated using deep learning methods?

Summary answer
A deep-learning model achieved an area under the precision-recall curve (AUC-PR) of 77.8% for identifying follicles of diverse sizes in cine-loops from everyday clinical practice.

What is known already
The measurement of ovarian follicles via transvaginal ultrasound is an established procedure in infertility treatment. The high predictive value of the antral follicle count (AFC) is used in hormonal dosing algorithms, in scheduling of triggering and oocyte retrieval, and as a criterion for assessing hyperstimulation risk. As the task is repetitive and time-consuming, it is a good candidate for automation. However, traditional methods fail to distinguish follicles from acoustically shadowed areas, ovarian cysts, or anechoic extraovarian structures. Existing approaches assume manual ovary outlining or 3D volume acquisition, which requires additional operator training. Moreover, existing evaluations rarely define criteria for correct follicle identification.

Study design, size, duration
A retrospective study was conducted on 331 ultrasound cine-loop videos from 100 patients (mean age 35 ± 5.6 years, AMH level 3.3 ± 2.9 ng/ml) undergoing an IVF or donor cycle between February and December 2021 at six IVF centers in Poland. A further 1903 cine-loops from 350 other patients were used to train the model. To reflect everyday clinical practice, there was no selection based on patient condition or video quality.

Participants/materials, setting, methods
Folliscan (MIM Solutions) is a model based on 3D neural network architectures. It analyzes a cine-loop without any manual preprocessing and presents a list of follicles together with exact 3D outlines and confidence scores. A total of 24,711 follicles were manually annotated and reviewed by sonography experts to ensure exhaustive enumeration.
A detected follicle is considered correct if it sufficiently overlaps (Intersection over Union above 35%) with a single expert annotation.

Main results and the role of chance
Precision was 85.8% (95% confidence interval: 83.7–87.7) and recall was 74.2% (CI 71.7–76.6). The area under the precision-recall curve (AUC-PR) was 77.8%. Notably, 19.7% of annotations were added only during review, indicating that even expert sonographers miss a certain number of follicles. For studies performed on days 7–12 of stimulation (N = 163 cine-loops), when only follicles ≥ 10 mm were considered, performance was significantly higher: precision 95.4% (CI 93.0–96.8) and recall 87.7% (CI 73.1–94.2). We observe that the smallest anechoic areas, about 1 mm in diameter, are often not imaged at sufficient resolution to be unanimously classified by experts. Common sources of error for medium-sized follicles are poor image quality (acoustic shadowing, unclear boundaries between nearby follicles), ambiguous volumes (where two outlines can seem equally reasonable yet do not overlap sufficiently), and non-convex follicle shapes (due to adjacent follicles). When measuring follicle diameters, the mean absolute error (MAE) was 0.76 mm, only slightly larger than the inter-observer MAE of 0.62 mm (calculated on 32 cine-loops for which two experts independently measured all follicles). Moreover, 3D outlines enable a more physiological measurement than 2D diameters.

Limitations, reasons for caution
Follicle recognition is highly dependent on image quality and patient group. To contrast our model with manual methods, more cine-loops would need to be independently annotated by multiple experts. A future study on 3D acquisitions would be useful for comparison with software that requires them, such as SonoAVC (GE Healthcare).

Wider implications of the findings
Automating follicle measurement can speed up the process and reduce acquisition requirements (examination time, operator experience).
Deep learning methods consider each follicle’s context, enabling more reliable recognition of ovarian structures. They have the potential to increase the predictive value of follicle counting beyond what is already possible with human observers.

Trial registration number
N/A
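As an aside, the evaluation protocol described above (a detection counts as correct when its Intersection over Union with a single expert annotation exceeds 35%, and precision/recall are computed over matched detections) can be sketched as follows. This is a minimal illustration, not the study's actual implementation; the binary voxel-mask representation of 3D outlines is an assumption.

```python
import numpy as np

def iou(mask_a: np.ndarray, mask_b: np.ndarray) -> float:
    """Intersection over Union of two binary voxel masks (3D outlines)."""
    intersection = np.logical_and(mask_a, mask_b).sum()
    union = np.logical_or(mask_a, mask_b).sum()
    return float(intersection) / float(union) if union else 0.0

def is_correct_detection(pred_mask: np.ndarray,
                         annotation_mask: np.ndarray,
                         threshold: float = 0.35) -> bool:
    # A detection is correct when it sufficiently overlaps
    # (IoU above 35%) with a single expert annotation.
    return iou(pred_mask, annotation_mask) > threshold

def precision_recall(true_positives: int, false_positives: int,
                     false_negatives: int) -> tuple[float, float]:
    """Precision and recall from matched/unmatched detection counts."""
    precision = true_positives / (true_positives + false_positives)
    recall = true_positives / (true_positives + false_negatives)
    return precision, recall
```

Varying the confidence-score cutoff applied to the model's detections traces out the precision-recall curve whose area is the reported AUC-PR.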