Abstract

Study question: Is there a benefit to using outlier detection when developing an AI algorithm that predicts the number of oocytes retrieved for different trigger days?

Summary answer: Using outlier detection as a preliminary step before other algorithms helps determine whether a prediction or decision should be made for a specific scenario.

What is known already: Outlier detection is the process of identifying unusual or unexpected observations in data. Outliers can arise for a variety of reasons, such as measurement errors, data entry mistakes, genuine rare events, or data unlike anything the model observed during training. In recent years, multiple AI algorithms have been developed across a variety of areas in reproductive health (embryos, sperm, etc.); however, few publications have addressed the importance of implementing outlier detection when assisting doctors in clinical decisions.

Study design, size, duration: The data used for developing this algorithm consist of 9,618 antagonist protocol cycles performed between August 2017 and November 2022 in a large center serving over 50 physicians. The data were divided into three subsets, representing cases in which the physician performed the trigger 0, 1, or 2 days after a blood and ultrasound test day.

Participants/materials, setting, methods: Three outlier detection models were developed for a pre-developed trigger day selection algorithm, one for each trigger day subset. The models were built using the Local Outlier Factor algorithm, an unsupervised method. Each model was applied to its own data set after standardization and preprocessing with principal component analysis (PCA). Additionally, a rule-based outlier detection method was implemented, based on thresholds for different hormones.
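The pipeline described in the methods (standardization, PCA, Local Outlier Factor, plus a rule-based hormone check) could be sketched roughly as follows with scikit-learn. This is a minimal illustration and not the authors' implementation: the data are synthetic, and the number of components, the neighbor count, the hormone names, and the threshold values are all hypothetical placeholders.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.neighbors import LocalOutlierFactor
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
mixing = rng.normal(size=(3, 6))  # maps 3 latent factors to 6 correlated features


def make_cycles(n, latent_shift=0.0):
    """Synthetic stand-in for cycle features; not real clinical data."""
    latent = rng.normal(loc=latent_shift, size=(n, 3))
    return latent @ mixing + 0.1 * rng.normal(size=(n, 6))


X_train = make_cycles(500)                    # one trigger-day subset
X_heldout = make_cycles(200)                  # same distribution as training
X_other = make_cycles(200, latent_shift=6.0)  # stands in for a distant trigger-day subset

# novelty=True lets LOF score samples it was not fitted on via predict().
# n_components=3 and n_neighbors=20 are assumed values, not taken from the study.
detector = make_pipeline(
    StandardScaler(),
    PCA(n_components=3),
    LocalOutlierFactor(n_neighbors=20, novelty=True),
)
detector.fit(X_train)


def outlier_fraction(X):
    """Fraction of rows flagged as outliers (predict() returns -1 for outliers)."""
    return float(np.mean(detector.predict(X) == -1))


# Hypothetical hormone names and limits, for illustration only (not clinical values).
HORMONE_LIMITS = {"hormone_a": (0.0, 100.0), "hormone_b": (0.0, 10.0)}


def violates_rules(hormones):
    """Rule-based check: True if any hormone value falls outside its limits."""
    return any(not (lo <= hormones[name] <= hi)
               for name, (lo, hi) in HORMONE_LIMITS.items())


def should_predict(features, hormones):
    """Gate for the downstream model: predict only if both checks pass."""
    if violates_rules(hormones):
        return False
    return bool(detector.predict(np.asarray(features).reshape(1, -1))[0] == 1)


frac_heldout = outlier_fraction(X_heldout)
frac_other = outlier_fraction(X_other)
```

Comparing `outlier_fraction` on held-out data from the same subset with the fraction on data from a different subset gives a rough check that the detector separates familiar cases from unfamiliar ones.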
Main results and the role of chance: The three outlier detection models were evaluated using train, validation, and test sets, identifying approximately 1.5%, 1.5%, and 2% of the data as outliers, respectively. Each model was also evaluated on the other datasets: for example, the "trigger today" outlier detection model was evaluated on the "trigger tomorrow" and "trigger in two days" datasets, identifying ∼12% and ∼87% of the data as outliers, respectively. To further assess the models, additional subsets were created for later trigger times: in 3 days, 4 days, 5 days, and 6 or more days. The "trigger today" model flagged at least 99.5% of the data as outliers in each of these additional subsets. Cycles whose trigger time is closer to that of the training subset resemble the training data more closely, so a lower percentage of them is detected as outliers.

Limitations, reasons for caution: The outlier detection models were developed specifically for antagonist protocol cycles. Additionally, rare but legitimate cycle types may not be well represented in the data and may therefore be incorrectly identified as outliers.

Wider implications of the findings: Outlier detection is a crucial safeguard for AI algorithms. It recognizes cases that fall outside the model's training set, as well as abnormal values and typos, and thus avoids predictions on untested cases. This is especially important when first deploying algorithms on diverse populations.

Trial registration number: Not applicable