P-225 Trustworthy AI algorithm for embryo ranking

S Deluga-Białowarczuk, B Stankiewicz, I Martynowicz, T Gilewicz, P Sieczyński, P Sankowski, W Kuczyński, R Milewski, P Pawlik, P Wygocki, H Kompanowski, M Siennicki

doi:10.1093/humrep/dead093.583

Abstract

Study question: Deep-learning algorithms are known to be non-robust. Can the variability and inconsistency of AI algorithms be reduced in embryo selection?

Summary answer: We reduced the variability of the algorithms, measured under input modifications such as rotations and brightness changes, by 86% while preserving their quality.

What is known already: Deep-learning methods are generally known to be non-robust, i.e., their decisions change with even slight modifications of the input data. Current solutions for embryo scoring are not robust: for example, rotating the input image results in a different score in most solutions on the market. Despite this fact and the concerns expressed by embryologists, there are no other publications focusing on the problem of variance in AI solutions used in IVF. Most publications measure accuracy, sensitivity, specificity, and ROC AUC; none report variance metrics.

Study design, size, duration: The dataset was collected in multiple clinics using various devices. It contains 34,821 embryos (4,510 of which were transferred with known pregnancy results), represented by time-lapse videos or images. This gives 3,290,481 frames of embryos at various maturity levels. From the dataset, 925 randomly selected embryos were chosen as a test set. The frames were modified by methods that are not supposed to change the results of the algorithm, and we measured the variability of the scores given by our algorithm.

Participants/materials, setting, methods: We considered seven modifications of images that should not influence embryo scoring:
• Rotations (10 different angles);
• Brightness and contrast modifications;
• Substitution of frames (frames from time-lapse monitoring taken within a 2-hour interval);
• Blur (Generalised Normal filter);
• Gaussian noise;
• Gaussian blur;
• Sharpening.
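A few of the image modifications listed above can be sketched in a handful of lines. This is an illustrative sketch only, not the authors' pipeline: the function names, parameter values, and the representation of a frame as a nested list of grey levels in [0, 1] are all assumptions, and the rotation shown is the 90-degree special case rather than the 10 angles used in the study. It uses only the Python standard library; a real pipeline would more likely rely on an image-processing library.

```python
import random
from typing import List

# Hypothetical representation: a grayscale frame as rows of pixel values in [0, 1].
Image = List[List[float]]


def clip(value: float) -> float:
    """Keep a pixel value inside the valid [0, 1] range."""
    return min(1.0, max(0.0, value))


def adjust_brightness_contrast(img: Image, brightness: float = 0.0,
                               contrast: float = 1.0) -> Image:
    """Scale pixels around mid-grey (contrast), then shift them (brightness)."""
    return [[clip((p - 0.5) * contrast + 0.5 + brightness) for p in row]
            for row in img]


def add_gaussian_noise(img: Image, sigma: float = 0.02, seed: int = 0) -> Image:
    """Add zero-mean Gaussian noise to every pixel, reproducibly via a seed."""
    rng = random.Random(seed)
    return [[clip(p + rng.gauss(0.0, sigma)) for p in row] for row in img]


def rotate_90(img: Image) -> Image:
    """Rotate the frame clockwise by 90 degrees (simplified stand-in for
    arbitrary-angle rotation, which needs interpolation)."""
    return [list(row) for row in zip(*img[::-1])]


# Toy 2x2 frame with made-up values, only to show the calls.
frame = [[0.1, 0.2],
         [0.3, 0.4]]
variants = {
    "rotation": rotate_90(frame),
    "brightness_contrast": adjust_brightness_contrast(frame, 0.05, 1.1),
    "gaussian_noise": add_gaussian_noise(frame, sigma=0.02, seed=42),
}
```

Each variant shows the same embryo, so a robust scoring model would be expected to assign nearly the same score to every entry in `variants`.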
We used several techniques to reduce the variance of our deep neural network model (an architecture commonly used for embryo selection):
• Ensemble (of different models in cross-validation);
• Test-time augmentation (TTA);
• Robust training.

Main results and the role of chance: To measure the variance, we used the following method. First, the scores are stretched to the standard uniform distribution; in other words, we look at which percentile the score lies in. This way the range of the scores is normalised, so the variances can be compared. Second, we train the EMBROAID model on the augmented data that includes all the above modifications. Third, we compute the variance of the normalised scores on the test set. The mean variance dropped by 86% (0.0055 to 0.0008) across all measured input modifications. The individual drops in variance per input modification were:
• Rotations: 77% (0.009 to 0.002);
• Brightness and contrast: 81% (0.0036 to 0.0007);
• Substitution of frames: 76% (0.0076 to 0.0019);
• Blur: 94% (0.012 to 0.0008);
• Gaussian noise: 96% (0.0049 to 0.0002);
• Gaussian blur: 95% (0.0052 to 0.0003);
• Sharpening: 77% (0.0015 to 0.0003).
Significance was tested with the Wilcoxon rank-sum test, giving a p-value below 0.01 on all input modifications. Finally, we stress that these results were obtained without any loss in the ROC AUC metric: both models achieved an ROC AUC of 0.66 (CI 0.63-0.69) on both test sets, including the original one.

Limitations, reasons for caution: Further work needs to be done to extend the set of possible augmentations of data.

Wider implications of the findings: Increased reliability of AI scoring algorithms for embryo selection. It is possible to obtain consistent results over a wide range of data modifications.

Trial registration number: not applicable
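The variance measurement described in the results can be illustrated with a short sketch. The abstract does not specify the exact estimator, so the following choices are assumptions: scores are mapped to empirical percentiles against a reference set of scores, the population variance is computed per embryo across its modified versions, and the per-embryo variances are averaged. The function names are hypothetical.

```python
from bisect import bisect_right
from statistics import mean, pvariance
from typing import List, Sequence


def to_percentile(score: float, reference_sorted: Sequence[float]) -> float:
    """Map a raw score to its empirical percentile in [0, 1].

    This "stretches" scores to an approximately uniform distribution,
    so variances from models with different score ranges are comparable.
    """
    return bisect_right(reference_sorted, score) / len(reference_sorted)


def mean_variance(scores_per_embryo: List[List[float]],
                  reference_scores: Sequence[float]) -> float:
    """Average, over embryos, of the variance of normalised scores.

    scores_per_embryo holds one list per embryo: the model's scores for
    each modified version (e.g. each rotation angle) of that embryo.
    """
    ref = sorted(reference_scores)
    variances = []
    for scores in scores_per_embryo:
        normalised = [to_percentile(s, ref) for s in scores]
        variances.append(pvariance(normalised))  # assumption: population variance
    return mean(variances)


def relative_drop(before: float, after: float) -> float:
    """Percentage reduction from `before` to `after`."""
    return 100.0 * (before - after) / before


# Recomputing a drop from the rounded variances reported in the abstract:
noise_drop = relative_drop(0.0049, 0.0002)   # Gaussian noise
overall_drop = relative_drop(0.0055, 0.0008)  # mean over all modifications
```

Recomputing the percentages from the rounded variances printed in the abstract can differ from the reported figures by about a point (the overall drop comes out slightly below 86%), presumably because the reported percentages were derived from unrounded values.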
