Abstract

Study question
Does deep learning for embryo quality assessment reduce inter-observer variability?

Summary answer
An AUC of 87.65% was obtained for predicting blastocyst quality with a deep-learning algorithm trained by five embryologists who had good agreement between themselves.

What is known already
Time-lapse (TL) imaging allows continuous observation of embryo development in a controlled and stable environment. Recently, the use of deep learning, in particular convolutional neural networks, has been introduced to enhance blastocyst image classification using the growing TL image and video data.

Study design, size, duration
A total of 409 embryos (5 images per embryo, for a total of 2 045 images) were included in this retrospective study between 2016 and 2020.

Participants/materials, setting, methods
A machine-learning algorithm (RetinaNet) was trained on the 2 045 blastocyst images from the 409 embryos to detect the blastocyst in the 2560x1928 frames and output 500x500 images with the blastocyst centered on the image. Five embryologists classified the blastocysts using Gardner's grading system, and each image was assigned one final grade by majority voting. The dataset was split into a training and validation set (1 640 images plus data augmentation) and a test set (405 images).

Main results and the role of chance
Fair agreement was found between the 5 embryologists when grading the embryos with Gardner's grading system, with a maximum weighted kappa score of 39.60%. As for intra-observer variability, when the same embryologist graded the same embryo after a 3-month "wash-out" period, the grade and the fate of the embryo changed in 12% of cases, meaning that an embryo that was transferred/frozen during the first annotation period was discarded during the second one, or an embryo that was discarded during the first annotation period was transferred/frozen during the second one. An Area Under the Curve (AUC) of 87.65% was obtained when testing the quality of 81 embryos (405 images) after training our algorithm on 54 038 images. For external validation, we tested the algorithm with annotations of the test set from embryologists from another fertility center; an AUC of 82.72% was obtained.

Limitations, reasons for caution
The small number of images available in our training set compared with datasets from other, larger clinics, and the fact that the algorithm was trained on embryologists' annotations, mean that variability is not entirely suppressed. The GoogLeNet algorithm was not fine-tuned and was used as is.

Wider implications of the findings
AI is showing precious value in the field of embryology, from enhancing blastocyst quality prediction to removing inter-observer subjectivity. A possible evolution of our framework would be to predict the Gardner grade for each morphological parameter.

Trial registration number
Not applicable.
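A minimal sketch of the detection-and-crop step described in the methods: a RetinaNet detector proposes a blastocyst bounding box on the full-resolution frame, and a 500x500 crop is taken around the box center. The torchvision model, the two-class setup, and the weight file name are assumptions for illustration, not the authors' implementation.

    # Sketch: detect the blastocyst with RetinaNet, then crop a 500x500 window around it.
    import torch
    from PIL import Image
    from torchvision.models.detection import retinanet_resnet50_fpn
    from torchvision.transforms.functional import to_tensor

    model = retinanet_resnet50_fpn(num_classes=2)  # background + blastocyst (assumed)
    model.load_state_dict(torch.load("blastocyst_retinanet.pth", map_location="cpu"))  # hypothetical weights
    model.eval()

    image = Image.open("embryo_frame.png").convert("RGB")  # 2560x1928 time-lapse frame
    with torch.no_grad():
        prediction = model([to_tensor(image)])[0]

    # Keep the highest-scoring box and crop a 500x500 window centered on it,
    # clamped so the crop stays inside the frame.
    x0, y0, x1, y1 = prediction["boxes"][prediction["scores"].argmax()].tolist()
    cx, cy = (x0 + x1) / 2, (y0 + y1) / 2
    left = int(min(max(cx - 250, 0), image.width - 500))
    top = int(min(max(cy - 250, 0), image.height - 500))
    crop = image.crop((left, top, left + 500, top + 500))
    crop.save("blastocyst_500x500.png")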
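The consensus label and inter-observer agreement reported above could be computed roughly as follows. The grade encoding, the tie-breaking rule, and the quadratic kappa weighting are assumptions; the abstract does not specify them.

    # Sketch: majority-vote consensus grade and pairwise weighted kappa between annotators.
    from collections import Counter
    from itertools import combinations

    import numpy as np
    from sklearn.metrics import cohen_kappa_score

    # Hypothetical annotations: one row per image, one column per embryologist,
    # grades encoded as ordinal integers (e.g. 0 = poor ... 4 = top quality).
    annotations = np.array([
        [3, 3, 2, 3, 4],
        [1, 1, 1, 2, 1],
        [4, 4, 3, 4, 4],
    ])

    # Majority vote per image (ties fall to the grade seen first in the row).
    consensus = [Counter(row).most_common(1)[0][0] for row in annotations]

    # Pairwise weighted kappa between the five embryologists (quadratic weights assumed).
    kappas = [
        cohen_kappa_score(annotations[:, i], annotations[:, j], weights="quadratic")
        for i, j in combinations(range(annotations.shape[1]), 2)
    ]
    print("consensus grades:", consensus)
    print("max pairwise weighted kappa:", max(kappas))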
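Finally, the reported test metric is an ROC AUC over the held-out embryos (81 embryos, 5 images each). Averaging the per-image scores into one score per embryo is an assumption, as are the placeholder arrays; the abstract does not describe the aggregation rule.

    # Sketch: embryo-level AUC from per-image predicted probabilities.
    import numpy as np
    from sklearn.metrics import roc_auc_score

    scores_per_image = np.random.rand(81, 5)          # placeholder model outputs, 81 embryos x 5 images
    embryo_labels = np.random.randint(0, 2, size=81)  # placeholder labels, 1 = good-quality blastocyst

    embryo_scores = scores_per_image.mean(axis=1)     # average across the 5 images (assumed rule)
    print("AUC:", roc_auc_score(embryo_labels, embryo_scores))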
