P-264 ICM segmentation is impacted by several factors for humans as well as for AI models but AI models show consistency

C Jacques, L Firminger, J Chambost, C He, T Ferrand, N Karpaviciute, C Hickman

doi:10.1093/humrep/deac107.253

Abstract

Study question

Which factors affect efficacy and/or cause variation when humans and AI segment the Inner Cell Mass (ICM)?

Summary answer

The efficacy of humans and AI is similarly affected by four factors (ICM position in the embryo, ICM cell number, compaction, and focus), but AI is more consistent.

What is known already

Traditional embryo evaluation methods are based on visual quality assessment by qualified embryologists. The morphological analysis of a blastocyst on a static image is subjective and time-consuming, and depends strongly on the embryologist's knowledge and experience. Although grading systems such as the Gardner system aim to standardize quality assessment, and AI models improve objectivity by agreeing with consensus, disparities remain. Blastocyst quality assessment is generally based on three morphological components: ICM, trophectoderm, and expansion. Difficulty in properly detecting any one of them can lead to different evaluations. This study focuses on the detection of the ICM.

Study design, size, duration

This comparative observational study was conducted on 201 images retrospectively extracted from time-lapse recordings of embryos at the blastocyst stage, collected at a French clinic in 2018 and 2019. Three qualified embryologists (E1, E2, E3) annotated the dataset by drawing the ICM area on each image with the open-source CVAT annotation tool (https://openvinotoolkit.github.io/cvat/) to generate the ICM masks. E1 also added to the B1 annotations information on ICM position relative to the embryo, focus, compaction, and ICM cell number.

Participants/materials, setting, methods

The dataset was split into two batches: B1, containing 60 embryos annotated twice by the embryologists (B11, B12), and B2, containing 141 embryos annotated once.
A DeepLab model with a ResNet101 backbone was trained to detect the ICM on 102 B2 images annotated by E1, augmented to 3672 images by rotating each image in 10-degree steps to cover 360 degrees (36 orientations per image), and optimized on 29 held-out images. It then generated prediction masks on the unseen B1 images.

Main results and the role of chance

Inter-operator variation in ICM detection on B1 is high: the embryologists do not agree with each other according to the average Intersection over Union (IoU): E1 vs E2: 0.52 ± 0.3; E2 vs E3: 0.46 ± 0.4; E1 vs E3: 0.51 ± 0.3. The AI yields similar IoU results: AI vs E1: 0.54 ± 0.3; AI vs E2: 0.47 ± 0.3; AI vs E3: 0.54 ± 0.3. Variation between AI and humans (AIvsH) is affected by the studied factors in the same way as variation between humans (HvsH): position of the ICM relative to the embryo (IoU middle/side: HvsH: 0.41/0.65, AIvsH: 0.46/0.59); ICM cell number (lots/in between/few: HvsH: 0.70/0.49/0.48, AIvsH: 0.67/0.49/0.48); compaction (compacted/mostly compacted/mostly dispersed: HvsH: 0.69/0.58/0.46, AIvsH: 0.70/0.56/0.45); focus (in focus/partially/not in focus: HvsH: 0.68/0.58/0.39, AIvsH: 0.64/0.58/0.41). Although the AI has the same difficulties as humans in detecting the ICM, the AI model is at least consistent. Intra-operator variations on B1, measured between the two annotation rounds, are: E1: 0.70; E2: 0.69; E3: 0.86; AI: 1.

Limitations, reasons for caution

On the human side, the annotators were trained at different clinics before annotating the images. On the AI side, the results could be improved by using a larger volume of images. Both sides are affected by the lack of context: one static image from one focal plane.

Wider implications of the findings

These four factors should be taken into account when training embryologists or models to detect the ICM in a blastocyst image. Using videos or several focal planes may reduce variation, but human analysis then becomes time-consuming and sensitive to other factors (fatigue, repetitiveness), so using AI may become essential and more consistent.
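
The Intersection over Union (IoU) scores reported in the main results compare two ICM masks pixel by pixel: the number of pixels marked as ICM in both masks, divided by the number marked in at least one. The following is a minimal sketch of that metric, not the study's own code. It assumes each mask is a binary grid of equal size; the function name `mask_iou`, the toy masks, and the convention for two empty masks are illustrative assumptions.

```python
def mask_iou(mask_a, mask_b):
    """IoU of two binary masks given as equal-sized nested lists of 0/1."""
    same_shape = len(mask_a) == len(mask_b) and all(
        len(row_a) == len(row_b) for row_a, row_b in zip(mask_a, mask_b)
    )
    if not same_shape:
        raise ValueError("masks must have the same shape")

    intersection = 0  # pixels marked as ICM in both masks
    union = 0         # pixels marked as ICM in at least one mask
    for row_a, row_b in zip(mask_a, mask_b):
        for a, b in zip(row_a, row_b):
            if a and b:
                intersection += 1
            if a or b:
                union += 1

    # Assumed convention: two empty masks count as perfect agreement.
    return 1.0 if union == 0 else intersection / union


# Toy example (hypothetical masks): two annotators outline overlapping regions.
annotator_1 = [
    [0, 1, 1, 0],
    [0, 1, 1, 0],
    [0, 0, 0, 0],
]
annotator_2 = [
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 0, 0],
]
score = mask_iou(annotator_1, annotator_2)  # 2 shared pixels / 6 marked pixels
```

Identical masks give an IoU of 1, which is the value a deterministic model obtains when it segments the same image twice.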
Trial registration number

Not applicable
