Variations in pregnancy rates between IVF programs are due to multiple factors, including patient population, stimulation protocols, and embryo quality. Standardized embryo grading systems have been developed to improve communication between embryologists and clinicians, as well as with the scientific community as a whole. However, these grading systems have not been validated within the embryologist community. We hypothesize that both intra- and inter-observer variability exist with the use of a standardized day 3 embryo grading system (Veeck grading system). This was a prospective, sample-randomized, controlled, blinded study. IRB approval was obtained. Thirty-five cleavage-stage (day 3) supernumerary embryos were video recorded using an inverted microscope fitted with a video camera. The 35 video clips were randomly ordered and were used to assess inter-observer variability. Included among the 35 clips were 7 pairs of duplicate embryos for assessing intra-observer variability. The video clips were independently graded using the Veeck scoring system by 26 embryologists. Grading of all embryos by Dr. Veeck was used as the control. Embryologists were characterized by education level, years of experience, size of IVF program, and type of grading system used regularly. Kappa scores (k) were calculated to assess variation between and among embryologists, and intraclass correlation coefficients were calculated to assess the magnitude of the discordance between embryologists. The inter-observer k values ranged from 0.026 to 0.490 (mean of 0.241), demonstrating substantial inter-observer variability. In addition, the intraclass correlation coefficient for the inter-observer variability ranged from 0.976 to 0.991 (mean of 0.984). The least consistent practicing embryologist differed from Dr. Veeck by an average of 1 grade across all 35 embryos, despite using the same grading system (see Fig. 1).
The intra-observer k values ranged from 0.441 to 1.0 (mean of 0.689), and the intraclass correlation coefficient for the intra-observer variability ranged from 0.620 to 1.000 (mean of 0.881). Some embryologists consistently assigned the same grade to duplicate clips of the same embryo, whereas others demonstrated marked variability (see Fig. 2). While educational status and years of experience did not correlate with grading consistency, programs with higher cycle numbers per year had lower variability (data not shown). There is substantial inter-observer variability and moderate intra-observer variability among embryologists. Such variability could alter both the expected quality of embryos transferred and the number transferred, both of which directly impact IVF program success. We propose a day 3 embryo grading system that minimizes variability in day 3 embryo grading, thereby standardizing both embryologist-physician communication and IVF program-IVF community communication.
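The kappa statistic reported above can be illustrated with a minimal sketch. The abstract does not state which kappa variant was used, so this example assumes an unweighted Cohen's kappa computed between one embryologist and a reference grader. The function name `cohen_kappa` and the grade lists are hypothetical and are not taken from the study data.

```python
from collections import Counter


def cohen_kappa(rater_a, rater_b):
    """Unweighted Cohen's kappa for two raters grading the same items.

    Assumption: grades are treated as unordered categories; the study
    may have used a different (e.g. weighted or multi-rater) variant.
    """
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("Both raters must grade the same non-empty set of items")

    n = len(rater_a)

    # Observed agreement: fraction of items given the identical grade.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n

    # Chance agreement: product of each rater's marginal grade frequencies,
    # summed over all grades.
    count_a = Counter(rater_a)
    count_b = Counter(rater_b)
    expected = sum(count_a[g] * count_b[g] for g in count_a) / (n * n)

    if expected == 1.0:
        # Both raters used a single identical grade throughout;
        # kappa is undefined, so report perfect agreement by convention.
        return 1.0

    return (observed - expected) / (1 - expected)


# Hypothetical grades for eight embryos (illustrative only).
reference_grades = [1, 2, 2, 3, 4, 5, 1, 3]
embryologist_grades = [1, 2, 3, 3, 4, 4, 2, 3]

kappa = cohen_kappa(reference_grades, embryologist_grades)
print(f"kappa = {kappa:.3f}")
```

A value of 1 indicates perfect agreement and a value near 0 indicates agreement no better than chance. Applying the same function to an embryologist's two gradings of the duplicate clips would give an intra-observer estimate under the same assumptions.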