Abstract

Study question
Can generative artificial intelligence (AI) produce high-fidelity human embryo images to improve AI-based embryo selection models?

Summary answer
Generative AI models can generate high-fidelity human embryo images and, by enlarging the training data, can substantially improve the performance of AI models.

What is known already
The integration of AI into in vitro fertilization (IVF) procedures has the potential to enhance objectivity and automate the selection of embryos for transfer. However, the effectiveness of AI is limited by data scarcity and by ethical concerns related to patient data privacy. Generative Adversarial Networks (GANs) have emerged as a promising approach to alleviate data limitations by generating synthetic data that closely approximate real images. However, the benefit of applying GANs to IVF embryo images has yet to be determined.

Study design, size, duration
Embryo images were retrieved as training data from time-lapse microscopy (TLM) videos (n = 328). Key embryo morphokinetic variables, including blastomere divisions from the 1-cell stage to the 9-or-more-cell stage, compaction of the morula, and blastocyst formation, were manually annotated by six embryologists. A style-based GAN was fine-tuned as the generative model, with a residual network as the morphokinetic classification model.

Participants/materials, setting, methods
We configured generative models with data augmentation (AUG) and pretrained weights (Pretrained-T/R) to test their effect on synthetic image quality. We used quantitative metrics, including Fréchet Inception Distance (FID) and Kernel Inception Distance (KID), to assess the quality and fidelity of the generated images. We then evaluated qualitative performance through a visual Turing test. Finally, we compared the performance of a classifier trained on real images alone with that of a classifier trained on real images combined with generated data.
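As an illustration of the FID metric named in the methods, the following minimal sketch computes the Fréchet distance between Gaussians fitted to two sets of image features. It is not the study's evaluation code: in practice the features come from an Inception network, whereas here the function name `frechet_distance` and the random placeholder features are assumptions made only for illustration.

```python
import numpy as np


def frechet_distance(feats_real, feats_fake):
    """Fréchet distance between Gaussians fitted to two feature sets.

    Each input is a 2-D array with one row per image and one column per
    feature dimension.
    """
    mu_r = feats_real.mean(axis=0)
    mu_f = feats_fake.mean(axis=0)
    cov_r = np.cov(feats_real, rowvar=False)
    cov_f = np.cov(feats_fake, rowvar=False)

    # Tr((cov_r cov_f)^(1/2)) equals the sum of the square roots of the
    # eigenvalues of cov_r @ cov_f, which are real and non-negative for
    # covariance matrices (clipping guards against round-off).
    eigvals = np.linalg.eigvals(cov_r @ cov_f)
    trace_sqrt = np.sqrt(np.clip(eigvals.real, 0.0, None)).sum()

    diff = mu_r - mu_f
    return float(diff @ diff + np.trace(cov_r) + np.trace(cov_f) - 2.0 * trace_sqrt)


# Placeholder features (hypothetical): 200 "images" with 8 feature dimensions.
rng = np.random.default_rng(0)
real_feats = rng.normal(size=(200, 8))
fake_feats = rng.normal(loc=0.1, size=(200, 8))
score = frechet_distance(real_feats, fake_feats)
```

Lower values indicate that the two feature distributions are closer; identical feature sets give a distance of zero up to floating-point error.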
Main results and the role of chance
During training, we observed consistent improvement in image quality as measured by FID and KID scores. The AUG+Pretrained-R model performed best among the five evaluated configurations, with FID and KID scores of 15.2 and 0.004, respectively, after 5,000 training iterations. We then carried out a visual Turing test in which a total of 60 individuals, including IVF embryologists (Group I, n = 25), IVF laboratory technicians (Group II, n = 15), and non-experts (Group III, n = 20), evaluated the synthetic blastocyst-stage embryo images. Group I achieved an accuracy of 55.7% (±6.2), a sensitivity of 65.9% (±14.1), and a specificity of 45.1% (±17.8). Group II achieved an accuracy of 54.2% (±4.7), a sensitivity of 65.9% (±17.6), and a specificity of 42.0% (±19.6). Group III achieved an accuracy of 50.1% (±3.8), a sensitivity of 49.3% (±6.3), and a specificity of 50.9% (±10.4). Statistical analysis indicated significant differences among the three groups in accuracy (Kruskal-Wallis test, P = 1.2 × 10⁻³) and sensitivity (Kruskal-Wallis test, P = 1.6 × 10⁻⁴). The embryonic stage classification model trained with generated data outperformed the model trained on real data alone, with validation accuracy increasing from 72.6% to 90.6%. The classifier was tested on external TLM video data (n = 100), where the area under the curve was 0.92.

Limitations, reasons for caution
Further research should involve more extensive and diverse data, e.g. video generation, and more detailed annotations, including polar body appearance, pronuclei appearance and disappearance, and blastocyst expansion and hatching.

Wider implications of the findings
Generative AI offers potential for generating high-fidelity human embryo images. This approach can provide sufficient training data while addressing class imbalances for robust AI-based methods, potentially transforming embryo selection and enhancing IVF outcomes.
A set of 5,000 generated images (1024 × 1024 pixels) at each embryonic developmental stage was released as open source for future AI studies.

Trial registration number
Not applicable.
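The accuracy, sensitivity, and specificity reported for the visual Turing test under Main results can be derived from each participant's answers as sketched below. This is a minimal example, not the study's analysis code; it assumes that a synthetic image is the positive class (the abstract does not state the convention), and the function name `turing_test_metrics` and the toy answers are hypothetical.

```python
def turing_test_metrics(is_synthetic, judged_synthetic):
    """Accuracy, sensitivity, and specificity for one participant.

    Assumption: "synthetic" is the positive class. Both arguments are
    equal-length sequences of booleans, one entry per image shown.
    """
    pairs = list(zip(is_synthetic, judged_synthetic))
    tp = sum(1 for truth, guess in pairs if truth and guess)
    tn = sum(1 for truth, guess in pairs if not truth and not guess)
    fp = sum(1 for truth, guess in pairs if not truth and guess)
    fn = sum(1 for truth, guess in pairs if truth and not guess)
    return {
        "accuracy": (tp + tn) / len(pairs),
        "sensitivity": tp / (tp + fn),  # synthetic images correctly flagged
        "specificity": tn / (tn + fp),  # real images correctly recognized
    }


# Toy answers (hypothetical): four synthetic and four real images.
truth = [True, True, True, True, False, False, False, False]
guesses = [True, True, True, False, True, True, False, False]
metrics = turing_test_metrics(truth, guesses)
```

For these toy answers the function returns an accuracy of 0.625, a sensitivity of 0.75, and a specificity of 0.5. An accuracy near 50% on a balanced image set, as reported for all three groups, corresponds to performance close to guessing.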