Classification of industrial components by artificial intelligence using synthetic data from a computer-aided design software
The growing interest in automation in industrial production increases the demand for artificial intelligence in the recognition of manufactured components. Sorting tasks require automated classification methods that can be implemented using machine vision and deep learning. One of the main challenges in deep learning applications is the collection of training data, especially when many different objects must be considered. Synthetic training images offer a way to avoid the time-consuming acquisition of real images. In the presented method, synthetic images of industrial components are generated automatically within a computer-aided design environment. A script creates images of the objects from multiple perspectives directly in the software in which they were designed. This lightweight, CAD-integrated workflow explores how artificial data can be produced during the design process. A convolutional neural network is trained using transfer learning with synthetic data only and evaluated on real images to assess the sim-to-real gap. The method achieved an accuracy of 79.67% on the real-world test set. The results show a clear internal improvement in accuracy when increasing the number of synthetic images, even with minor variations. The findings demonstrate the feasibility of the approach while indicating that the sim-to-real gap remains a challenge. A model trained with this workflow could support automated classification and sorting of industrial components.
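The abstract says only that a script renders each component "from multiple perspectives" and does not describe how the viewpoints are chosen. As one possible illustration, the sketch below spreads camera positions roughly evenly over a sphere around the object using a Fibonacci lattice. The function name `camera_viewpoints` and its parameters are hypothetical and are not taken from the paper or from any CAD scripting API.

```python
import math


def camera_viewpoints(n_views, radius=1.0):
    """Return n_views (x, y, z) camera positions on a sphere of the given radius.

    Assumption: a Fibonacci lattice is used for near-uniform coverage.
    The object is assumed to sit at the origin.
    """
    golden_angle = math.pi * (3.0 - math.sqrt(5.0))
    positions = []
    for i in range(n_views):
        # z runs from just below +1 to just above -1 in equal steps
        z = 1.0 - 2.0 * (i + 0.5) / n_views
        ring_radius = math.sqrt(1.0 - z * z)
        azimuth = i * golden_angle
        positions.append((
            radius * ring_radius * math.cos(azimuth),
            radius * ring_radius * math.sin(azimuth),
            radius * z,
        ))
    return positions


if __name__ == "__main__":
    for position in camera_viewpoints(5, radius=2.0):
        print(tuple(round(c, 3) for c in position))
```

In a real workflow each position would be handed to the CAD tool's own camera and export calls, which differ between products and are therefore left out here.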
- Research Article
- 10.54103/2282-0930/29361
- Sep 8, 2025
- Epidemiology, Biostatistics, and Public Health
Background The validation of synthetic dermatological images generated by Generative Adversarial Networks (GANs) [1] is crucial for their integration into clinical and research workflows. Despite rapid progress in image synthesis, a standardized framework for evaluating the realism and diagnostic utility of synthetic skin lesions through expert review is still lacking [2]. Existing automated evaluation metrics, while informative, do not always align with human perception and diagnostic expectations. Particularly in medical domains, subtle visual cues and contextual interpretation often elude algorithmic assessment [3]. Human evaluations remain the most direct means of determining whether synthetic images capture the nuanced features necessary for clinical utility. Without structured expert-based validation, synthetic images may introduce bias or mislead models and clinicians, hampering their responsible deployment in diagnostic support systems, training datasets, or educational tools. Objectives This study aims to conduct an expert-based qualitative evaluation of synthetic melanoma images. Specifically, it investigates the subjective perception of image realism, diagnostic quality, and the recognizability of key dermoscopic features. By engaging dermatologists in a blinded assessment of synthetic and real images, we seek to establish a foundation for systematically validating synthetic dermatological data for use in AI development, medical education, and clinical decision support. This work emphasizes the importance of subjective expert validation as a complement to technical performance metrics in assessing the fidelity of GAN-generated skin lesion images. Materials and Methods StyleGAN3-T [4] was trained on a dataset of dermoscopic images of melanoma [5–7] with adaptive discriminator augmentation and transfer learning. A total of 25 synthetic melanoma images were generated and randomly mixed with 25 real melanoma images, resulting in a 50-image dataset. 
Seventeen board-certified dermatologists with varying levels of experience (low <4 years, medium 5–8 years, high >8 years) participated in the evaluation. Participants were blinded to image origin and asked to classify each image as real or synthetic. They also assessed the presence of 16 defined dermoscopic patterns according to standardized definitions and rated four dimensions (image quality, skin texture, visual realism, and color realism) on a 7-point Likert scale. Additionally, participants reported their confidence in each classification decision. Statistical analyses included Chi-square tests for categorical comparisons, and Fleiss’ Kappa and Krippendorff’s Alpha were used to measure inter-rater agreement. Results Real images were consistently rated higher than synthetic images across all qualitative dimensions: image quality (high: 15.8% real vs. 11.3% synthetic), skin texture (high: 22.4% vs. 13.4%), and visual realism (high: 22.6% vs. 13.2%), all with p < 0.001. Confidence in evaluations was also significantly greater for real images, with high confidence reported in 17.4% of real cases compared to 8.7% for synthetic ones (p < 0.001). Regarding the recognition of image origin, the overall classification accuracy was 64%. Real images were correctly identified in 73% of cases, while only 56% of synthetic images were correctly classified as synthetic. Accuracy increased with expertise: from 59% in the low-experience group to 71% among high-experience dermatologists. Similarly, higher self-reported confidence was associated with improved performance (accuracy 74% at high confidence level). Recognition of specific dermoscopic features showed differences between real and synthetic images. The blue-white veil was detected in 29.1% of real images compared to 13.8% of synthetic ones (p < 0.001), and shiny white streaks in 22.6% vs. 7.9% (p < 0.001). Conversely, synthetic images were more frequently associated with irregular pigmented blotches (45.0% vs. 
30.9%, p < 0.001). The multicomponent pattern, typically indicative of melanoma complexity, was identified in 40.6% of real images versus only 23.2% of synthetic ones (p < 0.001), suggesting a gap in the synthetic images’ structural fidelity (Table 1). Inter-rater agreement for the classification of real versus synthetic images was low, with a Fleiss’ kappa of 0.183. Pattern recognition agreement also remained weak (e.g., kappa < 0.3 for most features), underscoring variability in expert interpretations. Further subgroup analyses showed that images rated as highly realistic or evaluated with high confidence were more likely to be classified correctly, with accuracy rising to 74% in the highest-confidence subgroup. Conclusions Synthetic melanoma lesions generated using StyleGAN3-T demonstrate visually convincing features and were frequently perceived as real, yet consistently underperformed compared to real images in diagnostic quality and structural detail. Participants often struggled to distinguish synthetic from real lesions, particularly when realism ratings were medium to high. Critical diagnostic patterns, such as the blue-white veil and shiny white streaks, were significantly underrepresented in synthetic images. These limitations were reflected in the lower classification confidence and weaker inter-rater agreement. Despite these challenges, the study highlights the potential of synthetic data to approach realism levels sufficient for research and educational use. Qualitative validation by dermatologists is essential to benchmark the readiness of synthetic images for real-world medical applications. As generative models continue to evolve, expert evaluation should remain a key component of validation pipelines to ensure clinical and pedagogical reliability.
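The inter-rater agreement statistic reported above, Fleiss' kappa, can be computed from a table of rating counts. The sketch below follows the standard textbook definition. The function name and the toy data are illustrative assumptions and do not reproduce the study's ratings.

```python
def fleiss_kappa(counts):
    """Fleiss' kappa for a table where counts[i][j] is the number of raters
    who assigned item i to category j. Every item must have the same number
    of ratings (at least two).
    """
    n_items = len(counts)
    n_raters = sum(counts[0])
    if any(sum(row) != n_raters for row in counts):
        raise ValueError("every item needs the same number of ratings")
    n_categories = len(counts[0])

    # Share of all ratings that fell into each category.
    category_share = [
        sum(row[j] for row in counts) / (n_items * n_raters)
        for j in range(n_categories)
    ]
    # Observed agreement per item, averaged over items.
    per_item = [
        (sum(c * c for c in row) - n_raters) / (n_raters * (n_raters - 1))
        for row in counts
    ]
    observed = sum(per_item) / n_items
    # Agreement expected by chance.
    expected = sum(p * p for p in category_share)
    if expected == 1.0:
        return 1.0  # all ratings in one category; agreement is trivially perfect
    return (observed - expected) / (1.0 - expected)


if __name__ == "__main__":
    # Toy example: 4 images, 17 raters, columns = (rated real, rated synthetic).
    toy_counts = [[12, 5], [9, 8], [6, 11], [14, 3]]
    print(round(fleiss_kappa(toy_counts), 3))
```

Values near 0 indicate agreement close to chance and values near 1 indicate near-perfect agreement, which puts the reported 0.183 at the low end.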
- Research Article
60
- 10.1016/j.cmpb.2020.105420
- Feb 29, 2020
- Computer Methods and Programs in Biomedicine
Background and objectives: Automated segmentation and tracking of surgical instruments and catheters under X-ray fluoroscopy hold the potential for enhanced image guidance in catheter-based endovascular procedures. This article presents a novel method for real-time segmentation of catheters and guidewires in 2D X-ray images. We employ Convolutional Neural Networks (CNNs) and propose a transfer learning approach, using synthetic fluoroscopic images, to develop a lightweight version of the U-Net architecture. Our strategy, requiring a small amount of manually annotated data, streamlines the training process and results in a U-Net model, which achieves comparable performance to the state-of-the-art segmentation, with a decreased number of trainable parameters. Methods: The proposed transfer learning approach exploits high-fidelity synthetic images generated from real fluoroscopic backgrounds. We implement a two-stage process, initial end-to-end training and fine-tuning, to develop two versions of our model, using synthetic and phantom fluoroscopic images independently. A small number of manually annotated in-vivo images is employed to fine-tune the deepest 7 layers of the U-Net architecture, producing a network specialized for pixel-wise catheter/guidewire segmentation. The network takes as input a single grayscale image and outputs the segmentation result as a binary mask against the background. Results: Evaluation is carried out with images from in-vivo fluoroscopic video sequences from six endovascular procedures, with different surgical setups. We validate the effectiveness of developing the U-Net models using synthetic data, in tests where fine-tuning and testing in-vivo takes place both by dividing data from all procedures into independent fine-tuning/testing subsets as well as by using different in-vivo sequences. Accurate catheter/guidewire segmentation (average Dice coefficient of ~ 0.55, ~ 0.26 and ~ 0.17) is obtained with both U-Net models. 
Compared to the state-of-the-art CNN models, the proposed U-Net achieves comparable performance (±5% average Dice coefficients) in terms of segmentation accuracy, while yielding an 84% reduction of the testing time. This adds flexibility for real-time operation and makes our network adaptable to increased input resolution. Conclusions: This work presents a new approach in the development of CNN models for pixel-wise segmentation of surgical catheters in X-ray fluoroscopy, exploiting synthetic images and transfer learning. Our methodology reduces the need for manually annotating large volumes of data for training. This represents an important advantage, given that manual pixel-wise annotation is a key bottleneck in developing CNN segmentation models. Combined with a simplified U-Net model, our work yields significant advantages compared to current state-of-the-art solutions.
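The Dice coefficient used above to score segmentation compares a predicted binary mask with a reference mask as twice the overlap divided by the total foreground size. The sketch below is a minimal pure-Python version. The function name and the convention of returning 1.0 for two empty masks are assumptions and are not taken from the article.

```python
def dice_coefficient(predicted, reference):
    """Dice = 2 * |A ∩ B| / (|A| + |B|) for two binary masks.

    Masks are nested lists (rows of 0/1 values) with identical shapes.
    """
    flat_pred = [1 if value else 0 for row in predicted for value in row]
    flat_ref = [1 if value else 0 for row in reference for value in row]
    if len(flat_pred) != len(flat_ref):
        raise ValueError("masks must have the same number of pixels")

    overlap = sum(p & r for p, r in zip(flat_pred, flat_ref))
    total = sum(flat_pred) + sum(flat_ref)
    if total == 0:
        return 1.0  # assumed convention: two empty masks agree perfectly
    return 2.0 * overlap / total


if __name__ == "__main__":
    predicted = [[1, 1, 0],
                 [0, 0, 0]]
    reference = [[1, 0, 0],
                 [0, 1, 0]]
    print(dice_coefficient(predicted, reference))  # 1 shared pixel, 4 foreground pixels in total
```

For thin structures such as guidewires, a shift of a pixel or two removes most of the overlap, so Dice scores tend to be sensitive to small localisation errors.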
- Research Article
172
- 10.1001/jamaophthalmol.2018.6156
- Jan 10, 2019
- JAMA Ophthalmology
Deep learning (DL) used for discriminative tasks in ophthalmology, such as diagnosing diabetic retinopathy or age-related macular degeneration (AMD), requires large image data sets graded by human experts to train deep convolutional neural networks (DCNNs). In contrast, generative DL techniques could synthesize large new data sets of artificial retina images with different stages of AMD. Such images could enhance existing data sets of common and rare ophthalmic diseases without concern for personally identifying information to assist medical education of students, residents, and retinal specialists, as well as for training new DL diagnostic models for which extensive data sets from large clinical trials of expertly graded images may not exist. To develop DL techniques for synthesizing high-resolution realistic fundus images serving as proxy data sets for use by retinal specialists and DL machines. Generative adversarial networks were trained on 133 821 color fundus images from 4613 study participants from the Age-Related Eye Disease Study (AREDS), generating synthetic fundus images with and without AMD. We compared retinal specialists' ability to diagnose AMD on both real and synthetic images, asking them to assess image gradability and testing their ability to discern real from synthetic images. The performance of AMD diagnostic DCNNs (referable vs not referable AMD) trained on either all-real vs all-synthetic data sets was compared. Accuracy of 2 retinal specialists (T.Y.A.L. and K.D.P.) for diagnosing and distinguishing AMD on real vs synthetic images and diagnostic performance (area under the curve) of DL algorithms trained on synthetic vs real images. The diagnostic accuracy of 2 retinal specialists on real vs synthetic images was similar. 
The accuracy of diagnosis as referable vs nonreferable AMD compared with certified human graders for retinal specialist 1 was 84.54% (error margin, 4.06%) on real images vs 84.12% (error margin, 4.16%) on synthetic images and for retinal specialist 2 was 89.47% (error margin, 3.45%) on real images vs 89.19% (error margin, 3.54%) on synthetic images. Retinal specialists could not distinguish real from synthetic images, with an accuracy of 59.50% (error margin, 3.93%) for retinal specialist 1 and 53.67% (error margin, 3.99%) for retinal specialist 2. The DCNNs trained on real data showed an area under the curve of 0.9706 (error margin, 0.0029), and those trained on synthetic data showed an area under the curve of 0.9235 (error margin, 0.0045). Deep learning-synthesized images appeared to be realistic to retinal specialists, and DCNNs achieved diagnostic performance on synthetic data close to that for real images, suggesting that DL generative techniques hold promise for training humans and machines.
- Research Article
53
- 10.1016/j.xops.2022.100258
- Nov 22, 2022
- Ophthalmology Science
SynthEye: Investigating the Impact of Synthetic Data on Artificial Intelligence-assisted Gene Diagnosis of Inherited Retinal Disease
- Research Article
2
- 10.1109/tmc.2025.3576203
- Oct 1, 2025
- IEEE Transactions on Mobile Computing
Digital twins (DT) offer a low-overhead evaluation platform and the ability to generate rich datasets for training machine learning (ML) models before actual deployment. Specifically, for the scenario of ML-aided millimeter wave (mmWave) links between moving vehicles and roadside units, we show how DT can create an accurate replica of the real world for model training and testing. The contributions of this paper are twofold: First, we propose a framework to create a multimodal DT, where synthetic images and LiDAR data for the deployment location are generated along with RF propagation measurements obtained via ray-tracing. Second, to ensure effective domain adaptation, we leverage meta-learning, specifically Model-Agnostic Meta-Learning (MAML), with transfer learning (TL) serving as a baseline validation approach. The proposed framework is validated using a comprehensive dataset containing both real and synthetic LiDAR and image data for mmWave V2X beam selection. It also enables the investigation of how each sensor modality impacts domain adaptation, taking into account the unique requirements of mmWave beam selection. Experimental results show that models trained on synthetic data using transfer learning and meta-learning, followed by minimal fine-tuning with real-world data, achieve up to 4.09× and 14.04× improvements in accuracy, respectively. These findings highlight the potential of synthetic data and meta-learning to bridge the domain gap and adapt rapidly to real-world beamforming challenges.
- Research Article
2
- 10.1182/blood-2023-187521
- Nov 2, 2023
- Blood
Synthetic Histopathological Images Generation with Artificial Intelligence to Accelerate Research and Improve Clinical Outcomes in Hematology
- Research Article
- 10.1016/j.jdent.2025.106274
- Feb 1, 2026
- Journal of dentistry
Progress in the development of deep learning tools for dental imaging is constrained by limited access to real-world datasets due to privacy concerns, class imbalance, and data scarcity. This narrative review focuses on the use of synthetic data as a potential solution to these challenges. The review addresses technical, clinical, and ethical/regulatory aspects, and was drafted by a multidisciplinary team. Each subsection was assigned to at least two contributors, with two central members overseeing the entire process. Relevant studies were identified through electronic searches in PubMed, Scopus, Embase, Google Scholar, Web of Science, and IEEE Xplore, supplemented by conference papers and book chapters. For the subsection on clinical applications, publications in the domain of dentistry and oral health focused on fully synthetic image generation were included; studies on image translation or other image processing tasks were excluded. Synthetic imaging data can be generated using generative adversarial networks, variational autoencoders, and denoising diffusion probabilistic models. Synthetic imaging can complement real-world data by mitigating class imbalance, augmenting scarce datasets, and enabling diverse, realistic representations of rare conditions and anatomical variations. It holds promise for diagnostics, education, and multimodal integration across imaging modalities. Studies on dental image synthesis remain scarce, and comprehensive evidence regarding the impact of data augmentation using synthetic images is lacking. Key challenges persist, including ensuring anatomical fidelity and minimizing artifacts. Future emphasis should be on interdisciplinary collaboration, standardized generation workflows, open-source tools, robust strategies for synthetic data integration, and clear regulatory guidance. 
Synthetic imaging can help overcome data scarcity and class imbalance in dental artificial intelligence (AI), leading to more robust and generalizable AI models.
- Research Article
11
- 10.1167/tvst.13.6.1
- Jun 3, 2024
- Translational vision science & technology
Deep learning architectures can automatically learn complex features and patterns associated with glaucomatous optic neuropathy (GON). However, developing robust algorithms requires a large number of data sets. We sought to train an adversarial model for generating high-quality optic disc images from a large, diverse data set and then assessed the performance of models on generated synthetic images for detecting GON. A total of 17,060 (6874 glaucomatous and 10,186 healthy) fundus images were used to train deep convolutional generative adversarial networks (DCGANs) for synthesizing disc images for both classes. We then trained two models to detect GON, one solely on these synthetic images and another on a mixed data set (synthetic and real clinical images). Both the models were externally validated on a data set not used for training. The multiple classification metrics were evaluated with 95% confidence intervals. Models' decision-making processes were assessed using gradient-weighted class activation mapping (Grad-CAM) techniques. Following receiver operating characteristic curve analysis, an optimal cup-to-disc ratio threshold for detecting GON from the training data was found to be 0.619. DCGANs generated high-quality synthetic disc images for healthy and glaucomatous eyes. When trained on a mixed data set, the model's area under the receiver operating characteristic curve attained 99.85% on internal validation and 86.45% on external validation. Grad-CAM saliency maps were primarily centered on the optic nerve head, indicating a more precise and clinically relevant attention area of the fundus image. Although our model performed well on synthetic data, training on a mixed data set demonstrated better performance and generalization. Integrating synthetic and real clinical images can optimize the performance of a deep learning model in glaucoma detection. 
Deep learning models for glaucoma detection can be improved and made more generalizable in clinical practice by integrating DCGAN-generated synthetic images with real-world clinical data.
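The abstract reports an optimal cup-to-disc ratio threshold obtained from receiver operating characteristic analysis but does not state which optimality criterion was used. One common choice, assumed here purely for illustration, is Youden's J statistic (sensitivity + specificity − 1). The function name and the toy values below are hypothetical and are not the study's data.

```python
def youden_threshold(scores, labels):
    """Return (threshold, J) maximising Youden's J = sensitivity + specificity - 1.

    scores: continuous measurements (for example cup-to-disc ratios).
    labels: 1 for the positive class, 0 for the negative class.
    A case is called positive when its score is >= threshold.
    """
    n_positive = sum(labels)
    n_negative = len(labels) - n_positive
    if n_positive == 0 or n_negative == 0:
        raise ValueError("both classes must be present")

    best_threshold, best_j = None, None
    for threshold in sorted(set(scores)):
        true_pos = sum(1 for s, y in zip(scores, labels) if y == 1 and s >= threshold)
        true_neg = sum(1 for s, y in zip(scores, labels) if y == 0 and s < threshold)
        j = true_pos / n_positive + true_neg / n_negative - 1.0
        if best_j is None or j > best_j:
            best_threshold, best_j = threshold, j
    return best_threshold, best_j


if __name__ == "__main__":
    # Hypothetical cup-to-disc ratios; 1 = glaucomatous, 0 = healthy.
    ratios = [0.30, 0.40, 0.50, 0.70, 0.80, 0.90]
    truth = [0, 0, 0, 1, 1, 1]
    print(youden_threshold(ratios, truth))
```

Other criteria, such as the point closest to the top-left corner of the ROC curve, can give a different threshold on the same data.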
- Discussion
8
- 10.1016/j.ejmp.2021.05.008
- Mar 1, 2021
- Physica Medica
Focus issue: Artificial intelligence in medical physics.
- Research Article
3
- 10.1016/j.xops.2024.100676
- May 1, 2025
- Ophthalmology science
Improving Artificial Intelligence-based Microbial Keratitis Screening Tools Constrained by Limited Data Using Synthetic Generation of Slit-Lamp Photos.
- Research Article
- 10.1080/01621459.2025.2552510
- Oct 2, 2025
- Journal of the American Statistical Association
Generative artificial intelligence (AI) has transformed the biomedical imaging field through image synthesis, addressing challenges of data availability, privacy, and diversity in biomedical research. This article proposes a novel nonparametric method within the functional data framework to discern significant differences between the mean and covariance functions of original and synthetic biomedical imaging data, thereby enhancing the fidelity and utility of synthetic data. Focusing on surface-based synthetic imaging data, our approach employs triangulated spherical splines to address spatial heterogeneity. A key contribution is the construction of simultaneous confidence regions (SCRs) to rigorously quantify uncertainty in original-synthetic differences. The asymptotic properties of the proposed SCRs are established, providing exact coverage probabilities and demonstrating equivalence to those derived from noise-free imaging data. Simulation studies validate the coverage properties of the SCRs and evaluate the size and power of the associated hypothesis tests. The proposed method is applied to compare the original and synthetic brain imaging data from the Human Connectome Project, where it highlights significant differences between original and synthetic images. We demonstrate that a straightforward transformation can align the mean and covariance functions of synthetic images with those of the original data, improving their reliability and utility for biomedical research applications. Supplementary materials for this article are available online, including a standardized description of the materials available for reproducing the work.
- Discussion
9
- 10.1002/cyto.a.23957
- Dec 30, 2019
- Cytometry Part A
Deep learning methods developed by the computer vision community are successfully being adapted for use in biomedical image analysis and synthesis applications with some delay. Also in cell image synthesis, we can observe significant improvements in the quality of generated results brought about by deep learning. The typical task is to generate isolated cell images based on training image examples with cropped, centered, and aligned individual cells. While the first trials to use generative adversarial networks (GANs) without any object detection or segmentation had limited capabilities, the recent article by Scalbert et al. 1 has shown that significant improvement can be obtained by splitting the task into (1) learning and generating object (cell and/or nuclei) shapes based on image segmentation, and (2) learning and generating the texture separately for each segment type including the background using so-called style transfer. The first attempts to generate synthetic cell images date back to the late 1990s and the quality of the artificially created image data has been improving ever since 2. Initially, simple shapes like spheres, ellipsoids, curved discs, or bananas were used in combination with simple textures like homogenous color or Perlin noise. Later on, more realistic shapes and textures appeared that were hard to distinguish from real ones not only visually but also based on their mathematical characteristics such as histogram properties, entropy, or central moments. The basic problem that researchers have been trying to tackle has remained the same—they have been mostly developing methods to generate an artificial cropped image of just one static isolated cell centered in the field of view, often also aligned in some way with one of the axes. 
The focus has initially been on single-channel gray-scale 2D cell images while, later on, extensions to more dimensions have been added, namely support for multiple channels, 3D or even time-lapse virtual cell imaging. Some authors have also tried to work at different scales modeling either individual cell components (organelles) in more detail or, vice versa, generating whole cell populations or tissues at low resolution 2. The development of cell image synthesis methods has been driven by two main application areas: benchmarking (i.e., testing) cell image analysis algorithms and data augmentation for training machine learning methods. For both purposes, synthetic data are easy to obtain in any quantities and are accompanied by inherent ground truth, which is in contrast to real data sets, for which acquisition is expensive and each expert produces a different annotation. Moreover, for benchmarking image restoration methods, synthetic data are the only option because no expert can create the original image from a blurred and noisy one. However, in spite of all the progress that has been made in the past two decades, not all properties of synthetic data match the real world. Consequently, benchmarking results differ between real and synthetic data, which is usually solved by using both these data types in cell image analysis benchmarks and competitions 3. In principle, there are two approaches to cell image synthesis: “handcrafted” parametric models and learning-based generative models 2. The former ones use the human mind to convert available knowledge on cell morphology into a plausible model while the latter let the computer learn how cells of certain type should look like. In both cases, we can distinguish different depth of modeling corresponding to different levels of detail (see Table 1). For parametric models (Table 1, first column), it is naturally not possible to generate the whole image without thinking about objects. 
Hence, at least cell or nucleus shapes are modeled using smooth surfaces obtained, for example, by random deformations of basic shapes like circles, ellipses, or smoothed polygons in 2D and spheres or ellipsoids in 3D 6, 7. The most common texture applied inside these shapes has probably been the Perlin noise 6, 7 but also, for instance, wavelet-based textures have been tried out 11. Attempts have been made to go into finer details by generating, for example, individual chromosomes within cell nucleus 13. For learning-based models (Table 1, second and third column), synthetic images are generated from real image data. The simplest way is to generate the synthetic image data from the real one using some type of geometrical transform (linear or nonlinear), which is often used in data augmentation techniques. Furthermore, it is possible to generate the whole synthetic image after training without distinguishing any regions using generative adversarial networks (GANs) 4, 5 but this approach requires a large amount of training data, works slowly even in 2D (no paper in 3D so far) and does not offer inherent ground truth, which makes it impractical for both benchmarking and data augmentation purposes. Hence, cell or nucleus (or both) regions are usually distinguished in the training as well as the generated image data and the model tries to learn the cell or nucleus shapes. To this end, each possible shape is represented in a multidimensional latent space (i.e., shape space) where each dimension represents certain important learned shape attribute that might be interpretable (e.g., elongatedness in a certain direction) but not always. Each shape corresponds to a point in this shape space and new shapes can be obtained by simply moving within this space, that is, actually interpolating existing shapes. There are plenty of methods capable of shape learning and synthesis ranging from traditional ones like component analysis or diffeomorphic modeling to deep autoencoders 8. 
Recently, GANs have also been used for this purpose, even in 3D 10. In contrast to shape learning, much less attention has been paid to texture learning and synthesis. Cells, nuclei, and background have often been just filled with homogenous color or with suitably interleaved patches of texture copied from the respective regions of training images 12. This gap has been filled by Scalbert et al. in their recent paper 1 where the authors have used deep learning to generate texture similar to given examples (see the next section). Finally, learning sizes or distributions of cell components have also been tried out for some object types like mitochondria 9 but this research direction is still waiting for proper exploration. The recent paper by Scalbert et al. 1 proposes to generate cell and nucleus textures using so-called style transfer, which is a term used in the computer vision community to denote generating texture for a particular image region based on example textures from some semantically segmented training images. One of the methods for style transfer based on so-called neural patches was published by Champandard 14 who demonstrated a very realistic transfer of texture from semantically segmented source image to the destination doodle image (see Fig. 1). This method was adapted by Scalbert et al. to cell image synthesis. The authors use the Fourier shape descriptors method formerly utilized by Malm et al. 12 for learning cell and nucleus shapes and then apply the Champandard method of style transfer separately for the regions of nuclei, cytoplasm, and background (see Table 1, last column, third row). The training images (subimage, Column 1) were semantically segmented into background, cytoplasm, and nuclei (subimage, Column 2). The shapes of cells and nuclei were learned and new shapes generated (subimage, Column 3). Finally, textures were learned and transferred for each region type separately (subimage, Column 4). 
Different cell types and imaging modes have been tried out (subimage, different rows). The cell generation based on the described approach has been tested for 2D isolated cells cropped and centered within the field of view. The plausibility of the results has been checked visually as well as using selected geometrical, color, and texture descriptors. The Python code for the cell segmentation mask generator and the doodle-style transfer method are available at gitlab.com 1. The scripts are easy to launch and they work also on any other single 2D input cell image, for which a mask with nuclear and cellular regions is prepared. The main limitation of the scripts is runtime; the example images supplied by the authors are approximately 100 × 100 pixels large and it takes several minutes on a common computer to generate one synthetic image of the same size. Another limitation is that the method works well for blurred (low optical resolution) textures, like those used by the authors, but is not applicable to textures containing high spatial frequency information like fibers—it produces blurred output with the repetition of certain artificial patterns not present in the input. In spite of these limitations, the pioneering work on using deep learning in cell texture synthesis has demonstrated the feasibility of style transfers for cells and nuclei. The obtained synthetic data look realistic, at least for low-resolution textures. Nevertheless, the usefulness of such data for benchmarking or data augmentation purposes is still to be shown. Also, the extension of the approach from 2D into 3D would be desirable but would require a substantial speed-up. If finer details need to be simulated, then the modeling of specific cell components could be used together with the texture transfer separately for these specific components and cellular space in between them. 
In addition to the static isolated cell synthesis, there is still a lot of room for the exploration of approaches to learning and generation of temporal patterns of cell changes as well as spatial patterns of multiple cell arrangement. Both these patterns have already been tackled using parametric models 2, 13 but remain too complex for learning-based models, which might, however, change soon. The author would like to thank David Svoboda for the helpful feedback on the first draft of this manuscript and Cem Emre Akbaş for the technical help while testing the software.
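The parametric approach described earlier in this commentary, in which cell or nucleus outlines are obtained by random deformations of basic shapes such as circles, can be sketched in a few lines. The version below perturbs the radius of a circle with a handful of low-order cosine harmonics. This is an illustrative assumption and not the procedure of any cited simulator, and the function name and parameters are hypothetical.

```python
import math
import random


def random_cell_contour(n_points=64, base_radius=10.0, n_harmonics=4,
                        amplitude=0.1, seed=None):
    """Return a closed 2D contour as a list of (x, y) points.

    The radius is a circle of base_radius modulated by n_harmonics cosine
    terms with random phase; the k-th term has relative amplitude of at most
    amplitude / k, so higher harmonics contribute smaller, smoother bumps.
    """
    rng = random.Random(seed)
    harmonics = [
        (rng.uniform(0.0, amplitude) / k, rng.uniform(0.0, 2.0 * math.pi))
        for k in range(1, n_harmonics + 1)
    ]
    contour = []
    for i in range(n_points):
        theta = 2.0 * math.pi * i / n_points
        deformation = sum(
            a * math.cos(k * theta + phase)
            for k, (a, phase) in enumerate(harmonics, start=1)
        )
        radius = base_radius * (1.0 + deformation)
        contour.append((radius * math.cos(theta), radius * math.sin(theta)))
    return contour


if __name__ == "__main__":
    outline = random_cell_contour(n_points=8, seed=1)
    for x, y in outline:
        print(round(x, 2), round(y, 2))
```

Filling such an outline with a procedural or learned texture would correspond to the second modeling step discussed in the commentary. That step is omitted here.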
- Research Article
- 10.1093/humrep/deae108.553
- Jul 3, 2024
- Human Reproduction
Study question: How effectively can a latent diffusion model generate high-fidelity embryo images tailored to the specific contextual needs of researchers, based on the user’s text input?
Summary answer: The latent diffusion model successfully generated high-resolution embryo images, providing a novel approach to address data scarcity in embryo imaging.
What is known already: The use of AI in generating synthetic data has been increasingly explored to address the scarcity of real-world datasets in various fields. Particularly in medical imaging, AI-generated synthetic data offers a potential solution to overcome the limitations imposed by data privacy concerns and ethical considerations. Previous studies have shown that AI can replicate complex patterns in data, but its application in generating embryo images remains less explored.
Study design, size, duration: Single static images of 5,133 Day 5 blastocysts and 2,093 Day 3 cleavages were retrospectively collected from seven in vitro fertilization clinics between June 2011 and May 2022. The images were analyzed along with relevant metadata including clinical information and embryo grades. Day 3 embryo grading was based on the number of blastomeres, evenness, and fragmentation percentage. Day 5 grading criteria included the inner cell mass, trophectoderm, and blastocyst stage evaluations.
Participants/materials, setting, methods: An AI model using latent diffusion was developed to generate 10,051 Day 5 and 10,088 Day 3 synthetic embryo images. The authenticity was assessed through visual Turing tests, where embryologists discerned real from synthetic images. For the evaluation, 200 real (100 Day 5, 100 Day 3) and 200 synthetic (100 Day 5, 100 Day 3) images were randomly chosen from each dataset, ensuring a comprehensive test of the generated images’ realism.
Main results and the role of chance: The AI model’s efficacy in generating synthetic embryo images was assessed through visual Turing tests on Day 5 and Day 3 embryos, yielding accuracies of 0.59 and 0.57, respectively. These accuracies fall within the 99% confidence interval of near-random performance (0.41 to 0.59), highlighting the challenge in distinguishing synthetic from real images. For Day 5 embryos, the sensitivity and specificity were recorded at 0.52 and 0.66, indicating a moderate challenge in identifying synthetic images and a relatively higher ease in recognizing real ones. Day 3 embryos presented a lower sensitivity of 0.41, suggesting greater difficulty in detecting synthetic images, while the specificity increased to 0.73, indicating a stronger ability to identify real images. Collectively, with an overall accuracy of 0.58, sensitivity of 0.47, and specificity of 0.70, these findings confirm the synthetic images’ remarkable realism, closely emulating actual embryo features to the extent that differentiation by experts proved challenging. This level of realism underscores the synthetic images’ potential to significantly enrich embryological research datasets, promising advancements in the field.
Limitations, reasons for caution: The study’s limitation lies in the generated synthetic embryos being based on three embryo grade categories, limiting feature diversity. Additionally, there was no quantitative assessment method for these images, relying instead on expert evaluation, which required deploying a large number of embryologists.
Wider implications of the findings: The model’s generation of highly realistic embryo images counters data scarcity in embryological research, potentially elevating AI’s utility, enriching educational content, advancing reproductive medicine, and ensuring ethical data usage.
Trial registration number: not applicable
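The reported chance interval and accuracies can be cross-checked with simple arithmetic. The sketch below is not the authors' analysis: it assumes that each per-day test comprised 200 judgements (100 real and 100 synthetic images), that the interval comes from a normal approximation to the binomial, and that accuracy is the mean of sensitivity and specificity because the two classes are balanced. The function names are illustrative.

```python
import math


def chance_interval(n, z=2.576):
    """Normal-approximation interval around 0.5 for n binary judgements.

    z = 2.576 corresponds to a two-sided 99% level.
    """
    half_width = z * math.sqrt(0.25 / n)
    return 0.5 - half_width, 0.5 + half_width


def balanced_accuracy(sensitivity, specificity):
    """Accuracy when real and synthetic images are equally frequent."""
    return (sensitivity + specificity) / 2


# Assumed sample size: 100 real + 100 synthetic images per embryo day.
low, high = chance_interval(200)
day5 = balanced_accuracy(0.52, 0.66)
day3 = balanced_accuracy(0.41, 0.73)

print(f"99% chance interval: {low:.2f} to {high:.2f}")
print(f"Day 5 accuracy: {day5:.2f}, Day 3 accuracy: {day3:.2f}")
```

Under these assumptions the interval rounds to 0.41 to 0.59 and the accuracies to 0.59 and 0.57, which agrees with the figures in the abstract.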
- Research Article
- 10.1016/j.compag.2020.105378
- Apr 23, 2020
- Computers and Electronics in Agriculture
In this paper we report on improving part segmentation performance for robotic vision using convolutional neural networks by optimising the visual realism of synthetic agricultural images. In Part I, a cycle-consistent generative adversarial network was applied to synthetic and empirical images with the objective of generating more realistic synthetic images by translating them to the empirical domain. We hypothesise that plant part image features (e.g. color, texture) become more similar to the empirical domain after translation of the synthetic images. Results confirm this: the mean color distribution correlation with the empirical data improved from 0.62 before translation to 0.90 after translation. Furthermore, the mean image features of contrast, homogeneity, energy and entropy moved closer to the empirical mean after translation. In Part II, 7 experiments were performed using convolutional neural networks with different combinations of synthetic, synthetic translated to empirical, and empirical images. We hypothesise that (i) the translated images can be used for improved learning of empirical images, and (ii) learning without any fine-tuning with empirical images is improved by bootstrapping with translated images over bootstrapping with synthetic images. Results confirm our hypotheses in Part II. First, a maximum intersection-over-union (IOU) performance of 0.52 was achieved when bootstrapping with translated images and fine-tuning with empirical images; an 8% increase compared to only using synthetic images. Second, training without any empirical fine-tuning resulted in an average IOU of 0.31; a 55% performance increase over previous methods that only used synthetic images.
The key contribution of this paper to robotic vision is supporting evidence that domain adaptation can successfully translate synthetic data to the real empirical domain, which improves segmentation learning whilst lowering the dependency on manually annotated data.
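For readers unfamiliar with the intersection-over-union score reported above, the following is a minimal sketch of how it is commonly computed for a pair of binary masks. It is an illustration only: the function name, the toy masks, and the convention for two empty masks are assumptions, not details taken from the paper.

```python
import numpy as np


def intersection_over_union(pred, target):
    """IoU between two boolean masks of the same shape."""
    pred = np.asarray(pred, dtype=bool)
    target = np.asarray(target, dtype=bool)
    union = np.logical_or(pred, target).sum()
    if union == 0:
        # Both masks are empty; treating this as perfect agreement
        # is a convention chosen here, not one stated in the paper.
        return 1.0
    return float(np.logical_and(pred, target).sum() / union)


# Toy 4x4 masks standing in for a predicted and an annotated plant part.
pred = np.zeros((4, 4), dtype=bool)
pred[1:3, 1:4] = True    # 6 predicted pixels
target = np.zeros((4, 4), dtype=bool)
target[1:4, 1:3] = True  # 6 annotated pixels

iou = intersection_over_union(pred, target)
print(iou)  # 4 shared pixels / 8 pixels in the union = 0.5
```

A per-class score such as this one is typically averaged over classes and images to obtain a single figure like the 0.52 and 0.31 quoted in the abstract.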
- Conference Article
- 10.1109/mmsp.2017.8122260
- Oct 1, 2017
We present an Automatic License Plate Recognition system designed around Convolutional Neural Networks (CNNs) and trained on synthetic plate images. We first design CNNs suitable for plate and character detection that share a common architecture and training procedure. Then, we generate synthetic images that account for the varying illumination and pose conditions encountered with real plate images, and we use exclusively such synthetic images to train our CNNs. Experiments with real vehicle images captured in natural light with commodity imaging systems show precision and recall in excess of 93%, even though our networks are trained exclusively on synthetic images.
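Precision and recall, the two figures quoted above, can be derived from detection counts as in the short sketch below. The counts are hypothetical and chosen only to illustrate the calculation; they are not results from the paper, and the function name is an assumption.

```python
def precision_recall(true_positives, false_positives, false_negatives):
    """Return (precision, recall) from raw detection counts."""
    detected = true_positives + false_positives  # everything the system reported
    actual = true_positives + false_negatives    # everything that was really there
    precision = true_positives / detected if detected else 0.0
    recall = true_positives / actual if actual else 0.0
    return precision, recall


# Hypothetical counts, not taken from the paper.
precision, recall = precision_recall(
    true_positives=95, false_positives=5, false_negatives=5
)
print(f"precision={precision:.2f}, recall={recall:.2f}")
```

Precision penalises spurious detections, whereas recall penalises missed plates or characters, so reporting both gives a fuller picture than accuracy alone.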