Synthetic Training Data Generation Research Articles

The 3D nano/microstructure of materials can significantly influence their macroscopic properties. To better understand such structure-property relationships, 3D microscopy techniques can be deployed, but these are often expensive in both time and cost. 2D imaging techniques are often more accessible, yet the 3D nano/microstructure of materials cannot be directly retrieved from such measurements. The motivation of this work is to overcome the difficulty of characterizing 3D structures from 2D measurements for hetero-aggregate materials. For this purpose, a method is presented that combines machine learning with methods of spatial stochastic modeling to characterize the 3D nano/microstructure of materials from 2D data. More precisely, a stochastic model is used to generate synthetic training data. This kind of training data has the advantage that time-consuming experiments for the synthesis of differently structured materials, followed by their 3D imaging, can be avoided. Specifically, a parametric stochastic 3D model is presented, from which a wide spectrum of virtual hetero-aggregates can be generated. The virtual structures are then passed to a physics-based simulation tool to generate virtual scanning transmission electron microscopy (STEM) images. The preset parameters of the 3D model, together with the simulated STEM images, serve as a database for training convolutional neural networks, which can be used to determine the parameters of the underlying 3D model and, consequently, to predict 3D structures of hetero-aggregates from 2D STEM images. Furthermore, an error analysis is performed with respect to structural descriptors, e.g. the hetero-coordination number. The proposed method is applied to image data of TiO2-WO3 hetero-aggregates, which are highly relevant in photocatalysis. However, the proposed method can be transferred to other types of aggregates and to different 2D microscopy techniques. Consequently, the method is relevant for industrial or laboratory setups in which product quality is to be quantified by means of inexpensive 2D image acquisition.
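The data-generation idea in the abstract above (draw model parameters, generate a virtual 3D structure, turn it into a 2D image, store the image together with its parameters) can be illustrated with a toy sketch. Everything in it is an assumption made for illustration and not the authors' model: the aggregate is a set of uniformly placed spheres of two types, the single model parameter is the fraction of type-A particles (`frac_a`), and a simple thickness projection along z stands in for the physics-based STEM simulation. The function names `sample_aggregate`, `project_to_2d` and `build_database` are hypothetical.

```python
import numpy as np


def sample_aggregate(n_particles, frac_a, rng):
    """Toy stand-in for a parametric stochastic 3D model (assumption):
    uniform random sphere centers in the unit cube plus a type label."""
    centers = rng.uniform(0.0, 1.0, size=(n_particles, 3))
    is_type_a = rng.random(n_particles) < frac_a
    return centers, is_type_a


def project_to_2d(centers, is_type_a, radius, size=32, contrast_b=2.0):
    """Crude replacement for a STEM simulation (assumption): each sphere
    adds its projected thickness (chord length along z) to the pixels it
    covers, weighted by a per-type contrast factor."""
    grid = np.linspace(0.0, 1.0, size)
    ys, xs = np.meshgrid(grid, grid, indexing="ij")
    image = np.zeros((size, size))
    for (cx, cy, _), type_a in zip(centers, is_type_a):
        d2 = (xs - cx) ** 2 + (ys - cy) ** 2
        thickness = 2.0 * np.sqrt(np.clip(radius**2 - d2, 0.0, None))
        image += thickness * (1.0 if type_a else contrast_b)
    return image


def build_database(n_samples, n_particles=50, radius=0.05, seed=0):
    """Pair each simulated image with the parameter that generated it."""
    rng = np.random.default_rng(seed)
    images, params = [], []
    for _ in range(n_samples):
        frac_a = rng.uniform(0.1, 0.9)  # preset model parameter (label)
        centers, is_type_a = sample_aggregate(n_particles, frac_a, rng)
        images.append(project_to_2d(centers, is_type_a, radius))
        params.append(frac_a)
    return np.stack(images), np.array(params)


if __name__ == "__main__":
    images, params = build_database(n_samples=8)
    print(images.shape, params.shape)
```

The resulting `(images, params)` pairs are the kind of database on which a regression network could be trained to recover model parameters from 2D images; the network itself is omitted here.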

Analyzing medical data to find abnormalities is a time-consuming and costly task, particularly for rare abnormalities, and requires tremendous effort from medical experts. Artificial intelligence has therefore become a popular means of automatically processing medical data, acting as a supportive tool for doctors. However, the machine learning models used to build these tools are highly dependent on the data used to train them. Large amounts of data can be difficult to obtain in medicine because of privacy restrictions, expensive and time-consuming annotation, and a general lack of data samples for infrequent lesions. In this study, we present a novel synthetic data generation pipeline, called SinGAN-Seg, to produce synthetic medical images with corresponding masks using a single training image. Our method differs from traditional generative adversarial networks (GANs) because our model needs only a single image and the corresponding ground truth to train. We also show that the synthetic data generation pipeline can be used to produce alternative artificial segmentation datasets with corresponding ground truth masks when real datasets cannot be shared. The pipeline is evaluated using qualitative and quantitative comparisons between real and synthetic data, showing that the style transfer technique used in our pipeline significantly improves the quality of the generated data and that our method is better than other state-of-the-art GANs at preparing synthetic images when the size of the training dataset is limited. By training UNet++ using both real data and the synthetic data generated from the SinGAN-Seg pipeline, we show that models trained on synthetic data perform very close to those trained on real data when both datasets contain a considerable amount of training data. In contrast, we show that synthetic data generated from the SinGAN-Seg pipeline improves the performance of segmentation models when training datasets do not contain a considerable amount of data. All experiments were performed using an open dataset, and the code is publicly available on GitHub.

Articles published on Synthetic Training Data Generation

Synthetic data at scale: a development model to efficiently leverage machine learning in agriculture.

Deep Generation of Synthetic Training Data for the Automated Extraction of Semantic Knowledge from Historical Maps

Using convolutional neural networks for stereological characterization of 3D hetero-aggregates based on synthetic STEM data

Review of methods and systems for generation of synthetic training data

Collagen fiber centerline tracking in fibrotic tissue via deep neural networks with variational autoencoder-based synthetic training data generation.

Deep Learning Based Vehicle Detection on Real and Synthetic Aerial Images: Training Data Composition and Statistical Influence Analysis

Polynomial Chaos-Based Procedural Generation of Synthetic Training Data in Machine Learning for Automated Acoustic Monitoring

I Spy You

Array Camera Image Fusion using Physics-Aware Transformers

Effect of Kinematics and Fluency in Adversarial Synthetic Data Generation for ASL Recognition With RF Sensors

Designing a Human-in-the-Loop System for Object Detection in Floor Plans

SinGAN-Seg: Synthetic training data generation for medical image segmentation.

Generation of synthetic training data for SEEG electrodes segmentation.

Clothing Items Classification Based on X-Ray Multi-Shot Imaging for E-Commerce

Rapid Quantitative Analysis of IR Absorption Spectra for Trace Gas Detection by Artificial Neural Networks Trained with Synthetic Data.

Procedural synthetic training data generation for AI-based defect detection in industrial surface inspection

Synthetic Training Data Generation for Visual Object Identification on Load Carriers

Simulation-Based Generation of Representative and Valid Training Data for Acoustic Resonance Testing

Generating Human Action Videos by Coupling 3D Game Engines and Probabilistic Graphical Models

