Impact on Inference Model Performance for ML Tasks Using Real-Life Training Data and Synthetic Training Data from GANs

Ulrike Faltings, Swen Barth, Tobias Bettinger, Michael Schäfer

doi:10.3390/info13010009

Abstract

Collecting and labeling well-balanced training data is usually difficult under real conditions. In addition to classic modeling methods, Generative Adversarial Networks (GANs) offer a powerful way to generate synthetic training data. In this paper, we evaluate the hybrid use of real-life and generated synthetic training data in different fractions and its effect on model performance. We found that using up to 75% synthetic training data can offset time-consuming and costly manual annotation while model performance in our Deep Learning (DL) use case stays in the same range as with 100% hand-annotated real images. Synthetic training data tailored to produce a balanced dataset makes it possible to cover events that occur only rarely, and ML models can be deployed in industry without much delay, which makes them feasible and economically attractive for a wide range of applications in the process and manufacturing industries. Hence, the main outcome of this paper is that our methodology can ease the implementation of many industrial Machine Learning and Computer Vision applications by making them economically maintainable. We conclude that many industrial ML use cases that require large, balanced training data containing all information relevant to the target model can be addressed in the future by following the findings presented in this study.
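The hybrid setup described in the abstract can be illustrated with a minimal sketch that draws a fixed-size training set with a chosen share of synthetic samples. This is not the authors' pipeline: the function name `mix_training_set`, the placeholder sample pools, and the grid of fractions are assumptions made for illustration only.

```python
import random


def mix_training_set(real, synthetic, synthetic_fraction, total, seed=0):
    """Draw `total` samples, a given fraction of them from the synthetic pool.

    Hypothetical helper for illustration; not taken from the paper.
    """
    if not 0.0 <= synthetic_fraction <= 1.0:
        raise ValueError("synthetic_fraction must lie in [0, 1]")
    n_synth = round(total * synthetic_fraction)
    n_real = total - n_synth
    if n_synth > len(synthetic) or n_real > len(real):
        raise ValueError("not enough samples in one of the pools")
    rng = random.Random(seed)  # fixed seed keeps the draw reproducible
    mixed = rng.sample(real, n_real) + rng.sample(synthetic, n_synth)
    rng.shuffle(mixed)  # avoid ordering the set by data source
    return mixed


# Placeholder pools; in practice these would be annotated images.
real_pool = [("real", i) for i in range(200)]
synthetic_pool = [("synthetic", i) for i in range(200)]

# Assumed grid of synthetic shares, chosen only for this example.
for fraction in (0.0, 0.25, 0.5, 0.75, 1.0):
    batch = mix_training_set(real_pool, synthetic_pool, fraction, total=100)
    n_synth = sum(1 for source, _ in batch if source == "synthetic")
    print(f"synthetic share {fraction:.2f}: {n_synth} of {len(batch)} samples")
```

Keeping the total size constant while varying only the share makes any change in model performance attributable to the data mix rather than to the amount of data.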

Highlights

Because training performance evaluation is primarily about asserting convergence of the training process, we restrict ourselves to reporting mean average precision and loss curves.
To evaluate the performance of the models on the test sets, we consider the following five metrics, which we believe capture the relevant dimensions of model performance in a productive setting while remaining broad enough to allow the results to transfer to other (CV) inference tasks: TP (the fraction of correctly identified digits); MC (the fraction of misclassified digits); FN (the fraction of missed digits); FP
Our results show that even with only a little real-life training data available, as will be typically the case in many industrial applications, Deep Learning (DL)
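The three test-set metrics that are fully defined in the highlights (TP, MC, FN) can be sketched as simple fractions. This is an illustrative reading only: it assumes each ground-truth digit has already been assigned exactly one outcome label and that the denominator is the number of ground-truth digits. The function name `digit_rates` and the outcome labels are hypothetical, and FP is left out because its definition is cut off above.

```python
from collections import Counter


def digit_rates(outcomes):
    """Return TP, MC and FN as fractions of all ground-truth digits.

    `outcomes` holds one label per ground-truth digit:
    "correct", "misclassified" or "missed" (assumed labels).
    """
    total = len(outcomes)
    if total == 0:
        raise ValueError("need at least one ground-truth digit")
    counts = Counter(outcomes)
    return {
        "TP": counts["correct"] / total,        # correctly identified digits
        "MC": counts["misclassified"] / total,  # detected but wrong class
        "FN": counts["missed"] / total,         # not detected at all
    }


# Made-up outcomes for ten digits, purely for demonstration.
example = ["correct"] * 8 + ["misclassified"] + ["missed"]
print(digit_rates(example))
```

Under these assumptions the three fractions sum to one, which gives a quick sanity check on the outcome labelling.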

Summary

Introduction

Computer Vision techniques have advanced significantly in recent years and are increasingly applied in industrial contexts. Deep Learning-based approaches have been responsible for many breakthrough results in past years [1,2]. A drawback of these techniques is their demand for very large training data sets [3,4], which can be hard or even impossible to obtain when limited to real-life training data

Objectives

Methods

Results

Conclusion


Journal: Information
Publication Date: Dec 28, 2021
Citations: 1
License type: CC BY 4.0


