Abstract

Data augmentation is an effective way to prevent model overfitting in deep learning, especially in medical image classification, where samples are scarce and difficult to obtain. In recent years, many data augmentation methods have been proposed, including those based on single data transformations, mixing of multiple samples, and learning the data distribution, but there has been no systematic framework for evaluating them. An impartial and comprehensive evaluation system can not only assess the strengths and weaknesses of existing augmentation approaches for a specific medical image classification task but also point to productive research directions for new medical image augmentation methods, thereby advancing image-based auxiliary diagnosis technology. This paper therefore proposes an objective and universal evaluation system for data augmentation methods. Using existing large public data sets, different augmentation methods are evaluated objectively and comprehensively in terms of classification accuracy and data diversity; the procedure is universal and easy to apply. To imitate the small-sized data sets prevalent in deep learning, an equal-interval sampling technique based on similarity ranking is presented that selects samples from large public data sets and constructs a subset that faithfully reflects the original set. Augmented data sets are then created from these small-sized data sets using various augmentation approaches. Finally, the different augmentation strategies are evaluated objectively and fully on a composite score of post-augmentation classification accuracy and data diversity. Experimental results on several data sets demonstrate the validity and feasibility of the proposed sampling method and evaluation system.

Highlights

  • Data augmentation is an effective method to prevent model overfitting in deep learning, especially in medical image classification where data samples are small and difficult to obtain

  • To evaluate the performance of different augmentation methods objectively and comprehensively, this paper considers the augmented data set along three dimensions

  • In addition to the two medical imaging data sets, we selected two data sets from other fields to verify the broad applicability of our equal-interval sampling algorithm. These data sets are a medical CT image data set (DeepLesion) [19], a pneumonia X-ray image data set [20], ImageNet [21], and CIFAR10 [22]. Six different sampling methods were used to sample the above data sets and generate small-sized data sets. The classification-accuracy difference between each generated small-sized data set and the original data set, together with the Frechet Inception Distance (FID) between them, served as evaluation indexes for the performance of the different sampling methods
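The FID index used above compares the mean and covariance of Inception features extracted from two image sets. A minimal sketch of the computation, assuming the Inception-v3 feature vectors have already been extracted (the feature-extraction step is omitted here):

```python
import numpy as np
from scipy import linalg

def frechet_inception_distance(feats_a, feats_b):
    """FID between two sets of feature vectors.

    feats_a, feats_b: (N, D) arrays of image features
    (assumed precomputed, e.g. Inception-v3 pool3 activations).
    FID = ||mu_a - mu_b||^2 + Tr(C_a + C_b - 2 (C_a C_b)^{1/2})
    """
    mu_a, mu_b = feats_a.mean(axis=0), feats_b.mean(axis=0)
    cov_a = np.cov(feats_a, rowvar=False)
    cov_b = np.cov(feats_b, rowvar=False)
    # Matrix square root of the covariance product; may pick up a
    # negligible imaginary component from numerical error.
    covmean, _ = linalg.sqrtm(cov_a @ cov_b, disp=False)
    if np.iscomplexobj(covmean):
        covmean = covmean.real
    diff = mu_a - mu_b
    return float(diff @ diff + np.trace(cov_a + cov_b - 2.0 * covmean))
```

A lower FID means the sampled subset's feature distribution is closer to the original data set's; identical sets give an FID near zero.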


Summary

Equal-Interval Sampling Algorithm Based on Similarity Ranking

To prepare data for the proposed data augmentation assessment system, a small-sized data set must be drawn from the original large data set; it should follow the original data set's distribution and represent it faithfully. Within each category, equal-interval sampling is performed according to a preselected sampling number, and the Frechet Inception Distance (FID) index [14] between the sampled data set and the original complete data set is applied as a secondary verification. The result is a small-sized data set with a distribution similar to that of the original large-scale data set, prepared for the subsequent evaluation of various data augmentation methods. The Frechet Inception Distance, or FID, is a metric for assessing the quality of produced images that was designed to measure the performance of generative adversarial networks.
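The per-category sampling step above can be sketched as follows. This is a minimal illustration, not the paper's exact procedure: it assumes similarity is measured as Euclidean distance to the class mean image (the paper's reference point for the similarity ranking may differ), then picks indices at equal intervals along that ranking so the subset spans the whole similarity range.

```python
import numpy as np

def equal_interval_sample(images, n_samples):
    """Equal-interval sampling based on a similarity ranking.

    images: (N, ...) array of same-class images.
    n_samples: number of images to keep (n_samples <= N).
    Returns indices into `images`.
    """
    flat = images.reshape(len(images), -1).astype(float)
    centroid = flat.mean(axis=0)
    # Similarity criterion (assumed): Euclidean distance to class mean.
    dists = np.linalg.norm(flat - centroid, axis=1)
    order = np.argsort(dists)  # similarity ranking, most to least similar
    # Equally spaced positions along the ranking.
    picks = np.linspace(0, len(images) - 1, n_samples).round().astype(int)
    return order[picks]
```

Because the picked positions run from the most similar image to the least similar one, the subset covers the full range of within-class variation rather than clustering near the mean.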

Set the Sampling Quantity
Determine the Sample to Be Taken
Calculate the Euclidean Distance of the Two Pictures
Assess the Representativeness of the Small-Sized Data Set
Evaluation Index of Different Data Augmentation Method
Classification Accuracy on the Original Data Set
Classification Accuracy on the New Data Set after Augmentation
Diversity of Data in the Augmented Data Set (IS)
Comparison of Different Sampling Methods
Baselines
Evaluation Metrics