Abstract

This work develops a new method for vector data augmentation. The proposed method applies principal component analysis (PCA): it determines the eigenvectors of a set of training vectors for a machine learning (ML) method and uses them to generate distilled vectors. The training and PCA-distilled vectors have the same dimension. The user chooses the number of vectors to be distilled and added to the set of training vectors. A statistical approach determines the lowest number of distilled vectors such that, when added to the original vectors, the extended set trains an ML classifier to achieve a required accuracy. Hence, the novelty of this study is the distillation of vectors with the PCA method and their use to augment the original set of vectors. The advantage that comes from this novelty is improved classification statistics of ML classifiers. To validate the advantage, we conducted experiments with four public databases and applied four classifiers: a neural network, logistic regression, and support vector machines with linear and polynomial kernels. For the purpose of augmentation, we conducted several distillations, including nested distillation (double distillation), in which new vectors are distilled from already distilled vectors. We trained the classifiers with three sets of vectors: the original vectors, the original vectors augmented with PCA-distilled vectors, and the original vectors augmented with both PCA-distilled and double-distilled vectors. The experimental results presented in the paper confirm that augmenting the original training vectors with PCA-distilled vectors improves the classification statistics of the ML methods.
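The abstract does not specify exactly how the PCA eigenvectors are used to produce the distilled vectors, so the following is only a minimal sketch of one plausible reading: new vectors are sampled in the eigenvector basis of the training set and mapped back to the original space, preserving dimension, before being stacked onto the original vectors. The function name `pca_distill` and the parameter `n_distilled` are hypothetical and not taken from the paper.

```python
# Illustrative sketch only, not the paper's exact procedure.
import numpy as np
from sklearn.decomposition import PCA

def pca_distill(X, n_distilled, n_components=None, random_state=0):
    """Generate n_distilled synthetic vectors with the same dimension as X
    by sampling coefficients in the PCA eigenvector basis of X (assumed scheme)."""
    rng = np.random.default_rng(random_state)
    pca = PCA(n_components=n_components, random_state=random_state)
    scores = pca.fit_transform(X)            # project X onto its eigenvectors
    # Sample new coefficients from the empirical spread of each component.
    sampled = rng.normal(loc=scores.mean(axis=0),
                         scale=scores.std(axis=0),
                         size=(n_distilled, scores.shape[1]))
    return pca.inverse_transform(sampled)    # map back to the original dimension

# Augment the original training set with the distilled vectors
# (label assignment for the new vectors is outside the scope of this sketch).
X_train = np.random.rand(200, 30)            # placeholder training data
X_augmented = np.vstack([X_train, pca_distill(X_train, n_distilled=50)])
```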
