Research on imbalanced data set preprocessing based on deep learning

Wang Fangyu, Chen Bo, Zhang Jianhui, Bu Youjun

doi:10.1109/acctcs52002.2021.00023

Abstract

Class imbalance in data sets is a widespread problem, and imbalanced data strongly affects classification results. Traditional preprocessing methods for imbalanced data sets are mainly undersampling and oversampling. Oversampling suffers from overfitting and blurred class boundaries, while undersampling discards useful information contained in the removed samples. In this paper, a deep learning oversampling model is proposed to address these problems. The model uses a generative algorithm, the variational autoencoder (VAE), to learn the features of the minority-class samples in the imbalanced data set, and then combines the newly generated samples with the original data set to form a new data set. Experimental results show that the accuracy achieved with the newly generated data is higher than that achieved with oversampling or undersampling, indicating that the VAE, as a generative model, gives better preprocessing results for imbalanced data sets.
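The abstract does not specify the network architecture, loss weighting, or how many samples are generated. The following is a minimal NumPy sketch of the general VAE-oversampling idea under stated assumptions: a binary problem, a VAE with a linear encoder and decoder and a Gaussian (squared-error) reconstruction loss, full-batch gradient descent, and generating exactly enough minority samples to balance the two classes. The function names `train_linear_vae` and `vae_oversample` and all hyperparameters are hypothetical; this is not the authors' implementation.

```python
import numpy as np


def train_linear_vae(X, latent_dim=2, epochs=2000, lr=0.01, seed=0):
    """Hypothetical minimal VAE: linear encoder/decoder, Gaussian likelihood, full-batch GD."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    # Encoder maps x -> (mu, logvar); decoder maps z -> x_hat
    We = rng.normal(0, 0.1, (d, latent_dim)); be = np.zeros(latent_dim)
    Wv = rng.normal(0, 0.1, (d, latent_dim)); bv = np.zeros(latent_dim)
    Wd = rng.normal(0, 0.1, (latent_dim, d)); bd = np.zeros(d)
    for _ in range(epochs):
        mu = X @ We + be
        logvar = X @ Wv + bv
        std = np.exp(0.5 * logvar)
        eps = rng.standard_normal(mu.shape)
        z = mu + std * eps  # reparameterization trick
        x_hat = z @ Wd + bd
        # Loss = mean over samples of 0.5*||x - x_hat||^2 + KL(q(z|x) || N(0, I)); manual gradients
        g_xhat = (x_hat - X) / n
        g_Wd = z.T @ g_xhat; g_bd = g_xhat.sum(0)
        g_z = g_xhat @ Wd.T
        g_mu = g_z + mu / n
        g_logvar = g_z * eps * 0.5 * std + 0.5 * (np.exp(logvar) - 1.0) / n
        g_We = X.T @ g_mu; g_be = g_mu.sum(0)
        g_Wv = X.T @ g_logvar; g_bv = g_logvar.sum(0)
        for p, g in ((We, g_We), (be, g_be), (Wv, g_Wv), (bv, g_bv), (Wd, g_Wd), (bd, g_bd)):
            p -= lr * g  # in-place gradient step
    return Wd, bd  # only the decoder is needed for generation


def vae_oversample(X, y, minority_label, seed=0):
    """Train the VAE on minority samples only, decode N(0, I) draws, append them (binary case assumed)."""
    X_min = X[y == minority_label]
    n_new = int(np.sum(y != minority_label)) - len(X_min)  # samples needed to balance two classes
    if n_new <= 0:
        return X, y
    # Standardize minority features so one learning rate suits every column
    mean, scale = X_min.mean(0), X_min.std(0) + 1e-8
    Wd, bd = train_linear_vae((X_min - mean) / scale, seed=seed)
    rng = np.random.default_rng(seed + 1)
    z = rng.standard_normal((n_new, Wd.shape[0]))  # sample from the prior
    X_new = (z @ Wd + bd) * scale + mean  # decode, then undo standardization
    X_bal = np.vstack([X, X_new])  # original data kept intact, new samples appended
    y_bal = np.concatenate([y, np.full(n_new, minority_label, dtype=y.dtype)])
    return X_bal, y_bal


# Usage on synthetic data: 200 majority samples, 20 minority samples
rng = np.random.default_rng(42)
X = np.vstack([rng.normal(0, 1, (200, 4)), rng.normal(3, 0.5, (20, 4))])
y = np.array([0] * 200 + [1] * 20)
X_bal, y_bal = vae_oversample(X, y, minority_label=1)
print(X_bal.shape, np.bincount(y_bal))  # (400, 4) [200 200]
```

Unlike random oversampling, the new minority points are decoded from fresh latent draws rather than copied from existing rows, and unlike undersampling, no original majority sample is discarded. A deeper, nonlinear encoder and decoder would normally replace the linear maps used here for brevity.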

Similar Papers

Unbalanced Voltage Sag Dataset Enhancement Based on Improved Balancing Generative Adversarial Network
Heju Xiao ... Linhai Qi
IEEJ Transactions on Electrical and Electronic Engineering | VOL. 18
16 Feb 2023

AI federated learning based improvised random Forest classifier with error reduction mechanism for skewed data sets
Anjali More ... Dipti Rana
International Journal of Pervasive Computing and Communications | VOL. 20
19 Aug 2022

A comparative study on noise filtering of imbalanced data sets
Szilvia Szeghalmy ... Attila Fazekas
Knowledge-Based Systems | VOL. 301
01 Jul 2024

A Novel Multi-class Classification Architecture Combining Population-based Sampling and Multi-expert Classifier for Imbalanced Data
Haochen Jiang ... Jun Chen
17 Oct 2021
