Abstract

Class imbalance is a widespread problem in data sets and strongly affects classification results. Traditional preprocessing methods for imbalanced data sets mainly comprise undersampling and oversampling. Oversampling tends to cause overfitting and blurred class boundaries, while undersampling discards useful information in the samples. This paper proposes a deep learning oversampling model to address these problems. The model uses a generative algorithm, the variational autoencoder, to learn the features of the minority-class samples in the imbalanced data set, and then combines the newly generated samples with the original data set to form a new data set. Experimental results show that the accuracy obtained with the newly generated data is higher than that obtained with traditional oversampling or undersampling methods, indicating that the variational autoencoder, as a generative model, gives better preprocessing results for imbalanced data sets.
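
The pipeline described in the abstract (fit a variational autoencoder to the minority class, sample new points from it, and merge them with the original data) might be sketched roughly as follows. This is only an illustration under stated assumptions, not the authors' implementation: the helper names `train_vae` and `vae_oversample`, the one-hidden-layer network, the squared-error reconstruction loss, the plain gradient-descent training loop, the hyperparameters, and the synthetic two-class data are all hypothetical choices made to keep the example self-contained in NumPy.

```python
import numpy as np

def init_params(rng, n_features, n_hidden, n_latent):
    # Small random weights; all layer sizes are illustrative assumptions.
    def w(a, b):
        return rng.normal(0.0, 0.1, size=(a, b))
    return {
        "W1": w(n_features, n_hidden), "b1": np.zeros(n_hidden),
        "Wmu": w(n_hidden, n_latent), "bmu": np.zeros(n_latent),
        "Wlv": w(n_hidden, n_latent), "blv": np.zeros(n_latent),
        "W3": w(n_latent, n_hidden), "b3": np.zeros(n_hidden),
        "W4": w(n_hidden, n_features), "b4": np.zeros(n_features),
    }

def decode(p, z):
    # Decoder: latent code -> tanh hidden layer -> reconstructed features.
    hd = np.tanh(z @ p["W3"] + p["b3"])
    return hd, hd @ p["W4"] + p["b4"]

def train_vae(x, n_hidden=16, n_latent=2, epochs=300, lr=0.01, seed=0):
    """Fit a tiny VAE by full-batch gradient descent (hypothetical helper)."""
    rng = np.random.default_rng(seed)
    n = x.shape[0]
    p = init_params(rng, x.shape[1], n_hidden, n_latent)
    for _ in range(epochs):
        # Forward pass with the reparameterisation z = mu + std * eps.
        h = np.tanh(x @ p["W1"] + p["b1"])
        mu = h @ p["Wmu"] + p["bmu"]
        logvar = h @ p["Wlv"] + p["blv"]
        eps = rng.standard_normal(mu.shape)
        std = np.exp(0.5 * logvar)
        z = mu + std * eps
        hd, x_hat = decode(p, z)
        # Backward pass for mean(0.5 * ||x_hat - x||^2 + KL(q(z|x) || N(0, I))).
        g = {}
        d_xhat = (x_hat - x) / n
        g["W4"] = hd.T @ d_xhat
        g["b4"] = d_xhat.sum(0)
        d_pre_d = (d_xhat @ p["W4"].T) * (1 - hd ** 2)
        g["W3"] = z.T @ d_pre_d
        g["b3"] = d_pre_d.sum(0)
        d_z = d_pre_d @ p["W3"].T
        d_mu = d_z + mu / n
        d_lv = d_z * eps * 0.5 * std + 0.5 * (np.exp(logvar) - 1) / n
        g["Wmu"] = h.T @ d_mu
        g["bmu"] = d_mu.sum(0)
        g["Wlv"] = h.T @ d_lv
        g["blv"] = d_lv.sum(0)
        d_pre = (d_mu @ p["Wmu"].T + d_lv @ p["Wlv"].T) * (1 - h ** 2)
        g["W1"] = x.T @ d_pre
        g["b1"] = d_pre.sum(0)
        for k in p:
            p[k] -= lr * g[k]
    return p

def vae_oversample(X, y, minority_label, seed=0):
    """Balance a binary data set with VAE-generated minority samples."""
    rng = np.random.default_rng(seed)
    X_min = X[y == minority_label]
    # Number of synthetic samples needed to match the majority class.
    n_new = int((y != minority_label).sum() - len(X_min))
    # Standardise the minority class before training, undo it afterwards.
    mean, scale = X_min.mean(0), X_min.std(0) + 1e-8
    p = train_vae((X_min - mean) / scale, seed=seed)
    # Generate by sampling the prior N(0, I) and decoding.
    z = rng.standard_normal((n_new, p["W3"].shape[0]))
    _, x_new = decode(p, z)
    X_new = x_new * scale + mean
    # Combine generated samples with the original data set.
    X_bal = np.vstack([X, X_new])
    y_bal = np.concatenate([y, np.full(n_new, minority_label)])
    return X_bal, y_bal

# Synthetic imbalanced data (assumed for illustration): 200 vs. 20 samples.
rng = np.random.default_rng(42)
X = np.vstack([rng.normal(0, 1, (200, 2)), rng.normal(3, 0.5, (20, 2))])
y = np.array([0] * 200 + [1] * 20)
X_bal, y_bal = vae_oversample(X, y, minority_label=1)
```

After the call, `X_bal` and `y_bal` hold the original samples plus the generated minority samples, and either class can then be fed to any classifier. How closely the generated points follow the minority distribution depends on the training settings, which are not tuned here.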
