Unbalanced regression sample generation algorithm based on confrontation

Huixin Tian, Chunzhi Tian, Kun Li, Weinan Jia

doi:10.1016/j.ins.2023.119157

Abstract

Data imbalance arises when the number of samples varies significantly across categories or target-value ranges. Numerous methods have been developed to address data imbalance in classification samples, but data imbalance in regression samples has received far less attention. Because the target values of regression samples are continuous, the target-value distribution of imbalanced regression samples is more complicated than the class distribution of imbalanced classification samples. To address this, we define three basic modes of data imbalance in regression samples: PSIR-mode (Positive Skewed Imbalanced Regression-mode), UNIR-mode (Un-Normal Imbalanced Regression-mode) and NSIR-mode (Negative Skewed Imbalanced Regression-mode). Any imbalanced regression sample set with a complex target-value distribution can be decomposed into these three modes. We then propose the DIRVAE (Deep Imbalanced Regression Variational Autoencoder) algorithm to generate missing and minority samples. The model learns both the distribution of the original samples and the relationships between samples that are adjacent in the target-value distribution. Experiments on datasets from biology, medicine and aerospace demonstrate the superiority of the model.
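To make the idea of skewed, imbalanced regression targets concrete, here is a minimal, hypothetical sketch in plain Python. It is not DIRVAE and not the paper's procedure. The abstract does not say how a target distribution is assigned to PSIR, UNIR or NSIR, so `guess_mode` assumes, purely for illustration, that the sign of the sample skewness (beyond an arbitrary threshold `tol`) picks PSIR or NSIR and that everything else falls back to UNIR. `sparse_bins` flags equal-width target ranges with few samples, which are the kind of "missing and minority" ranges a generator would need to fill. The function names, `tol`, `n_bins` and `frac` are all our own illustrative choices.

```python
from typing import List, Tuple


def sample_skewness(y: List[float]) -> float:
    """Biased (population) moment coefficient of skewness g1 = m3 / m2**1.5."""
    n = len(y)
    mean = sum(y) / n
    m2 = sum((v - mean) ** 2 for v in y) / n  # second central moment
    m3 = sum((v - mean) ** 3 for v in y) / n  # third central moment
    if m2 == 0.0:
        return 0.0  # constant targets: skewness undefined, treat as 0
    return m3 / m2 ** 1.5


def guess_mode(y: List[float], tol: float = 0.5) -> str:
    """HYPOTHETICAL rule: skewness sign picks PSIR/NSIR, otherwise UNIR.

    This is an illustrative assumption, not the paper's definition of the modes.
    """
    g1 = sample_skewness(y)
    if g1 > tol:
        return "PSIR"  # long right tail: high target values are rare
    if g1 < -tol:
        return "NSIR"  # long left tail: low target values are rare
    return "UNIR"


def sparse_bins(
    y: List[float], n_bins: int = 10, frac: float = 0.5
) -> List[Tuple[float, float, int]]:
    """Return (low, high, count) for equal-width target bins holding fewer
    than frac * (average count per bin) samples, i.e. candidate ranges where
    minority or missing samples could be generated."""
    lo, hi = min(y), max(y)
    width = (hi - lo) / n_bins
    counts = [0] * n_bins
    for v in y:
        # Clamp so the maximum target value lands in the last bin.
        idx = min(int((v - lo) / width), n_bins - 1) if width > 0 else 0
        counts[idx] += 1
    threshold = frac * len(y) / n_bins
    return [
        (lo + i * width, lo + (i + 1) * width, c)
        for i, c in enumerate(counts)
        if c < threshold
    ]


if __name__ == "__main__":
    # Toy targets: most samples are small, one is far out on the right.
    y = [1.0] * 20 + [2.0] * 10 + [3.0] * 5 + [10.0]
    print(guess_mode(y))            # PSIR
    print(sparse_bins(y, n_bins=3))  # [(4.0, 7.0, 0), (7.0, 10.0, 1)]
```

A real imbalanced-regression pipeline would use a proper density estimate rather than a fixed histogram, and the skewness threshold here has no basis in the paper; the sketch only shows why a continuous target calls for range-based, rather than class-based, notions of "minority".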

Full Text