Cross-Concatenation: Tackling Uncertainty in Imbalanced Big Data Classification

Hadi Mansourifar, Weidong Shi

doi:10.1109/bigdata52589.2021.9671763

Abstract

In this paper, we use data projection to address the uncertainty problem in imbalanced data classification. Instead of resampling the data, which introduces uncertainty, we project the minority and majority instances into a new space using a novel technique called Cross-Concatenation. To project the minority instances into the new space, we concatenate each minority instance with every majority instance to form M * N new instances of twice the original dimensionality, where M and N are the sizes of the minority and majority classes, respectively. The same procedure is repeated for the majority instances: each majority instance is concatenated with every minority instance to form N * M new double-length instances. Because both projected classes contain exactly M * N instances, they are no longer skewed, and our experiments show that Cross-Concatenation gives classifiers sufficient data to train more efficient models. After training, each test instance is concatenated with the centroid of the minority class and with the centroid of the majority class to create two different instances, and the highest probability returned by the trained model is used to assign its label. Our experimental results show that the proposed method can significantly decrease the uncertainty in imbalanced classification, with results competitive with SMOTE and its variants, the most popular over-sampling techniques, in terms of precision, recall, F1 score and Area Under the Curve (AUC).
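The projection and the centroid-based labeling rule described in the abstract can be sketched as follows. This is a minimal illustration assuming NumPy, not the authors' implementation: the helper names `cross_concatenate`, `TinyLogReg` and `predict` are hypothetical, label 1 denotes the minority class, and the small logistic regression is a stand-in for whatever classifier is actually used (any model exposing `fit` and `predict_proba` would work). The abstract does not spell out how the two test instances are compared; here we assume that a test point x is scored as minority via [x, minority-free centroid pairing] by pairing it with the majority centroid (mimicking a projected minority row), as majority by pairing it with the minority centroid (mimicking a projected majority row), and that the class whose probability is higher wins.

```python
import numpy as np


def cross_concatenate(X_min, X_maj):
    """Project both classes into the 2d-dimensional cross-concatenated space.

    Minority projection: each minority row paired with each majority row,
    [x_min_i, x_maj_j], giving M * N rows labelled 1.
    Majority projection: each majority row paired with each minority row,
    [x_maj_j, x_min_i], giving N * M rows labelled 0.
    """
    M, N = len(X_min), len(X_maj)
    # np.repeat / np.tile enumerate every (i, j) pair without a Python loop.
    min_proj = np.hstack([np.repeat(X_min, N, axis=0), np.tile(X_maj, (M, 1))])
    maj_proj = np.hstack([np.repeat(X_maj, M, axis=0), np.tile(X_min, (N, 1))])
    X = np.vstack([min_proj, maj_proj])
    y = np.concatenate([np.ones(M * N), np.zeros(N * M)])  # balanced by construction
    return X, y


def _sigmoid(z):
    # Clip to avoid overflow in np.exp on very confident scores.
    return 1.0 / (1.0 + np.exp(-np.clip(z, -30.0, 30.0)))


class TinyLogReg:
    """Hypothetical stand-in classifier: batch gradient-descent logistic regression."""

    def __init__(self, lr=0.1, epochs=500):
        self.lr, self.epochs = lr, epochs

    def fit(self, X, y):
        self.w = np.zeros(X.shape[1])
        self.b = 0.0
        for _ in range(self.epochs):
            err = _sigmoid(X @ self.w + self.b) - y  # gradient of log-loss w.r.t. the score
            self.w -= self.lr * X.T @ err / len(y)
            self.b -= self.lr * err.mean()
        return self

    def predict_proba(self, X):
        p1 = _sigmoid(X @ self.w + self.b)
        return np.column_stack([1.0 - p1, p1])  # columns: [P(majority), P(minority)]


def predict(model, X_test, X_min, X_maj):
    """Assumed labeling rule: compare the two centroid-concatenated scores."""
    c_min, c_maj = X_min.mean(axis=0), X_maj.mean(axis=0)
    n = len(X_test)
    # [x, c_maj] looks like a projected minority row if x is a minority point.
    as_min = np.hstack([X_test, np.tile(c_maj, (n, 1))])
    # [x, c_min] looks like a projected majority row if x is a majority point.
    as_maj = np.hstack([X_test, np.tile(c_min, (n, 1))])
    p_min = model.predict_proba(as_min)[:, 1]
    p_maj = model.predict_proba(as_maj)[:, 0]
    return (p_min > p_maj).astype(int)  # 1 = minority, 0 = majority


if __name__ == "__main__":
    X_min = np.array([[2.0, 2.0], [3.0, 3.0]])
    X_maj = np.array([[-2.0, -2.0], [-3.0, -3.0], [-2.0, -3.0]])
    X, y = cross_concatenate(X_min, X_maj)
    model = TinyLogReg().fit(X, y)
    print(predict(model, np.array([[2.5, 2.5], [-2.5, -2.5]]), X_min, X_maj))
```

Note that the projected training set grows as 2 * M * N rows: with, say, 100 minority and 10,000 majority instances, it already contains 2,000,000 rows, so in practice one may need to subsample pairs or train in mini-batches.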

Full Text