Improved CBSO: A distributed fuzzy-based adaptive synthetic oversampling algorithm for imbalanced judicial data

Feifan Dai, Xinli Wang, Weiyun Si, Guisong Yang, Yan Song, Jianhua Hu

doi:10.1016/j.ins.2021.04.017

Abstract

The imbalanced data problem is a major challenge in judicial data analysis because it often leads to low classification accuracy. Synthesizing new samples by oversampling is a useful way to handle this problem. However, most oversampling algorithms ignore noise samples and do not fully take the data distribution into account. To address this, an improved cluster-based synthetic oversampling algorithm, the distributed fuzzy-based adaptive synthetic oversampling (DFBASO) algorithm, is proposed; it simultaneously considers the inter-class distribution, the intra-cluster distribution and the characteristics of noise samples. The proposed DFBASO algorithm comprises: 1) application of the fuzzy c-means (FCM) clustering algorithm to samples of both the minority and majority classes; 2) a weighted distribution based on two factors, the inter-class distance and the cluster capacity; and 3) a mixed synthesis method for different intra-cluster distribution cases. Finally, a judicial data set and eight public data sets are used to demonstrate the effectiveness and universal applicability of the proposed DFBASO algorithm for imbalanced data classification.
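The three components listed above can be sketched roughly as follows. This is a minimal sketch under stated assumptions, not the authors' implementation: the abstract gives no formulas, so `fuzzy_c_means` is the standard FCM update with fuzzifier m = 2; the rule in the hypothetical `cluster_weights` (weight inversely proportional to inter-class distance times cluster size) is an assumption; plain SMOTE-style interpolation inside each cluster stands in for the paper's mixed synthesis method; and noise handling is omitted. All function and parameter names (`oversample`, `c_min`, `c_maj`) are illustrative.

```python
import numpy as np


def fuzzy_c_means(X, c, m=2.0, max_iter=100, tol=1e-5, seed=0):
    """Standard fuzzy c-means; returns (centers, U), U[i, j] = membership of sample i in cluster j."""
    rng = np.random.default_rng(seed)
    U = rng.random((X.shape[0], c))
    U /= U.sum(axis=1, keepdims=True)  # each row of memberships sums to 1
    for _ in range(max_iter):
        Um = U ** m
        centers = (Um.T @ X) / Um.sum(axis=0)[:, None]
        d = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        d = np.fmax(d, 1e-12)  # guard against a sample sitting exactly on a center
        # u_ij = 1 / sum_k (d_ij / d_ik)^(2 / (m - 1))
        U_new = 1.0 / ((d[:, :, None] / d[:, None, :]) ** (2.0 / (m - 1.0))).sum(axis=2)
        converged = np.abs(U_new - U).max() < tol
        U = U_new
        if converged:
            break
    return centers, U


def cluster_weights(min_centers, sizes, maj_centers):
    """HYPOTHETICAL weighting (assumption, not from the paper): favor minority
    clusters that lie close to a majority center and contain few samples."""
    # inter-class distance: from each minority center to the nearest majority center
    inter = np.linalg.norm(min_centers[:, None, :] - maj_centers[None, :, :], axis=2).min(axis=1)
    raw = np.where(sizes > 0, 1.0 / (np.fmax(inter, 1e-12) * np.fmax(sizes, 1)), 0.0)
    return raw / raw.sum()


def oversample(X_min, X_maj, c_min=3, c_maj=3, seed=0):
    """Generate len(X_maj) - len(X_min) synthetic minority samples."""
    rng = np.random.default_rng(seed)
    # FCM is applied to both classes, as the abstract describes
    min_centers, U = fuzzy_c_means(X_min, c_min, seed=seed)
    maj_centers, _ = fuzzy_c_means(X_maj, c_maj, seed=seed)
    labels = U.argmax(axis=1)  # hard assignment by maximum membership
    sizes = np.bincount(labels, minlength=c_min)  # cluster capacity
    w = cluster_weights(min_centers, sizes, maj_centers)
    n_new = len(X_maj) - len(X_min)
    # allocate the synthetic budget across clusters in proportion to w
    counts = np.bincount(rng.choice(c_min, size=n_new, p=w), minlength=c_min)
    synthetic = []
    for j in range(c_min):
        members = X_min[labels == j]
        for _ in range(counts[j]):
            # SMOTE-style interpolation (stand-in) between two members of the same cluster
            a, b = members[rng.integers(len(members), size=2)]
            synthetic.append(a + rng.random() * (b - a))
    return np.array(synthetic).reshape(n_new, X_min.shape[1])


# Usage on toy data: 200 majority vs 40 minority samples in 2-D
data_rng = np.random.default_rng(1)
X_maj = data_rng.normal(0.0, 1.0, size=(200, 2))
X_min = data_rng.normal(2.5, 0.5, size=(40, 2))
new = oversample(X_min, X_maj)
print(new.shape)  # (160, 2)
```

Drawing the per-cluster counts with `rng.choice` keeps the total exactly equal to the class-size gap, which independent per-cluster rounding would not guarantee; clusters with zero weight are never selected, so interpolation always has members to draw from.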

Full Text