Credal Clustering for Imbalanced Data

Zuowei Zhang, Kuang Zhou, Arnaud Martin, Yiru Zhang, Zhunga Liu

doi:10.1007/978-3-030-88601-1_2

Abstract

Traditional evidential clustering tends to build clusters that contain roughly the same number of objects, which may not be suitable for imbalanced data. This paper proposes a new method, called credal clustering (CClu), to deal with imbalanced data based on the theory of belief functions. Given a dataset with \(\mathcal {C}\) desired classes, the credal c-means (CCM) clustering method is first employed to divide the dataset into \(\mathcal {S}\) clusters, with \(\mathcal {S}>\mathcal {C}\). These clusters are then gradually merged following a given principle based on the density of the meta-clusters and of their associated singleton clusters. The merging stops when \(\mathcal {C}\) singleton classes are obtained. During this merging procedure, the objects in each singleton cluster are assigned to one new singleton class. Moreover, a weighted mean vector rule is developed to classify the objects in each unmerged meta-cluster into the associated new classes using the K-nearest neighbor technique. Two experiments show that CClu can handle imbalanced datasets with high accuracy, and that errors are reduced by properly modeling imprecision.
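The final step, assigning an object from an unmerged meta-cluster to one of its associated classes, can be illustrated with a minimal sketch. The abstract does not give the exact weighted mean vector rule, so everything below is an assumption made for illustration: the function name `assign_by_weighted_knn_mean`, the exponential distance weighting, and the decision rule (pick the class whose weighted mean of the K nearest neighbors is closest to the object) are hypothetical and are not taken from the paper.

```python
import numpy as np


def assign_by_weighted_knn_mean(x, class_points, k=3):
    """Assign x to one of several candidate classes (illustrative sketch).

    x            : (d,) array, an object taken from a meta-cluster.
    class_points : dict mapping a class label to an (n_i, d) array of
                   objects already assigned to that class.
    k            : number of nearest neighbors used per class.

    Assumption: neighbors are weighted by exp(-distance) and x goes to the
    class whose weighted mean vector is nearest. This is not the paper's rule.
    """
    x = np.asarray(x, dtype=float)
    best_label, best_dist = None, np.inf
    for label, pts in class_points.items():
        pts = np.asarray(pts, dtype=float)
        d = np.linalg.norm(pts - x, axis=1)   # distance from x to each object
        nn = np.argsort(d)[:k]                # indices of the (up to) k nearest
        # Subtract the smallest distance before exponentiating so the
        # weights never all underflow to zero.
        w = np.exp(-(d[nn] - d[nn].min()))
        w /= w.sum()
        mean_vec = w @ pts[nn]                # weighted mean of the neighbors
        dist = np.linalg.norm(x - mean_vec)
        if dist < best_dist:
            best_label, best_dist = label, dist
    return best_label


# A large class near the origin and a small class near (5, 5).
classes = {
    "major": np.array([[0, 0], [0, 1], [1, 0], [1, 1], [0.5, 0.5]]),
    "minor": np.array([[5, 5], [5, 6]]),
}
print(assign_by_weighted_knn_mean([4.5, 4.5], classes))
print(assign_by_weighted_knn_mean([0.4, 0.6], classes))
```

Because each class contributes only its K nearest objects, the size of a class does not dominate the decision, which is the property that matters for imbalanced data.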

Full Text