Diversity and quality are two important factors that affect clustering ensemble performance. Some base clusterings are irrelevant or redundant, which degrades the performance of the clustering ensemble. Removing irrelevant and redundant clusterings increases diversity and quality simultaneously, which often leads to a more accurate ensemble solution. Formulating the trade-off between quality and diversity as an optimization problem remains a challenge. Based on the minimum redundancy-maximum relevance (mRMR) criterion, two methods are proposed: one pair-wise and one non-pair-wise. In the pair-wise method, each clustering is weighted by comparing it with the other base clusterings, whereas in the non-pair-wise method, a virtual labeling is first obtained using a consensus function, and each clustering is then weighted against this labeling. To evaluate these methods, several experiments were conducted on 10 real datasets, and the results were compared with those of full ensembles and of other clustering ensemble selection methods. The results showed that the proposed methods improved performance more than both full ensembles and the other clustering ensemble selection methods.
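The two weighting styles described above can be illustrated with a minimal sketch. It is not the paper's implementation: normalized mutual information (NMI) as the similarity measure, the mean-over-others pair-wise score, and a thresholded co-association consensus function are all assumptions chosen for illustration, and the function names are hypothetical. How the mRMR criterion combines relevance and redundancy into a final selection is not reproduced here.

```python
from collections import Counter
from math import log, sqrt


def entropy(labels):
    """Shannon entropy (natural log) of a flat list of cluster labels."""
    n = len(labels)
    return -sum((c / n) * log(c / n) for c in Counter(labels).values())


def nmi(a, b):
    """NMI with geometric-mean normalization; 0.0 if a labeling has one cluster."""
    n = len(a)
    count_a, count_b, joint = Counter(a), Counter(b), Counter(zip(a, b))
    mi = sum((c / n) * log(c * n / (count_a[x] * count_b[y]))
             for (x, y), c in joint.items())
    denom = sqrt(entropy(a) * entropy(b))
    return mi / denom if denom > 0 else 0.0


def pairwise_weights(clusterings):
    """Assumed pair-wise score: mean NMI of each clustering with all the others."""
    m = len(clusterings)
    return [sum(nmi(clusterings[i], clusterings[j]) for j in range(m) if j != i) / (m - 1)
            for i in range(m)]


def consensus_labels(clusterings, threshold=0.5):
    """Assumed consensus function: link two points if they share a cluster in more
    than `threshold` of the base clusterings, then take connected components."""
    n, m = len(clusterings[0]), len(clusterings)
    parent = list(range(n))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i in range(n):
        for j in range(i + 1, n):
            co_association = sum(c[i] == c[j] for c in clusterings) / m
            if co_association > threshold:
                parent[find(i)] = find(j)
    return [find(i) for i in range(n)]


def non_pairwise_weights(clusterings):
    """Assumed non-pair-wise score: NMI of each clustering with the virtual labeling."""
    virtual = consensus_labels(clusterings)
    return [nmi(c, virtual) for c in clusterings]


# Toy ensemble over six points: two agreeing partitions, one finer one, one unrelated.
base = [
    [0, 0, 0, 1, 1, 1],
    [1, 1, 1, 0, 0, 0],
    [0, 0, 1, 1, 2, 2],
    [0, 1, 0, 1, 0, 1],
]
print(pairwise_weights(base))
print(non_pairwise_weights(base))
```

In this toy ensemble the unrelated fourth clustering receives the lowest score under both schemes. Whether a high pair-wise score should count toward relevance or toward redundancy depends on how the selection criterion is defined, which this sketch leaves open.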