Abstract

Clustering ensemble selection has proved effective at improving the quality of clustering solutions. The technique relies on two key metrics: diversity and quality. Empirical studies have shown that more effective ensembles are obtained when diversity and quality are considered simultaneously. However, the relationship between these two metrics in base clusterings remains unclear. This paper proposes a new hierarchical selection algorithm that uses a diversity/quality measure based on the Jaccard similarity measure. In the proposed algorithm, subsets of the clustering partitions are selected according to their diversity measures. The proposed diversity measure is applied in two forms: pair-wise diversity and hybrid diversity. The hypergraph-partitioning algorithm (HGPA), the cluster-based similarity partitioning algorithm (CSPA), and the meta-clustering algorithm (MCLA) were used as consensus functions to obtain the consensus solution and the cluster ensemble selection results with the hierarchical method. Experimental results on 14 datasets showed that selecting a subset of base clusterings with the proposed algorithm yielded more accurate results than the full ensemble, demonstrating the effectiveness and robustness of the proposed algorithm with the new diversity measure.
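As a rough illustration of what a Jaccard-based pair-wise diversity between two base clusterings might look like, the sketch below uses the standard pair-counting Jaccard index and takes diversity as one minus that similarity. This is an assumption made for illustration only: the abstract does not give the exact definition of the proposed measure, and the function names `pair_jaccard`, `pairwise_diversity`, and `mean_diversity` are hypothetical.

```python
from itertools import combinations


def pair_jaccard(labels_a, labels_b):
    """Pair-counting Jaccard index between two partitions of the same objects.

    Counts object pairs placed together in both partitions, divided by the
    pairs placed together in at least one of them.
    """
    if len(labels_a) != len(labels_b):
        raise ValueError("both partitions must label the same objects")
    both = either = 0
    for i, j in combinations(range(len(labels_a)), 2):
        same_a = labels_a[i] == labels_a[j]
        same_b = labels_b[i] == labels_b[j]
        both += same_a and same_b
        either += same_a or same_b
    # If no pair is co-clustered in either partition, treat them as identical.
    return both / either if either else 1.0


def pairwise_diversity(labels_a, labels_b):
    """Assumed diversity: one minus the Jaccard similarity (not from the paper)."""
    return 1.0 - pair_jaccard(labels_a, labels_b)


def mean_diversity(ensemble):
    """Average pair-wise diversity over all pairs of base clusterings."""
    pairs = list(combinations(ensemble, 2))
    return sum(pairwise_diversity(a, b) for a, b in pairs) / len(pairs)
```

Because the index compares co-membership of object pairs and not the raw label values, two clusterings that differ only in how their clusters are named have zero diversity under this sketch.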
