Abstract
Although semi-supervised clustering ensemble methods have achieved satisfactory performance, they fail to make effective use of constraint knowledge, such as cannot-link and must-link constraints, when generating diverse ensemble members. In addition, they ignore the negative effects of redundant and noisy data. To address these shortcomings, we propose an approach that combines multiple semi-supervised clustering solutions by adaptively regularizing the weights of the clustering ensemble members, referred to as ARSCE. First, we generate a series of feature subspaces by randomly selecting features without replacement, which avoids producing two identical feature subspaces. Second, we apply a feature transformation to these subspaces that takes the pairwise constraints into account, yielding new clustering-friendly spaces in which clustering methods are used to generate various clustering solutions. Finally, we design a novel fusion strategy that integrates the multiple clustering solutions into a unified clustering partition, with a weight assigned to each clustering ensemble member. Extensive experiments on multiple real-world benchmarks demonstrate the effectiveness of ARSCE and its superiority over competing methods.
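The first step above can be illustrated with a minimal sketch. It assumes NumPy, and it assumes one particular reading of the description: features are drawn without replacement inside each subspace, and a subspace that repeats an earlier one is rejected. The function name `sample_feature_subspaces` and its parameters are hypothetical and are not taken from the paper.

```python
import math
import numpy as np

def sample_feature_subspaces(n_features, n_subspaces, subspace_size, seed=0):
    """Draw distinct random feature subspaces (hypothetical helper, not the paper's code)."""
    # There are only C(n_features, subspace_size) distinct subspaces of this size.
    if n_subspaces > math.comb(n_features, subspace_size):
        raise ValueError("not enough distinct subspaces of this size")
    rng = np.random.default_rng(seed)
    seen = set()
    subspaces = []
    while len(subspaces) < n_subspaces:
        # replace=False: a feature appears at most once within a subspace.
        drawn = rng.choice(n_features, size=subspace_size, replace=False)
        key = tuple(sorted(drawn.tolist()))
        # Assumption: reject a draw that duplicates an earlier subspace.
        if key not in seen:
            seen.add(key)
            subspaces.append(np.array(key))
    return subspaces

subspaces = sample_feature_subspaces(n_features=10, n_subspaces=5, subspace_size=4, seed=1)
```

Each returned array holds column indices, so a data matrix `X` could be restricted to one subspace with `X[:, subspaces[0]]` before the constraint-aware transformation and clustering steps.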
Highlights
Clustering, an unsupervised learning method, aims to split data into several disjoint groups so that data in the same group are more similar to each other than to data in different groups
Inspired by ensemble methods in supervised learning, recent years have witnessed the development of clustering ensemble methods, which proceed in two steps: the generation of clustering solutions and the fusion of clustering solutions
To achieve the above two goals, we propose an adaptive regularized semi-supervised clustering ensemble framework, referred to as ARSCE
Summary
Clustering, an unsupervised learning method, aims to split data into several disjoint groups so that data in the same group are more similar to each other than to data in different groups. Bai et al. [7] propose a weighted consensus measure based on information entropy to evaluate clustering quality. Although these clustering ensemble methods have achieved satisfactory performance, they seldom consider the following issues: 1) how to fully exploit the prior information provided by experts, expressed as must-link and cannot-link constraints, and 2) how to design a better fusion strategy that integrates all the clustering solutions into a solution that is more robust and stable than any individual base clustering solution. The contributions of this work are summarized as follows: 1) We propose a transformation that works in random feature subspaces while considering pairwise constraints to find a clustering-friendly space, in which clustering solutions are generated using traditional clustering methods.
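The two issues raised above, exploiting pairwise constraints and fusing base clusterings, can be made concrete with a small sketch. It is a generic illustration and not the ARSCE fusion strategy: it uses a weighted co-association matrix with fixed, user-supplied weights, whereas the proposed method regularizes the weights adaptively. It assumes NumPy, and the names `weighted_coassociation` and `constraint_violations` are hypothetical.

```python
import numpy as np

def weighted_coassociation(labelings, weights):
    """Weighted fraction of base clusterings that place each pair of points together."""
    labelings = np.asarray(labelings)
    weights = np.asarray(weights, dtype=float)
    weights = weights / weights.sum()  # normalize so the weights sum to 1
    n_points = labelings.shape[1]
    co = np.zeros((n_points, n_points))
    for labels, w in zip(labelings, weights):
        # Boolean matrix: True where points i and j share a cluster label.
        co += w * (labels[:, None] == labels[None, :])
    return co

def constraint_violations(labels, must_link, cannot_link):
    """Count the pairwise constraints that a single clustering violates."""
    broken_ml = sum(labels[i] != labels[j] for i, j in must_link)
    broken_cl = sum(labels[i] == labels[j] for i, j in cannot_link)
    return int(broken_ml + broken_cl)

# Three base clusterings of four points; the weights are fixed by assumption.
base = [[0, 0, 1, 1], [0, 0, 0, 1], [1, 1, 0, 0]]
co = weighted_coassociation(base, weights=[0.5, 0.25, 0.25])
n_broken = constraint_violations(base[0], must_link=[(0, 1)], cannot_link=[(2, 3)])
```

A consensus partition could then be obtained by clustering the rows of `co`, and the violation count gives one simple way to judge how well a base clustering respects the expert constraints.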