Abstract
Pairwise constraint selection methods often rely on the label information of data to generate pairwise constraints. This paper proposes a new method for selecting pairwise constraints from unlabeled data for semi-supervised clustering to improve clustering accuracy. Given a dataset without any label information, the I-nice method first clusters it into a set of initial clusters. From each initial cluster, a dense group of objects is obtained by removing the faraway objects. Then, in each dense group, the most informative object and the informative objects are identified with the local density estimation method. The identified objects are used to form a set of pairwise constraints, which are incorporated into the semi-supervised clustering algorithm to guide the clustering process toward a better solution. The advantage of this method is that no label information is required for selecting pairwise constraints. Experimental results demonstrate that the new method improved clustering accuracy and outperformed four state-of-the-art pairwise constraint selection methods, namely random, FFQS, min–max, and NPU, on both synthetic and real-world datasets.
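The pipeline summarized above can be illustrated with a minimal NumPy sketch. It is not the paper's implementation, and several details are assumptions made only for illustration: the initial cluster labels are taken as given (the I-nice step is not reproduced), faraway objects are dropped with a distance-to-mean quantile threshold, local density is approximated by the inverse mean distance to the k nearest neighbours, the densest object is treated as the most informative one, and must-link and cannot-link pairs are formed by the simple rule shown. The function names and the parameters `keep`, `k`, and `n_informative` are hypothetical.

```python
import numpy as np


def dense_group(X, idx, keep=0.8):
    """Keep objects whose distance to the cluster mean is within the keep-quantile.

    Assumption: a quantile cut-off stands in for 'removing the faraway objects'.
    """
    pts = X[idx]
    d = np.linalg.norm(pts - pts.mean(axis=0), axis=1)
    return idx[d <= np.quantile(d, keep)]


def local_density(X, idx, k=3):
    """Assumed density estimate: inverse mean distance to the k nearest neighbours."""
    pts = X[idx]
    dist = np.linalg.norm(pts[:, None, :] - pts[None, :, :], axis=2)
    k = min(k, len(idx) - 1)
    # Column 0 of each sorted row is the zero self-distance, so skip it.
    knn = np.sort(dist, axis=1)[:, 1:k + 1]
    return 1.0 / (knn.mean(axis=1) + 1e-12)


def select_constraints(X, labels, keep=0.8, k=3, n_informative=2):
    """Return (must_link, cannot_link) index pairs chosen without true labels.

    `labels` are initial cluster assignments (a stand-in for the I-nice output).
    """
    must_link, anchors = [], []
    for c in np.unique(labels):
        idx = dense_group(X, np.flatnonzero(labels == c), keep)
        if len(idx) < 2:
            continue  # too few objects to estimate a density
        order = idx[np.argsort(-local_density(X, idx, k))]
        anchor = int(order[0])  # assumed 'most informative' object
        anchors.append(anchor)
        # Assumed rule: link the anchor to the next-densest objects in its group.
        must_link += [(anchor, int(j)) for j in order[1:1 + n_informative]]
    # Assumed rule: anchors of different dense groups must not be linked.
    cannot_link = [(a, b) for i, a in enumerate(anchors) for b in anchors[i + 1:]]
    return must_link, cannot_link


# Toy usage: two well-separated blobs with hand-made initial labels.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 0.3, (20, 2)), rng.normal(5, 0.3, (20, 2))])
labels = np.repeat([0, 1], 20)
ml, cl = select_constraints(X, labels)
```

The resulting pairs could then be passed to any constraint-based semi-supervised clustering algorithm. How the paper actually forms pairs from the identified objects is not stated in the abstract, so the pairing rule here should be read as a placeholder.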