Abstract

Learning a proper distance for clustering from prior knowledge falls into the realm of semisupervised fuzzy clustering. Although most existing learning methods take prior knowledge (e.g., pairwise constraints) into account, they pay little attention to the local knowledge of the data, which can nevertheless be used to optimize the distance. In this article, we propose a novel distance learning method for semisupervised fuzzy clustering that learns from Group-level information. We first present a new format of constraint information, called Group-level constraints, by elevating the pairwise constraints (must-links and cannot-links) from the point level to the Group level. The Groups, generated around the data points contained in the pairwise constraints, carry not only the local information of the data (the relation between close data points) but also more background information than the limited prior knowledge alone provides. Then, we propose a novel method, namely Group-based distance learning, that learns a distance from the Group-level constraints in order to optimize the performance of fuzzy clustering. The distance learning process aims to pull must-link Groups as close together as possible while pushing cannot-link Groups as far apart as possible. We formulate the learning process with the weights of constraints by invoking linear and nonlinear transformations. The linear Group-based distance learning method is realized by means of semidefinite programming, and the nonlinear learning method is realized by a neural network, which can explicitly provide nonlinear mappings. Experimental results on both synthetic and real-world datasets show that the proposed methods yield much better performance than other distance learning methods that use pairwise constraints.
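
The pull/push idea described above can be illustrated with a minimal sketch. This is not the formulation used in the article: the abstract does not specify how Groups are built or how the objective is weighted, so the sketch assumes (hypothetically) that a Group is a constrained point plus its k nearest neighbours, that the distance between two Groups is the squared distance between their means after a linear map x -> Lx, and that the objective is simply the must-link Group distances minus the cannot-link Group distances. The function names `build_group`, `group_distance`, and `group_objective` are illustrative only, and no semidefinite program is solved here; the code merely evaluates the objective for two candidate maps.

```python
import numpy as np

def build_group(X, idx, k=1):
    """Indices of point `idx` and its k nearest neighbours (assumed Group definition)."""
    d = np.linalg.norm(X - X[idx], axis=1)
    return np.argsort(d)[:k + 1]  # idx itself is at distance 0

def group_distance(X, group_a, group_b, L):
    """Squared distance between Group means after the linear map x -> L x (assumption)."""
    diff = L @ (X[group_a].mean(axis=0) - X[group_b].mean(axis=0))
    return float(diff @ diff)

def group_objective(X, must_links, cannot_links, L, k=1):
    """Must-link Group distances minus cannot-link Group distances (lower is better)."""
    pull = sum(group_distance(X, build_group(X, i, k), build_group(X, j, k), L)
               for i, j in must_links)
    push = sum(group_distance(X, build_group(X, i, k), build_group(X, j, k), L)
               for i, j in cannot_links)
    return pull - push

# Toy data: two clusters separated along the first axis, spread along the second.
X = np.array([[0.0, 0.0], [0.2, 1.0], [0.1, 4.0], [0.3, 5.0],
              [3.0, 0.0], [3.2, 1.0], [3.1, 4.0], [3.3, 5.0]])
must_links = [(0, 2)]    # same cluster, far apart along the second axis
cannot_links = [(0, 4)]  # different clusters

obj_identity = group_objective(X, must_links, cannot_links, np.eye(2))
obj_scaled = group_objective(X, must_links, cannot_links, np.diag([1.0, 0.1]))
print(obj_identity, obj_scaled)
```

In this toy setting, a map that shrinks the second axis lowers the objective, because the must-link Groups move closer while the cannot-link Groups stay apart. A learning method would search for such a map rather than compare two fixed candidates.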

Highlights

  • Clustering is a general methodology and a remarkable algorithmic framework for data analytics and interpretation [1]

  • We propose a novel Group-based distance learning method, which learns from Group-level information, to improve the capabilities of fuzzy clustering

  • Although the running time of information theoretic-based distance learning (ITDL) is less than ours, the results of the above experiments show that both the proposed linear Group-based distance learning (LGDL) and NLGDL achieve better performance than

Summary

Introduction

Clustering is a general methodology and a remarkable algorithmic framework for data analytics and interpretation [1]. It aims to partition data into several clusters such that the data located in the same cluster are logically close to each other while the data in different clusters are highly distinct. We can provide some prior knowledge to guide the clustering process in order to obtain precise results that are consistent with the structure present in the data under analysis. This is the main goal of semisupervised clustering [8]. The prior knowledge mainly comprises [9] pairwise constraints (must-links and cannot-links); class labels; clusters' position or identity; the size of clusters; proximity knowledge [10], [11]; and partition-level information [12], [13].
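
As a small illustration of how pairwise constraints can be represented and checked, the sketch below stores must-links and cannot-links as index pairs and counts how many of them a fuzzy partition violates. It rests on assumptions not stated in the text: the fuzzy membership matrix `U` (one row per point, one column per cluster) is hardened by assigning each point to its highest-membership cluster, and the helper name `count_violations` is hypothetical.

```python
import numpy as np

def count_violations(U, must_links, cannot_links):
    """Count violated constraints after hardening the fuzzy partition U (assumption)."""
    labels = np.argmax(U, axis=1)  # highest-membership cluster per point
    ml = sum(labels[i] != labels[j] for i, j in must_links)    # should share a cluster
    cl = sum(labels[i] == labels[j] for i, j in cannot_links)  # should be separated
    return int(ml), int(cl)

# Four points, two clusters; rows are membership degrees that sum to 1.
U = np.array([[0.9, 0.1], [0.8, 0.2], [0.3, 0.7], [0.4, 0.6]])
violations = count_violations(U, must_links=[(0, 1), (1, 2)],
                              cannot_links=[(0, 3), (2, 3)])
print(violations)
```

A count like this is only a diagnostic; it says nothing about how the constraints are used during learning.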

