Abstract

Semi-supervised clustering algorithms are increasingly employed for discovering hidden structure in data with partially labelled patterns. To make the clustering approach useful and acceptable to users, the information they provide must be simple, natural and limited in amount. To improve recognition capability, we apply an effective feature enhancement procedure to the entire data set to obtain a single set of features or weights by weighting and discriminating the information provided by the user. By taking pairwise constraints into account, we propose a semi-supervised fuzzy clustering algorithm with feature discrimination (SFFD) incorporating a fully adaptive distance function. Experiments on several standard benchmark data sets demonstrate the effectiveness of the proposed method.

Highlights

  • Being one of the most important techniques in pattern recognition, machine learning, data mining and knowledge discovery, clustering is widely used in many application areas to understand and reveal hidden structure of the given patterns

  • We provide comparison with AFCC, which is a typical semi-supervised clustering algorithm relying on pairwise constraints

  • Unlike typical algorithms such as AFCC, SSKFCM and sSFCM, the proposed semi-supervised fuzzy clustering algorithm with feature discrimination (SFFD) focuses on learning a Mahalanobis distance metric, instead of using the original Euclidean distance, during the fuzzy clustering process


Summary

Introduction

As one of the most important techniques in pattern recognition, machine learning, data mining and knowledge discovery, clustering is widely used in many application areas to understand and reveal the hidden structure of given patterns. Semi-supervised clustering integrates the advantages of both supervised and unsupervised learning, with less human effort, appropriate interaction and adaptable accuracy, by taking class labels, prior membership degrees or pairwise constraints into account [1,2,3,4,5,6,7,8,9,10]. Research into semi-supervised clustering can generally be divided into two approaches: hard-constraint-based methods and fuzzy-based methods. In semi-supervised hard c-means clustering methods [1,2,3,4,5,6,7], the clustering process is controlled by class labels or pairwise constraints to ensure that each instance belongs to only one cluster.
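The ingredients named above (a Mahalanobis-type distance, fuzzy memberships and pairwise constraints) can be illustrated with a minimal sketch. This is not the SFFD algorithm itself: the function names, the fixed metric matrix A, the fuzzifier m = 2 and the form of the constraint penalty are all assumptions chosen for illustration. The penalty shown is one common way to score violated must-link and cannot-link pairs in fuzzy clustering, and the membership update is the standard fuzzy c-means rule evaluated under the metric A.

```python
import numpy as np


def mahalanobis_sq(x, v, A):
    """Squared Mahalanobis-type distance (x - v)^T A (x - v).

    With A equal to the identity this reduces to the squared Euclidean distance.
    """
    d = x - v
    return float(d @ A @ d)


def fuzzy_memberships(X, V, A, m=2.0):
    """Standard fuzzy c-means membership update under the metric A (assumed fixed here)."""
    diff = X[:, None, :] - V[None, :, :]              # shape (n, c, d)
    D = np.einsum("ikd,de,ike->ik", diff, A, diff)    # squared distances, shape (n, c)
    D = np.maximum(D, 1e-12)                          # avoid division by zero at a center
    inv = D ** (-1.0 / (m - 1.0))
    return inv / inv.sum(axis=1, keepdims=True)       # each row sums to 1


def constraint_penalty(U, must_link, cannot_link):
    """Illustrative (assumed) cost of violating pairwise constraints.

    Must-link pair (i, j): membership mass placed in different clusters.
    Cannot-link pair (i, j): membership mass placed in the same cluster.
    """
    cost = 0.0
    for i, j in must_link:
        cost += U[i].sum() * U[j].sum() - U[i] @ U[j]
    for i, j in cannot_link:
        cost += U[i] @ U[j]
    return float(cost)


if __name__ == "__main__":
    # Hypothetical toy data: two patterns near the first center, one near the second.
    X = np.array([[0.0, 0.0], [0.0, 1.0], [5.0, 5.0]])
    V = np.array([[0.0, 0.0], [5.0, 5.0]])
    A = np.eye(2)  # identity metric, i.e. plain Euclidean distance

    U = fuzzy_memberships(X, V, A)
    print(U)
    print(constraint_penalty(U, must_link=[(0, 1)], cannot_link=[(0, 2)]))
```

Replacing the identity matrix with a learned positive semi-definite A changes how much each feature contributes to the distance, which is the role a learned metric plays in place of the Euclidean distance.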
