Abstract

Kernel clustering of categorical data is a useful tool for processing separable datasets and has been employed in many disciplines. Despite recent efforts, kernel clustering remains a significant challenge because existing methods assume that features are independent and equally weighted. In this study, we propose a self-expressive kernel subspace clustering algorithm for categorical data (SKSCC) that uses a self-expressive kernel density estimation (SKDE) scheme together with a new feature-weighted non-linear similarity measurement. Within SKSCC, we propose an effective non-linear optimization method to solve the clustering objective function; the method not only considers the relationships between attributes in a non-linear space but also assigns each attribute a weight that measures its degree of correlation. A series of experiments on widely used synthetic and real-world datasets demonstrated that the proposed algorithm is more effective and efficient than other state-of-the-art methods at exploring non-linear relationships among attributes.
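
As a rough illustration of what a feature-weighted, kernel-based similarity for categorical data can look like, the Python sketch below combines the Aitchison-Aitken kernel (a standard kernel for unordered categorical variables) with per-attribute weights. It is not the SKSCC objective or the SKDE scheme, whose details are not given in this summary; the function names, bandwidths, and weights are all hypothetical and chosen only for the example.

```python
from typing import Sequence


def aitchison_aitken(x: str, y: str, n_categories: int, bandwidth: float) -> float:
    """Aitchison-Aitken kernel for a single categorical attribute.

    Returns 1 - bandwidth when the two values match and
    bandwidth / (n_categories - 1) otherwise.
    """
    if n_categories < 2:
        raise ValueError("an attribute needs at least two categories")
    max_bandwidth = (n_categories - 1) / n_categories
    if not 0.0 <= bandwidth <= max_bandwidth:
        raise ValueError("bandwidth must lie in [0, (c - 1) / c]")
    if x == y:
        return 1.0 - bandwidth
    return bandwidth / (n_categories - 1)


def weighted_similarity(
    a: Sequence[str],
    b: Sequence[str],
    n_categories: Sequence[int],
    bandwidths: Sequence[float],
    weights: Sequence[float],
) -> float:
    """Weighted average of per-attribute kernel values (illustrative only)."""
    if not (len(a) == len(b) == len(n_categories) == len(bandwidths) == len(weights)):
        raise ValueError("all inputs must describe the same number of attributes")
    total_weight = sum(weights)
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive value")
    score = sum(
        w * aitchison_aitken(x, y, c, h)
        for x, y, c, h, w in zip(a, b, n_categories, bandwidths, weights)
    )
    return score / total_weight


# Two records described by (political philosophy, mammogram category).
# The weights and bandwidths are made-up values, not learned ones.
record_a = ("liberal", "benign")
record_b = ("liberal", "malignant")
similarity = weighted_similarity(
    record_a,
    record_b,
    n_categories=(3, 5),
    bandwidths=(0.2, 0.4),
    weights=(0.7, 0.3),
)
print(similarity)
```

In this sketch the weights are fixed by hand; in a subspace clustering method they would typically be learned per cluster during optimization, which is the part this example does not attempt to reproduce.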

Highlights

  • One of the goals of clustering is to mine the internal structure and characteristics of unlabeled data, which is known as unsupervised learning [1,2]

  • The results indicate that our SKSCC algorithm, with its non-linear similarity measurement, outperforms the other algorithms by accounting for the relationships among attributes

  • Kernel clustering with categorical data is a vital direction in application research


Summary

Introduction

One of the goals of clustering is to mine the internal structure and characteristics of unlabeled data, which is known as unsupervised learning [1,2]. Real-world applications such as pattern recognition [3], text mining [4], image retrieval [5], and bioinformatics [6] generate unlabeled data. These data are not only numerical; increasingly they are categorical, and such data are flooding into practical applications. For example, political philosophy is often measured as liberal, moderate, or conservative. Similarly, breast cancer diagnoses based on a mammogram use the categories normal, benign, probably benign, suspicious, and malignant.
