Abstract

The ability to discover projected clusters in high-dimensional data is essential for many machine learning applications. Projective clustering of categorical data remains a challenge because it is difficult to learn adaptive weights for categorical attributes in coordination with cluster optimization. In this paper, a probability-based learning framework is proposed that allows both the attribute weights and the center-based clusters to be optimized by kernel density estimation on categorical attributes. A novel algorithm for projective clustering on categorical data is then derived, based on the new learning approach to the kernel bandwidth selection problem. We show that the attribute weight is closely connected to the kernel bandwidth, while the optimized cluster center corresponds to the normalized frequency estimator of the categorical attributes. Experimental results on synthetic and real-world data show that the proposed method significantly outperforms state-of-the-art algorithms.
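
To make the link between kernel bandwidth and the frequency estimator concrete, here is a minimal Python sketch of a kernel-smoothed frequency estimate for a single categorical attribute. It is an illustration under stated assumptions, not the algorithm of the paper: the kernel form (a blend of the empirical frequency with the uniform distribution, in the style of Aitchison-Aitken kernels), the function name `smoothed_frequencies`, and the parameter `bandwidth` are all assumed here for illustration.

```python
from collections import Counter


def smoothed_frequencies(values, categories, bandwidth):
    """Kernel-smoothed frequency estimate for one categorical attribute.

    Assumed kernel (illustrative, not taken from the paper): each estimate
    blends the empirical frequency of a category with the uniform
    distribution over all categories. `bandwidth` must lie in [0, 1].
    """
    if not 0.0 <= bandwidth <= 1.0:
        raise ValueError("bandwidth must lie in [0, 1]")
    if not values or not categories:
        raise ValueError("values and categories must be non-empty")
    counts = Counter(values)
    n = len(values)
    m = len(categories)
    # bandwidth = 0 gives the plain frequency estimator;
    # bandwidth = 1 gives the uniform distribution (attribute carries no signal).
    return {
        c: bandwidth / m + (1.0 - bandwidth) * counts[c] / n
        for c in categories
    }


# Values of one attribute observed within a hypothetical cluster.
cluster_values = ["a", "a", "a", "b"]
sharp = smoothed_frequencies(cluster_values, ["a", "b", "c"], bandwidth=0.0)
flat = smoothed_frequencies(cluster_values, ["a", "b", "c"], bandwidth=1.0)
```

Under this assumed kernel, a small bandwidth keeps the estimate close to the observed frequencies, so the attribute discriminates well within the cluster, whereas a bandwidth near 1 flattens the estimate toward uniform. This is one intuitive reading of why an attribute weight could be tied to the bandwidth.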
