Abstract

High-dimensional clustering is often encountered in real applications, and projective clustering is an effective way to deal with it: it aims to capture the dense areas embedded in subsets of attributes (subspaces). Most projective clustering algorithms use hyper-rectangle structures of equal or varying width to identify the dense areas and their locations, so deciding the widths of these hyper-rectangles is a crucial task. Using the real data distribution directly to determine the widths of the dense structures is a natural and feasible approach. In this paper, we propose a projective clustering algorithm based on hyper-rectangle structures whose widths are estimated from the kernel distribution of the real data. In particular, we first define a structure called the Significant Local Dense Area (SLDA) by using an efficient kernel density estimator, Rodeo. We then design a greedy search method to find all SLDAs that cover the data distribution in the high-dimensional space. Finally, we run a single-linkage clustering algorithm on the SLDAs to form the final clusters and identify the outliers. The main strengths of the proposed algorithm are validated by experiments on synthetic and real-world data sets.
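To make the idea of density-derived widths concrete, the following is a minimal one-dimensional sketch and not the algorithm of this paper. It assumes a fixed-bandwidth Gaussian kernel density estimate in place of Rodeo (which selects bandwidths locally), grows an interval around a seed point while the estimated density stays above a threshold, and then merges overlapping intervals in a single-linkage fashion. The function names (`gaussian_kde`, `dense_interval`, `single_linkage`) and the parameters `h`, `threshold`, `step`, and `gap` are hypothetical and chosen only for illustration.

```python
import math

def gaussian_kde(points, x, h):
    """Fixed-bandwidth Gaussian kernel density estimate at x.

    Assumption: a plain Gaussian KDE stands in for Rodeo here.
    """
    norm = len(points) * h * math.sqrt(2.0 * math.pi)
    return sum(math.exp(-0.5 * ((x - p) / h) ** 2) for p in points) / norm

def dense_interval(points, seed, h, threshold, step=0.05):
    """Grow an interval around seed while the density stays >= threshold.

    threshold must be positive so that both loops terminate.
    """
    lo = hi = seed
    while gaussian_kde(points, lo - step, h) >= threshold:
        lo -= step
    while gaussian_kde(points, hi + step, h) >= threshold:
        hi += step
    return lo, hi

def single_linkage(intervals, gap=0.0):
    """Merge intervals whose separation is at most gap (1-D single linkage)."""
    clusters = []
    for lo, hi in sorted(intervals):
        if clusters and lo - clusters[-1][1] <= gap:
            # Link to the previous cluster and extend its right edge.
            clusters[-1] = (clusters[-1][0], max(clusters[-1][1], hi))
        else:
            clusters.append((lo, hi))
    return clusters

# Two well-separated groups of illustrative (made-up) sample points.
points = [0.0, 0.05, 0.1, 0.15, 0.2, 4.95, 5.0, 5.05, 5.1, 5.2]
left = dense_interval(points, seed=0.1, h=0.2, threshold=0.2)
right = dense_interval(points, seed=5.0, h=0.2, threshold=0.2)
clusters = single_linkage([left, right])
```

In this toy setting each dense interval plays the role of a one-dimensional hyper-rectangle whose width comes from the data rather than from a preset grid. Points that fall outside every merged interval would be the natural candidates for outliers.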
