Abstract
Cluster analysis is the process of partitioning a set of data objects into subsets (clusters) so that objects within a cluster are highly similar to one another but dissimilar to objects in other clusters. Partitioning methods start from an initial partition and refine it toward an optimal one by iterative relocation, so their results depend heavily on the choice of initial cluster centers. Traditional distance-based initialization methods become ineffective on high-dimensional data because of its inherent sparsity and the curse of dimensionality, while existing improved methods are highly sensitive to their parameters. To address these problems, we propose a new initialization method for high-dimensional partition clustering that adaptively selects high-density, low-similarity initial cluster centers and identifies outliers according to the local structure of the data in high-dimensional space. Experiments on both synthetic and real-world datasets show that the proposed algorithm achieves better clustering performance.
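The abstract does not spell out the proposed selection rule. The sketch below is therefore only a generic illustration of the underlying idea: seed a partitioning algorithm with dense, mutually dissimilar points and set aside low-density points as outlier candidates. It is not the authors' algorithm. The cosine-similarity density, the z-score outlier rule, the function name `init_centers`, and the parameters `n_neighbors` and `outlier_z` are all assumptions made for this example.

```python
import numpy as np


def init_centers(X, k, n_neighbors=5, outlier_z=2.0):
    """Hypothetical density/dissimilarity seeding (illustration only, not the paper's method).

    Returns (center_indices, outlier_mask).
    """
    X = np.asarray(X, dtype=float)
    n = X.shape[0]
    if not 0 < n_neighbors < n:
        raise ValueError("n_neighbors must be between 1 and n - 1")

    # Cosine similarity is used here instead of Euclidean distance (an assumption),
    # since Euclidean distances tend to concentrate in high dimensions.
    norms = np.linalg.norm(X, axis=1, keepdims=True)
    Xn = X / np.where(norms == 0.0, 1.0, norms)
    sim = Xn @ Xn.T

    # Local density: mean similarity to the n_neighbors most similar *other* points.
    masked = sim.copy()
    np.fill_diagonal(masked, -np.inf)
    density = np.sort(masked, axis=1)[:, -n_neighbors:].mean(axis=1)

    # Points whose density falls far below the average are flagged as outliers.
    outliers = density < density.mean() - outlier_z * density.std()
    candidates = np.flatnonzero(~outliers)
    if k > candidates.size:
        raise ValueError("k exceeds the number of non-outlier points")

    # Map density from [-1, 1] to [0, 1] so it can serve as a non-negative weight.
    weight = (1.0 + density[candidates]) / 2.0

    # First center: the densest non-outlier point.
    chosen = [int(candidates[np.argmax(weight)])]
    while len(chosen) < k:
        # Similarity of each candidate to its most similar already-chosen center.
        max_sim = sim[np.ix_(candidates, chosen)].max(axis=1)
        # Favor points that are both dense and dissimilar to every chosen center.
        score = weight * (1.0 - max_sim)
        score[np.isin(candidates, chosen)] = -np.inf  # never pick a center twice
        chosen.append(int(candidates[np.argmax(score)]))
    return chosen, outliers


# Usage on synthetic data: three noisy clusters along different axes of a
# 50-dimensional space, plus one isolated point along a fourth axis.
rng = np.random.default_rng(0)
dim, per_cluster = 50, 20
clusters = [np.eye(dim)[j] + 0.05 * rng.standard_normal((per_cluster, dim)) for j in range(3)]
isolated = np.eye(dim)[dim - 1][None, :]
X = np.vstack(clusters + [isolated])

centers, outliers = init_centers(X, k=3)
print("centers:", centers)
print("outliers:", np.flatnonzero(outliers))
```

Multiplying density by dissimilarity is one simple way to trade off the two criteria. Note that this sketch still depends on `n_neighbors` and `outlier_z`, which is exactly the kind of parameter sensitivity the abstract says the proposed method is designed to avoid.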