Abstract

Standard k-means clustering requires computing the Euclidean distance between each instance x in a data set D and every cluster center, which makes it inefficient on high-dimensional data sets. Given its widespread use, it is important that k-means clustering run quickly. This paper explores ways to improve the efficiency of the k-means algorithm in high-dimensional space. Unlike approximate approaches, our proposed method, LBKC, achieves acceleration while producing clustering results identical to those of standard k-means. LBKC uses lower bounds on the Euclidean distance to safely skip a large number of unnecessary distance calculations, thereby accelerating the k-means process. We present three carefully designed lower bounds, based on the block vector, the segment mean, and a nonlinear embedding, and employ them in the proposed method. Furthermore, LBKC is orthogonal to state-of-the-art methods, and we show how it can be naturally combined with them to further improve their performance. Comprehensive experiments on a variety of data sets evaluate the proposed approaches against related competitors, and the results verify the effectiveness of our proposals.
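The abstract does not give the exact form of LBKC's bounds, so the following is only a minimal sketch of the general idea, not the paper's method. It assumes two standard lower bounds that plausibly resemble the segment-mean and block-vector bounds named above. Split each d-dimensional vector into segments of length L. Per segment, L times the squared difference of segment means is at most the segment's squared distance (Cauchy-Schwarz), and the squared difference of segment norms is also at most that distance (reverse triangle inequality). Summing the larger of the two over segments therefore gives a valid lower bound on the squared Euclidean distance. During assignment, a center is skipped whenever this bound already meets or exceeds the best exact distance found so far, so the labels match brute-force k-means exactly. The nonlinear-embedding bound is not shown. All function names and parameters (`summarize`, `lower_bound`, `n_seg`, `use_bounds`) are hypothetical.

```python
import math
import random


def sq_dist(a, b):
    """Exact squared Euclidean distance: the O(d) step we try to avoid."""
    return sum((p - q) ** 2 for p, q in zip(a, b))


def summarize(v, n_seg):
    """Per-segment means and norms (assumes len(v) is divisible by n_seg)."""
    seg_len = len(v) // n_seg
    segs = [v[i * seg_len:(i + 1) * seg_len] for i in range(n_seg)]
    means = [sum(s) / seg_len for s in segs]
    norms = [math.sqrt(sum(t * t for t in s)) for s in segs]
    return means, norms


def lower_bound(sx, sc, seg_len):
    """Lower bound on ||x - c||^2: per segment, take the larger of
    the segment-mean bound and the block-norm bound, then sum."""
    (mx, nx), (mc, nc) = sx, sc
    total = 0.0
    for a_m, b_m, a_n, b_n in zip(mx, mc, nx, nc):
        total += max(seg_len * (a_m - b_m) ** 2, (a_n - b_n) ** 2)
    return total


def assign(data, summaries, centers, n_seg, use_bounds=True):
    """Nearest-center assignment; returns labels and the number of exact
    distance computations performed."""
    seg_len = len(data[0]) // n_seg
    c_sum = [summarize(c, n_seg) for c in centers]
    labels, exact = [], 0
    for x, sx in zip(data, summaries):
        best, best_d = -1, math.inf
        for j, c in enumerate(centers):
            if use_bounds and lower_bound(sx, c_sum[j], seg_len) >= best_d:
                continue  # safe skip: the true distance cannot beat best_d
            exact += 1
            d = sq_dist(x, c)
            if d < best_d:  # strict '<' keeps the first minimum, like brute force
                best, best_d = j, d
        labels.append(best)
    return labels, exact


def kmeans(data, k, n_seg, iters=10, use_bounds=True, seed=0):
    """Plain Lloyd iterations with an optional lower-bound filter."""
    rng = random.Random(seed)
    centers = [list(p) for p in rng.sample(data, k)]
    summaries = [summarize(x, n_seg) for x in data]  # computed once per point
    labels, total_exact = [], 0
    for _ in range(iters):
        labels, exact = assign(data, summaries, centers, n_seg, use_bounds)
        total_exact += exact
        for j in range(k):
            members = [p for p, lab in zip(data, labels) if lab == j]
            if members:  # an empty cluster keeps its previous center
                centers[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centers, total_exact
```

Running `kmeans` twice with the same seed, once with `use_bounds=True` and once with `use_bounds=False`, should yield identical labels and centers while the bounded run performs fewer exact distance computations. Taking the per-segment maximum of two bounds is one simple way to tighten the filter; since every bound used is provably no larger than the true distance, skipping never changes the result.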
