Abstract

The k-means is the most popular clustering algorithm, but, as it needs too many distance computations, its speed is dramatically fall down against high-dimensional data. Although, there are some quite fast variants proposed in literature, but, there is still much room for improvement against high-dimensional large-scale datasets.What proposed here, k-means-g*, is based on a simple geometric concept. For four distinct points, if distance between all pairs except one pair are known, then, a lower bound can be determined for the unknown distance. Utilizing this technique in the assignment step of the k-means, many high-dimensional distance computations can be easily ignored, where small amount of memory is used.Both theoretical and experimental results approves speed of the k-means-g* against recently published fast variants. For over than 50 cases, out of 70 cases of performed experiments, it is faster than other algorithms. The C++ source code for k-means-g* is publicly available.

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call

Disclaimer: All third-party content on this website/platform is and will remain the property of their respective owners and is provided on "as is" basis without any warranties, express or implied. Use of third-party content does not indicate any affiliation, sponsorship with or endorsement by them. Any references to third-party content is to identify the corresponding services and shall be considered fair use under The CopyrightLaw.