Abstract
Single-linkage with a distance-r stopping condition (SL) is a classical clustering technique that can discover arbitrarily shaped clusters. However, its O(n²) worst-case time complexity, where n is the number of points in the dataset, remains a challenge. In this paper, we present a relaxed version of the problem in Euclidean space, called the ρ-approximate SL problem. We propose a grid-based algorithm, called ρ-SL, that solves this problem in O(n+ν) expected time, where ν is an integer associated with the maximum and minimum coordinates of each dimension. ρ-SL first partitions the dataset into non-empty grids and constructs a grid tree to organize them, so that the non-empty neighbors of a grid can be queried efficiently. It then establishes an initial cluster structure that keeps track of all pairs of grids that may be mergeable. Finally, ρ-SL identifies the pairs of grids that are actually mergeable using a fast merging algorithm, which exploits the spatial relationships among points to avoid unnecessary distance calculations. Experiments on synthetic and real-world datasets show that ρ-SL is more efficient than SL and its fastest variants.
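To make the grid-based idea concrete, the following is a minimal sketch of exact SL with a distance-r stopping condition, computed as the connected components of the graph that links points at distance at most r. It is not the ρ-SL algorithm described above: it uses a plain hash map instead of a grid tree, omits the ρ relaxation and the fast merging algorithm, and compares candidate grid pairs by brute force. The function name `grid_single_linkage`, the cell side r/√d, and the union-find bookkeeping are all illustrative assumptions.

```python
import math
from collections import defaultdict
from itertools import product


def grid_single_linkage(points, r):
    """Return one label per point; two points share a label iff they are
    linked by a chain of points whose consecutive distances are <= r."""
    if not points:
        return []
    d = len(points[0])
    side = r / math.sqrt(d)  # assumed cell side: the cell diagonal equals r

    # Partition the dataset into non-empty grid cells.
    cells = defaultdict(list)
    for i, p in enumerate(points):
        cells[tuple(math.floor(x / side) for x in p)].append(i)

    # Union-find over point indices.
    parent = list(range(len(points)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    def union(i, j):
        ri, rj = find(i), find(j)
        if ri != rj:
            parent[rj] = ri

    # Points in the same cell are within r of each other, so merge them directly.
    for members in cells.values():
        for i in members[1:]:
            union(members[0], i)

    # Cells whose indices differ by more than `reach` on some axis
    # cannot contain a pair of points within distance r.
    reach = math.ceil(math.sqrt(d))
    offsets = [o for o in product(range(-reach, reach + 1), repeat=d) if any(o)]

    for key, members in cells.items():
        for off in offsets:
            other = tuple(k + o for k, o in zip(key, off))
            if other < key or other not in cells:
                continue  # visit each unordered pair of non-empty cells once
            a, b = members[0], cells[other][0]
            if find(a) == find(b):
                continue  # already in the same cluster, no distances needed
            # Brute-force check; the paper's fast merging step is not reproduced here.
            if any(math.dist(points[i], points[j]) <= r
                   for i in members for j in cells[other]):
                union(a, b)

    return [find(i) for i in range(len(points))]
```

For example, calling `grid_single_linkage([(0, 0), (0.5, 0), (1.0, 0), (5, 5), (5.4, 5.3)], 0.6)` puts the first three points in one cluster and the last two in another, since each consecutive gap is 0.5. The number of neighbor offsets grows exponentially with the dimension, so this sketch is only practical in low dimensions.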