Abstract

Single-linkage clustering with a distance-r stopping condition (SL) is a classical clustering technique that can discover arbitrarily shaped clusters. However, its O(n²) worst-case time complexity, where n is the number of points in the dataset, remains a challenge. In this paper, we present a relaxed version of the problem in Euclidean space, called the ρ-approximate SL problem. We propose a grid-based algorithm, called ρ-SL, that solves this problem in O(n+ν) expected time, where ν is an integer associated with the maximum and minimum coordinates of each dimension. ρ-SL first partitions the dataset into non-empty grids and constructs a grid tree to organize them, so that non-empty neighboring grids can be queried efficiently. It then establishes an initial cluster structure that keeps track of all pairs of grids that may be mergeable. Finally, ρ-SL identifies the pairs of grids that are actually mergeable using a fast merging algorithm, which takes the spatial relationships among points into account to avoid unnecessary distance calculations. Experiments on synthetic and real-world datasets show that ρ-SL is more efficient than SL and its fastest variants.
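
The grid idea can be illustrated with a minimal sketch. It is not the ρ-SL algorithm itself: it solves the exact distance-r problem, replaces the grid tree with a plain hash map, and uses a brute-force check between neighboring grids instead of the fast merging algorithm. The function name `grid_single_linkage` and the choice of grid side r/√d (so that any two points in one grid are within distance r) are assumptions made for illustration.

```python
from collections import defaultdict
from itertools import product
import math


def grid_single_linkage(points, r):
    """Return one label per point; two points share a label iff a chain of
    points joins them with every consecutive distance <= r."""
    if not points:
        return []
    d = len(points[0])
    side = r / math.sqrt(d)  # assumed grid side: the grid diagonal equals r

    # Step 1: partition the dataset into non-empty grids (hash map, not a grid tree).
    cell_of = [tuple(math.floor(x / side) for x in p) for p in points]
    cells = defaultdict(list)
    for i, c in enumerate(cell_of):
        cells[c].append(i)

    # Union-find over grids: all points inside one grid already belong together.
    parent = {c: c for c in cells}

    def find(c):
        while parent[c] != c:
            parent[c] = parent[parent[c]]  # path halving
            c = parent[c]
        return c

    # Step 2: only grids within `reach` steps per axis can hold points within r.
    reach = math.ceil(math.sqrt(d))
    offsets = [o for o in product(range(-reach, reach + 1), repeat=d) if any(o)]

    # Step 3: merge two neighboring grids if some cross pair is within r
    # (brute force here; the paper's merging step is designed to avoid this).
    for c, members in cells.items():
        for o in offsets:
            nb = tuple(a + b for a, b in zip(c, o))
            if nb not in cells:
                continue
            rc, rn = find(c), find(nb)
            if rc == rn:
                continue
            if any(math.dist(points[i], points[j]) <= r
                   for i in members for j in cells[nb]):
                parent[rc] = rn

    # Number the clusters in order of first appearance.
    ids = {}
    return [ids.setdefault(find(c), len(ids)) for c in cell_of]


labels = grid_single_linkage(
    [(0.0, 0.0), (0.5, 0.0), (1.0, 0.0), (5.0, 5.0), (5.4, 5.2)], r=0.6
)
```

In this example the first three points form one chain and the last two form another, so the sketch yields two clusters. The number of neighbor offsets grows exponentially with the dimension d, so the sketch is only practical in low dimensions.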
