A fast O(N lg N) time hybrid clustering algorithm using the circumference proximity based merging technique for diversified datasets

Mohammad Maksood Akhter, Sraban Kumar Mohanty

doi:10.1016/j.engappai.2023.106737

Abstract

Clustering is widely used to extract intrinsic groups from data because it relies little on domain knowledge. Although many clustering techniques have been developed in the literature, most of them become inefficient because they depend on user-defined parameters or cannot cluster multi-scale datasets. To handle these issues, several hybrid clustering algorithms have been developed that combine the advantages of partitional, hierarchical, and graph-based clustering techniques. Hybrid clustering algorithms first partition the data into smaller sub-clusters and then merge them into genuine clusters. However, most of them incur quadratic time complexity, which limits their application to large datasets. This work presents a fast two-step hybrid clustering algorithm, based on partitioning and efficient merging, that reduces the computational cost while maintaining clustering quality. In the first step, the dispersion of the data is considered to produce balanced partitions. In the second step, a graph circumference proximity-based merging technique is proposed to merge the sub-clusters. The overall computational complexity of the algorithm is O(N lg N), where N is the number of data points, ignoring the dimension. To the best of our knowledge, this is the fastest known graph neighborhood-based hybrid clustering algorithm. Experimental results on various diversified datasets show a significant improvement in running time as well as in cluster quality and robustness against noise.
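The partition-then-merge structure described above can be illustrated with a minimal Python sketch. It is not the proposed algorithm: the median split along the widest axis and the centroid-distance merge rule are simplifying assumptions that stand in for the dispersion-based partitioning and the circumference proximity-based merging, and the names `split_balanced`, `merge_close`, `hybrid_cluster`, `leaf_size`, and `threshold` are hypothetical. The merge step here compares all pairs of sub-clusters, so the sketch does not reproduce the O(N lg N) bound.

```python
import math


def split_balanced(points, leaf_size):
    """Step 1 (assumed rule): halve the points at the median of the widest axis."""
    if len(points) <= leaf_size:
        return [points]
    dims = range(len(points[0]))
    # Use the coordinate range as a crude stand-in for a dispersion measure.
    axis = max(dims, key=lambda d: max(p[d] for p in points) - min(p[d] for p in points))
    ordered = sorted(points, key=lambda p: p[axis])
    mid = len(ordered) // 2  # median split keeps the two halves balanced
    return split_balanced(ordered[:mid], leaf_size) + split_balanced(ordered[mid:], leaf_size)


def centroid(points):
    """Coordinate-wise mean of a non-empty list of points."""
    n = len(points)
    return tuple(sum(p[d] for p in points) / n for d in range(len(points[0])))


def merge_close(parts, threshold):
    """Step 2 (assumed rule): union sub-clusters whose centroids lie within threshold."""
    parent = list(range(len(parts)))

    def find(i):
        # Union-find lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    centers = [centroid(part) for part in parts]
    # All-pairs comparison: quadratic in the number of sub-clusters.
    for i in range(len(parts)):
        for j in range(i + 1, len(parts)):
            if math.dist(centers[i], centers[j]) <= threshold:
                parent[find(i)] = find(j)

    clusters = {}
    for i, part in enumerate(parts):
        clusters.setdefault(find(i), []).extend(part)
    return list(clusters.values())


def hybrid_cluster(points, leaf_size=4, threshold=2.0):
    """Partition into small balanced sub-clusters, then merge nearby ones."""
    return merge_close(split_balanced(points, leaf_size), threshold)


# Two well-separated toy groups of eight 2-D points each.
group_a = [(0, 0), (0, 1), (1, 0), (1, 1), (0.5, 0.5), (0.2, 0.8), (0.8, 0.2), (0.4, 0.6)]
group_b = [(x + 10, y + 10) for x, y in group_a]
clusters = hybrid_cluster(group_a + group_b)
```

On this toy input the partition step yields four sub-clusters of four points, and the merge step joins the two sub-clusters inside each group while leaving the two groups separate.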

Full Text