Boosting cluster tree with reciprocal nearest neighbors scoring

Wen-Bo Xie, Zhen Liu, Bin Chen, Jaideep Srivastava

doi:10.1016/j.engappai.2023.107438

Abstract

Clustering plays a pivotal role in knowledge processing, knowledge bases, and expert systems, enabling AI systems to acquire knowledge effectively. Hierarchical clustering, in particular, offers an intelligent way to represent knowledge hierarchically by transforming raw data into one or more tree-shaped components. However, pinpointing appropriate representative points within the lower levels of the cluster tree is notably difficult. These points are of paramount importance because they serve as the roots for subsequent aggregation within the upper levels of the tree. Traditional hierarchical clustering algorithms rely on rudimentary techniques to select these representative points, which may not represent their clusters adequately; consequently, the resulting cluster tree often falls short in empirical performance. To address this shortcoming, this paper proposes a new hierarchical clustering algorithm. During construction of the cluster tree, the algorithm efficiently identifies the representative point within each sub-minimum-spanning-tree by scoring reciprocal nearest data points according to their topology. Rigorous testing on UCI datasets demonstrates that the proposed algorithm achieves higher clustering accuracy, measured by the Rand Index and Normalized Mutual Information, than the benchmark algorithms. Further analysis shows that, for n data points, the algorithm has O(n log n) time complexity and O(log n) space complexity, indicating that it scales to large datasets with minimal time and storage costs. Notably, the algorithm can process up to two million data points on a standard personal computer, which underscores its cost-effectiveness.
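The abstract only outlines the method, so the following is a minimal sketch of the structure that reciprocal nearest neighbors induce, not the paper's implementation. It assumes Euclidean distance and brute-force O(n²) neighbor search. Linking every point to its nearest neighbor yields a forest; when pairwise distances are distinct, each such edge is the lightest edge at its endpoint and therefore belongs to the minimum spanning tree, so each tree in the forest is a sub-minimum-spanning-tree. Each tree contains exactly one reciprocal nearest-neighbor pair (two points that are each other's nearest neighbor), and the representative is picked from that pair. The function names and the scoring rule (how many points choose a candidate as their nearest neighbor) are hypothetical placeholders, not the paper's topology-based score.

```python
import math
from collections import defaultdict


def nearest_neighbors(points):
    """Index of each point's nearest neighbor (brute force, O(n^2)).

    Ties are broken toward the smaller index because only a strictly
    smaller distance replaces the current best.
    """
    nn = []
    for i, p in enumerate(points):
        best_j, best_d = -1, math.inf
        for j, q in enumerate(points):
            if i != j:
                d = math.dist(p, q)  # Euclidean distance (Python 3.8+)
                if d < best_d:
                    best_j, best_d = j, d
        nn.append(best_j)
    return nn


def components(nn):
    """Connected components of the 1-NN graph, treated as undirected (union-find)."""
    parent = list(range(len(nn)))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for i, j in enumerate(nn):
        parent[find(i)] = find(j)
    groups = defaultdict(list)
    for i in range(len(nn)):
        groups[find(i)].append(i)
    return list(groups.values())


def representatives(points):
    """One representative per sub-tree, chosen from its reciprocal NN pair.

    Requires at least two points. The score used here (in-degree in the
    directed 1-NN graph, ties -> smaller index) is a hypothetical stand-in.
    """
    nn = nearest_neighbors(points)
    # in-degree: how many points pick i as their nearest neighbor
    indegree = [0] * len(points)
    for j in nn:
        indegree[j] += 1
    reps = []
    for comp in components(nn):
        # reciprocal nearest neighbors: i and nn[i] point at each other
        pair = [i for i in comp if nn[nn[i]] == i]
        reps.append(max(pair, key=lambda i: (indegree[i], -i)))
    return reps


points = [(0, 0), (0, 1), (0, 3), (10, 0), (10, 1), (11, 5)]
print(representatives(points))  # [1, 4]
```

Here the two sub-trees are {0, 1, 2} and {3, 4, 5}; their reciprocal pairs are (0, 1) and (3, 4), and points 1 and 4 win because two points choose each of them as nearest neighbor. Running the same step again on the representatives alone would give a coarser level of a cluster tree. The abstract does not state whether the paper aggregates in exactly this way.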
