Approximate minimum spanning tree clustering in high-dimensional space

Chih Lai, Taras Rafa, Dwight E. Nelson

doi:10.3233/ida-2009-0382

Abstract

Minimum spanning tree (MST) clustering sequentially inserts the nearest points in $\mathbb{R}^{d}$ into a list, which is then divided into clusters according to the desired criteria. This insertion order, however, can be relaxed, provided that approximately nearby points in a condensed area are inserted adjacently into the list before distant points in other areas. Based on this observation, we propose an approximate clustering method in which a new Approximate MST (AMST) is repeatedly built, in at most $d+1$ iterations, from two sources: a new Hilbert curve created from the $N$ data points after they are carefully shifted, and the previous AMST, which holds cumulative vicinity information derived from earlier iterations. Although the final AMST may not completely match a true MST built by an $O(N^{2})$ algorithm, most mismatches occur locally within individual data groups and are therefore unimportant for clustering. Our experiments on synthetic datasets and on animal motion vectors extracted from surveillance videos show that high-quality clusters can be obtained efficiently with this approximation method.
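The exact MST clustering that the abstract takes as its starting point can be sketched in Python as follows. This is a minimal illustration, not the authors' code: it uses Prim's algorithm with a dense $O(N^{2})$ scan to insert, at each step, the point nearest to the points already in the list, records the edge length that connected it, and then cuts the list wherever that edge exceeds a threshold. The function names `prim_insertion_order` and `mst_clusters` and the single distance `threshold` criterion are hypothetical choices; the abstract leaves the splitting criterion open.

```python
import math
from typing import List, Sequence, Tuple

Point = Sequence[float]


def prim_insertion_order(points: List[Point]) -> List[Tuple[int, float]]:
    """Return (index, connecting edge length) pairs in Prim's insertion order.

    Dense O(N^2) version: each step scans every point not yet in the list.
    The first point (index 0) is recorded with edge length 0.0.
    """
    n = len(points)
    in_list = [False] * n
    best = [math.inf] * n  # distance from each point to the nearest listed point
    if n:
        best[0] = 0.0
    order: List[Tuple[int, float]] = []
    for _ in range(n):
        # Insert the unlisted point that is nearest to any listed point.
        u = min((i for i in range(n) if not in_list[i]), key=lambda i: best[i])
        in_list[u] = True
        order.append((u, best[u]))
        # Update nearest-listed-point distances using the new point.
        for v in range(n):
            if not in_list[v]:
                best[v] = min(best[v], math.dist(points[u], points[v]))
    return order


def mst_clusters(points: List[Point], threshold: float) -> List[List[int]]:
    """Hypothetical criterion: cut the list where the edge exceeds threshold."""
    clusters: List[List[int]] = []
    for idx, edge in prim_insertion_order(points):
        if not clusters or edge > threshold:
            clusters.append([])
        clusters[-1].append(idx)
    return clusters
```

For example, `mst_clusters([(0, 0), (0, 1), (10, 10), (10, 11), (0, 2)], 2.0)` returns `[[0, 1, 4], [2, 3]]`. Cutting the list at long connecting edges gives the same groups as deleting every MST edge longer than `threshold`: when Prim's algorithm first picks an edge longer than the threshold, every remaining point is farther than that from all points already listed, so no short MST edge can later join a new segment to an earlier one.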
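The relaxed insertion order can be illustrated with a space-filling curve. The sketch below is a simplification, not the AMST procedure: it handles only 2-D points, uses the standard cell-to-index Hilbert mapping on a $2^{\text{bits}} \times 2^{\text{bits}}$ grid, sorts the points once by their Hilbert index, and splits the sorted list where consecutive points are farther apart than a threshold. It does not reproduce the paper's shifting of the $N$ points or the merging with a previous AMST over up to $d+1$ iterations; `hilbert_order`, `hilbert_clusters`, `bits`, and `threshold` are hypothetical names.

```python
import math
from typing import List, Sequence, Tuple

Point = Sequence[float]


def _rotate(n: int, x: int, y: int, rx: int, ry: int) -> Tuple[int, int]:
    """Rotate/flip a quadrant so the sub-curve has the standard orientation."""
    if ry == 0:
        if rx == 1:
            x, y = n - 1 - x, n - 1 - y
        x, y = y, x
    return x, y


def xy2d(n: int, x: int, y: int) -> int:
    """Hilbert index of cell (x, y) on an n-by-n grid, n a power of two."""
    d = 0
    s = n // 2
    while s > 0:
        rx = 1 if x & s else 0
        ry = 1 if y & s else 0
        d += s * s * ((3 * rx) ^ ry)
        x, y = _rotate(n, x, y, rx, ry)
        s //= 2
    return d


def hilbert_order(points: List[Point], bits: int = 8) -> List[int]:
    """Hypothetical helper: 2-D point indices sorted by Hilbert index."""
    if not points:
        return []
    n = 1 << bits
    lo_x = min(p[0] for p in points)
    lo_y = min(p[1] for p in points)
    # One square cell size for both axes so the grid does not distort distances.
    span = max(max(p[0] for p in points) - lo_x,
               max(p[1] for p in points) - lo_y) or 1.0

    def cell(v: float, lo: float) -> int:
        return min(n - 1, int((v - lo) / span * n))

    keys = [xy2d(n, cell(p[0], lo_x), cell(p[1], lo_y)) for p in points]
    return sorted(range(len(points)), key=lambda i: keys[i])


def hilbert_clusters(points: List[Point], threshold: float,
                     bits: int = 8) -> List[List[int]]:
    """Hypothetical: split the Hilbert-ordered list at gaps > threshold."""
    clusters: List[List[int]] = []
    prev = None
    for i in hilbert_order(points, bits):
        if prev is None or math.dist(points[prev], points[i]) > threshold:
            clusters.append([])
        clusters[-1].append(i)
        prev = i
    return clusters
```

A Hilbert curve visits every aligned grid block contiguously, but two nearby points on opposite sides of a block boundary can land far apart in the order, so a single curve may split one group into several pieces. As we read the abstract, this is the kind of local error that the repeated, shifted curves and the cumulative AMST are intended to correct.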
