Abstract

Many existing clustering methods usually compute clusters from the reduced data sets obtained by summarizing the original very large data sets. BIRCH is a popular summary-based clustering method that first builds a CF tree, and then performs a global clustering using the leaf entries of the tree. However, to the best of our knowledge, no prior studies have proposed a global clustering method that uses the structure of a CF tree. Therefore, we propose a novel global clustering method ERC (effective multiple range queries-based clustering), which takes advantage of the structure of a CF tree. We further propose a CF $^+$ + tree, which optimizes the node split scheme used in the CF tree. As a result, the CF $^+$ + -ERC (CF $^+$ + tree-based ERC) method effectively computes clusters over large data sets. Furthermore, it does not require a predefined number of clusters to compute the clusters. We present in-depth theoretical and experimental analyses of our method. Experimental results on very large synthetic data sets demonstrate that the proposed approach is effective in terms of cluster quality and robustness and is significantly faster than existing clustering methods. In addition, we apply our clustering method to real data sets and achieve promising results.

Full Text
Paper version not known

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call