Abstract

The 2-body correlation function (2-BCF) is a family of statistical measures that has found applications in many scientific domains. One type of 2-BCF, the Spatial Distance Histogram (SDH), is of vital importance in describing the physical features of natural systems. While a naive computation of SDH requires quadratic time, efficient algorithms based on resolving nodes in spatial trees have been developed. A key decision in the design of such algorithms is the choice of underlying data structure: our previous work uses a quad-tree (an oct-tree for 3-dimensional data), and in this paper we propose a kd-tree-based solution. Although it is easy to see that both implementations have the same time complexity $O(N^{(2d-1)/d})$, where $N$ is the number of data points and $d$ is the number of dimensions of the dataset, we conduct a thorough comparison of their actual running times under different scenarios. In particular, we present an analytical model to rigorously quantify the running time of dual-tree algorithms. Our analysis suggests that the kd-tree-based implementation outperforms the quad-/oct-tree solution under all scenarios with different data sizes and query parameters, with a speedup of up to 1.23X over the quad-tree algorithm for 2D data. Results of extensive experiments on synthetic and real datasets confirm our findings.
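For orientation, the quadratic-time baseline mentioned above can be sketched as follows. This is a minimal illustration of a brute-force SDH, not the tree-based algorithm studied in the paper; the function name `naive_sdh`, the parameters `bucket_width` and `num_buckets`, and the choice to clamp overflowing distances into the last bucket are assumptions made for this example.

```python
import math
from itertools import combinations


def naive_sdh(points, bucket_width, num_buckets):
    """Brute-force spatial distance histogram over all point pairs.

    Visits each of the N(N-1)/2 unordered pairs once, so the running
    time is quadratic in the number of points.
    """
    hist = [0] * num_buckets
    for p, q in combinations(points, 2):
        dist = math.dist(p, q)  # Euclidean distance, any dimension
        # Assumption for this sketch: distances beyond the histogram
        # range are counted in the last bucket.
        idx = min(int(dist // bucket_width), num_buckets - 1)
        hist[idx] += 1
    return hist


# Hypothetical 2D sample data.
points = [(0.0, 0.0), (3.0, 4.0), (6.0, 8.0), (0.0, 1.0)]
hist = naive_sdh(points, bucket_width=4.0, num_buckets=3)
print(hist)
```

Tree-based approaches avoid most of these pairwise distance computations by assigning whole pairs of tree nodes to a bucket at once when every distance between their points is guaranteed to fall in that bucket.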
