We present improved data parallel approaches working on graphics processing unit (GPU) compute unified device architecture (CUDA) platform to build hierarchical Euclidean minimum spanning forest or tree (EMSF/EMST) for applications whose input only contains N points with arbitrary data distribution in 2D/3D Euclidean space. Characteristic of the proposed parallel algorithms follows “data parallelism, decentralized control and O(1) local memory occupied by each GPU thread”. This research has to solve GPU parallelism of component-based nearest neighbor search (component-based NNS), tree traversal, and other graph operations like union-find. For exact NNS, instead of using classical K-d tree search or brute-force computing method, we propose a K-d search method working based on dividing the Euclidean K-dimensional space into congruent and non-overlapping square/cubic cells where size of points in each cell is bounded. For component-based NNS, with the uniqueness property based on 2D/3D square/cubic space partition, we propose dynamic and static pruning techniques to prune unnecessary neighbor cells’ search. For tree traversal, instead of using breadth-first-search, this paper proposes CUDA kernels working with a distributed dynamic link list for selecting a local spanning tree’s shortest outgoing edge since size of local EMSTs in EMSF cannot be predicted. Source code is provided online and experimental comparisons are conducted on both 2D and 3D benchmarks with up to 107 points to build final EMST. Results show that applying K-d search with static pruning technique and the proposed operators totally working in parallel on GPU, our current implementation runs faster than our previous work and current optimal sequential dual-tree mlpack EMST library.
Read full abstract