Abstract

This paper presents an implementation of brute-force exact k-Nearest Neighbor Graph (k-NNG) construction for ultra-large, high-dimensional data clouds. The proposed method uses Graphics Processing Units (GPUs) and is scalable across multiple levels of parallelism (between nodes of a cluster, between different GPUs on a single node, and within a GPU). The method is applicable to homogeneous computing clusters with a varying number of nodes and GPUs per node. We achieve a 6-fold speedup in data processing compared with an optimized method running on a cluster of CPUs, and bring a hitherto impossible k-NNG generation for a dataset of twenty million images with 15 k dimensionality into the realm of practical possibility.
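
The multi-level decomposition described above can be pictured as nested slicing of the query set: each cluster node owns a contiguous slice of the query points, each GPU on a node owns a sub-slice, and the per-GPU kernel parallelizes over the points of that sub-slice. The following host-side CUDA C++ fragment is a minimal sketch of this idea, not the authors' code; the names assign_queries and run_on_node are illustrative, node_rank and num_nodes are assumed to come from the cluster launcher (e.g., an MPI rank), and the per-GPU kernel launch is elided.

    #include <cuda_runtime.h>
    #include <algorithm>
    #include <cstdio>

    // Level 1: split the n query points evenly across cluster nodes.
    void assign_queries(int n, int node_rank, int num_nodes,
                        int* node_begin, int* node_end) {
        int per_node = (n + num_nodes - 1) / num_nodes;
        *node_begin = node_rank * per_node;
        *node_end   = std::min(n, *node_begin + per_node);
    }

    // Level 2: split this node's slice across its local GPUs.
    void run_on_node(int node_begin, int node_end) {
        int num_gpus = 0;
        cudaGetDeviceCount(&num_gpus);
        int span = node_end - node_begin;
        int per_gpu = (span + num_gpus - 1) / num_gpus;
        for (int g = 0; g < num_gpus; ++g) {
            cudaSetDevice(g);
            int begin = node_begin + g * per_gpu;
            int end   = std::min(node_end, begin + per_gpu);
            // Level 3: launch the per-GPU k-NN kernel on queries [begin, end)
            // (elided here; each thread would handle one query point).
            printf("GPU %d handles queries [%d, %d)\n", g, begin, end);
        }
    }

Because every node and every GPU works on a disjoint slice of the query points against the full reference set, the slices can be processed independently, which is what makes the scheme scale with the number of nodes and GPUs.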

Highlights

  • K-Nearest neighbor graphs have a variety of applications in bioinformatics [1,2], data mining [3], machine learning [4,5], manifold learning [6], clustering analysis [7], and pattern recognition [8]

  • The k-Nearest Neighbor Graph (k-NNG) problem is similar to the k-NN problem and a k-NNG can be built by repeatedly applying the k-NN query for every object in the input data once a convenient search indexing data structure has been built

  • In this paper we describe our parallelized brute-force k-NNG algorithm on a cluster of graphics processing units

Introduction

K-Nearest neighbor graphs have a variety of applications in bioinformatics [1,2], data mining [3], machine learning [4,5], manifold learning [6], clustering analysis [7], and pattern recognition [8]. The k-NNG problem is similar to the k-NN problem, and a k-NNG can be built by repeatedly applying the k-NN query for every object in the input data once a convenient search indexing data structure has been built. Such search data structures include kd-trees [9], BBD-trees [10], random-projection trees (rp-trees) [11], and locality-sensitive hashing [12]. These methods focus on optimizing the k-NN search, i.e., finding k-NNs for a set of query points w.r.t. a set of points with which the search data structure is built, ignoring the fact that every query point is itself a data point. These methods are generally less efficient compared with one that focuses on k-NNG construction directly.
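
To make the brute-force alternative concrete, the kernel below is a minimal CUDA sketch of the per-GPU step: one thread per query point scans all points and keeps the k smallest squared Euclidean distances. It is an illustration under simplifying assumptions (the data fits in GPU memory, k <= 32, no tiling or shared-memory optimization), not the parallelization described in this paper; knng_kernel and its parameters are hypothetical names.

    #include <cuda_runtime.h>
    #include <cfloat>

    // data:     n x dim row-major matrix of points on the GPU
    // nbr_idx:  n x k output neighbor indices
    // nbr_dist: n x k output squared distances
    __global__ void knng_kernel(const float* data, int n, int dim, int k,
                                int* nbr_idx, float* nbr_dist) {
        int q = blockIdx.x * blockDim.x + threadIdx.x;
        if (q >= n) return;

        // Local top-k buffers, kept sorted ascending (k assumed <= 32).
        float best_d[32];
        int   best_i[32];
        for (int j = 0; j < k; ++j) { best_d[j] = FLT_MAX; best_i[j] = -1; }

        for (int p = 0; p < n; ++p) {
            if (p == q) continue;              // a point is not its own neighbor
            float d = 0.f;
            for (int c = 0; c < dim; ++c) {
                float diff = data[q * dim + c] - data[p * dim + c];
                d += diff * diff;
            }
            if (d < best_d[k - 1]) {           // insert into the sorted top-k list
                int j = k - 1;
                while (j > 0 && best_d[j - 1] > d) {
                    best_d[j] = best_d[j - 1];
                    best_i[j] = best_i[j - 1];
                    --j;
                }
                best_d[j] = d;
                best_i[j] = p;
            }
        }
        for (int j = 0; j < k; ++j) {
            nbr_idx[q * k + j]  = best_i[j];
            nbr_dist[q * k + j] = best_d[j];
        }
    }

A launch such as knng_kernel<<<(n + 255) / 256, 256>>>(d_data, n, dim, k, d_idx, d_dist) would produce the neighbor lists for one data partition; a production implementation would tile the distance computation and distribute partitions across GPUs and cluster nodes, as outlined in the Abstract.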
