GPU-FS-kNN: A Software Tool for Fast and Scalable kNN Computation Using GPUs

Ahmed Shamsul Arefin, Carlos Riveros, Regina Berretta, Pablo Moscato, Alexandre G De Brevern

doi:10.1371/journal.pone.0044000

Abstract

Background: The analysis of biological networks has become a major challenge due to the recent development of high-throughput techniques that are rapidly producing very large data sets. The rapidly growing volumes of biological data demand extreme computational power and special computing facilities (i.e., supercomputers). An inexpensive solution, such as General Purpose computation based on Graphics Processing Units (GPGPU), can be adapted to tackle this challenge, but the limited internal memory of the device can pose a new scalability problem. Efficient data and computational parallelism, combined with partitioning, is required to provide a fast and scalable solution to this problem.

Results: We propose an efficient parallel formulation of the k-Nearest Neighbour (kNN) search problem, which is a popular method for classifying objects in several fields of research, such as pattern recognition, machine learning and bioinformatics. Although the kNN search is simple and straightforward, its performance degrades dramatically for large data sets because the task is computationally intensive. The proposed approach is not only fast but also scalable to large-scale instances. Based on our approach, we implemented a software tool, GPU-FS-kNN (GPU-based Fast and Scalable k-Nearest Neighbour), for CUDA-enabled GPUs. The basic approach is simple and adaptable to other available GPU architectures. We observed speed-ups of 50–60 times compared with a CPU implementation on a well-known breast microarray study and its associated data sets.

Conclusion: Our GPU-based Fast and Scalable k-Nearest Neighbour search technique (GPU-FS-kNN) provides a significant performance improvement for nearest neighbour computation in large-scale networks. The source code and the software tool are available under the GNU General Public License (GPL) at https://sourceforge.net/p/gpufsknn/.

Highlights

The analysis of biological networks is an important task for gaining insights into the massive amount of data generated by high-throughput technologies
The computational tests are performed on the following hardware setup: a total of four NVIDIA Tesla C2050 GPU cards are installed on an X8DTG-Q Supermicro server that has 2 × Intel Xeon E5620 2.4 GHz processors, 32 GB of 1066 MHz DDR3 RAM and an 800 GB local hard disk
The published clinical data give the clinical metastasis; we treat this as a phenotypical dummy probe and keep it as a row in the input matrix

Summary

Introduction

The analysis of biological networks is an important task for gaining insights into the massive amount of data generated by high-throughput technologies (e.g., microarrays). For a set R of m reference points and a set Q of n query points in a d-dimensional space, the kNN search problem identifies the k nearest neighbours of each query point q ∈ Q in the reference set R, given a distance metric [1]. The rapidly growing volumes of biological data demand extreme computational power and special computing facilities (i.e., supercomputers). An inexpensive solution, such as General Purpose computation based on Graphics Processing Units (GPGPU), can be adapted to tackle this challenge, but the limited internal memory of the device can pose a new scalability problem. Efficient data and computational parallelism, combined with partitioning, is required to provide a fast and scalable solution to this problem.
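The kNN search problem defined above, together with the idea of partitioning the data so that only part of it is held at once, can be illustrated with a small CPU-only sketch. This is not the GPU-FS-kNN implementation: the function names `euclidean` and `knn_chunked`, the `chunk_size` parameter, the choice of Euclidean distance and the toy data are all assumptions made for illustration. The reference set R is scanned chunk by chunk, and only the running k best candidates per query are kept between chunks.

```python
import heapq
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length points (assumed metric)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def knn_chunked(queries, references, k, chunk_size=2, dist=euclidean):
    """Return, for each query, its k nearest references as (distance, index) pairs.

    The reference set is processed in chunks of `chunk_size` rows, imitating a
    device that cannot hold the whole data set in memory at once.
    """
    best = [[] for _ in queries]  # running top-k list per query
    for start in range(0, len(references), chunk_size):
        chunk = references[start:start + chunk_size]
        for qi, q in enumerate(queries):
            # Distances from this query to every reference in the current chunk,
            # tagged with the reference's global index.
            candidates = [(dist(q, r), start + j) for j, r in enumerate(chunk)]
            # Merge with the running result and keep only the k smallest.
            best[qi] = heapq.nsmallest(k, best[qi] + candidates)
    return best


# Toy data: m = 5 reference points and n = 2 query points in d = 2 dimensions.
R = [(0.0, 0.0), (1.0, 0.0), (0.0, 2.0), (5.0, 5.0), (1.0, 1.0)]
Q = [(0.0, 0.0), (4.0, 4.0)]
result = knn_chunked(Q, R, k=2)
for q, neighbours in zip(Q, result):
    print(q, [index for _, index in neighbours])
```

Because each chunk only contributes candidates to a per-query top-k list, the result does not depend on the chunk size; the chunk size only bounds how much of R is examined at a time. On a GPU, the distance computations inside a chunk are the part that would typically run in parallel.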



Journal: PLoS ONE	Publication Date: Aug 28, 2012
Citations: 95	License type: CC BY 4.0


