NMF-mGPU: non-negative matrix factorization on multi-GPU systems

Edgardo Mejía-Roa,Daniel Tabas-Madrid,Francisco Tirado,Carlos García,Javier Setoain,Alberto Pascual-Montano

doi:10.1186/s12859-015-0485-4

Edgardo Mejía-Roa, Daniel Tabas-Madrid + Show 4 more

Open Access

PDF Available

https://doi.org/10.1186/s12859-015-0485-4

Copy DOI

Export

Save

Cite

Abstract
Highlights/Summary
Full-Text PDF
Similar Papers

Abstract

Listen

BackgroundIn the last few years, the Non-negative Matrix Factorization ( NMF ) technique has gained a great interest among the Bioinformatics community, since it is able to extract interpretable parts from high-dimensional datasets. However, the computing time required to process large data matrices may become impractical, even for a parallel application running on a multiprocessors cluster.In this paper, we present NMF-mGPU, an efficient and easy-to-use implementation of the NMF algorithm that takes advantage of the high computing performance delivered by Graphics-Processing Units ( GPUs ). Driven by the ever-growing demands from the video-games industry, graphics cards usually provided in PCs and laptops have evolved from simple graphics-drawing platforms into high-performance programmable systems that can be used as coprocessors for linear-algebra operations. However, these devices may have a limited amount of on-board memory, which is not considered by other NMF implementations on GPU.Results NMF-mGPU is based on CUDA ( Compute Unified Device Architecture ), the NVIDIA’s framework for GPU computing. On devices with low memory available, large input matrices are blockwise transferred from the system’s main memory to the GPU’s memory, and processed accordingly. In addition, NMF-mGPU has been explicitly optimized for the different CUDA architectures. Finally, platforms with multiple GPUs can be synchronized through MPI ( Message Passing Interface ). In a four-GPU system, this implementation is about 120 times faster than a single conventional processor, and more than four times faster than a single GPU device (i.e., a super-linear speedup).ConclusionsApplications of GPUs in Bioinformatics are getting more and more attention due to their outstanding performance when compared to traditional processors. In addition, their relatively low price represents a highly cost-effective alternative to conventional clusters. In life sciences, this results in an excellent opportunity to facilitate the daily work of bioinformaticians that are trying to extract biological meaning out of hundreds of gigabytes of experimental information. NMF-mGPU can be used “out of the box” by researchers with little or no expertise in GPU programming in a variety of platforms, such as PCs, laptops, or high-end GPU clusters. NMF-mGPU is freely available at https://github.com/bioinfo-cnb/bionmf-gpu.

Highlights

In the last few years, the Non-negative Matrix Factorization (NMF) technique has gained a great interest among the Bioinformatics community, since it is able to extract interpretable parts from high-dimensional datasets
Performance Our experimental evaluations have been performed on an Intel workstation equipped with four NVIDIA Tesla C1060 Graphics-Processing Units (GPUs) connected via a PCI Express bus
It is important to mention that, our tests were performed on old GPU devices with an old CUDA release, all results are still valid for newer hardware/software systems

Summary

Introduction

In the last few years, the Non-negative Matrix Factorization (NMF) technique has gained a great interest among the Bioinformatics community, since it is able to extract interpretable parts from high-dimensional datasets. Driven by the ever-growing demands from the video-games industry, graphics cards usually provided in PCs and laptops have evolved from simple graphics-drawing platforms into high-performance programmable systems that can be used as coprocessors for linear-algebra operations These devices may have a limited amount of on-board memory, which is not considered by other NMF implementations on GPU. NMF decomposes a large input dataset into a small set of highly interpretable and additive parts This property has centered the attention of scientists for a wide range of applications in Bioinformatics, such as gene-expression analysis [6,7], scientific literature mining [8], neuroscience [9], and several -omics techniques [10,11]. Face and object recognition [5,13], color science [14], computer vision [15], polyphonic music transcription [16], as well as other signal-processing methods [17,18]

Objectives

Results

Conclusion