MetaBinG: Using GPUs to Accelerate Metagenomic Sequence Classification

Peng Jia, Chaochun Wei, Liming Xuan, Lei Liu, Jonathan H Badger

doi:10.1371/journal.pone.0025353

Abstract

Metagenomic sequence classification assigns sequences to their source genomes and is an important step in metagenomic sequence data analysis. Although many methods exist, classifying high-throughput metagenomic sequence data in a limited time remains a challenge. We present an ultra-fast metagenomic sequence classification system (MetaBinG) that uses graphics processing units (GPUs). The accuracy of MetaBinG is comparable to that of the best existing systems, and it can classify one million 454 reads within five minutes, more than two orders of magnitude faster than existing systems. MetaBinG is publicly available at http://cbb.sjtu.edu.cn/~ccwei/pub/software/MetaBinG/MetaBinG.php.

Highlights

Culture-independent metagenomics methods aim to sequence all genetic material recovered directly from an environment
We present a fast metagenomic sequence classification system (MetaBinG) using the power of graphics processing units (GPUs)
To compare the performance of MetaBinG and Phymm, 1212 fully sequenced bacterial genomes were downloaded from the NCBI FTP site on 14 December 2010

Summary

Introduction

Culture-independent metagenomics methods aim to sequence all genetic material recovered directly from an environment. When the source genome is fully sequenced, alignment-based methods are generally accurate. The test datasets were classified using MetaBinG trained on the 468 training genomes with K = 5 (we observed that a 5th-order Markov model was enough to obtain accurate results).
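To make the role of K concrete, the following is a minimal CPU-only Python sketch of K-th order Markov model classification. It is an illustration under stated assumptions, not MetaBinG's GPU implementation: the function names (train_markov_model, score_read, classify), the add-one pseudocount smoothing, the uniform fallback for unseen contexts, and the toy "genomes" are all hypothetical choices made here. Each genome yields a table of log P(next base | previous K bases); a read is scored against every table and assigned to the genome with the highest log-likelihood.

```python
import math
from collections import defaultdict

ALPHABET = "ACGT"


def train_markov_model(genome, k=5, pseudocount=1.0):
    """Estimate log P(next base | previous k bases) from one genome string.

    The pseudocount smoothing is an assumption of this sketch.
    """
    counts = defaultdict(lambda: defaultdict(float))
    for i in range(len(genome) - k):
        context, nxt = genome[i:i + k], genome[i + k]
        # Skip windows containing ambiguous bases such as N.
        if nxt in ALPHABET and all(c in ALPHABET for c in context):
            counts[context][nxt] += 1.0

    model = {}
    for context, nxt_counts in counts.items():
        total = sum(nxt_counts.values()) + pseudocount * len(ALPHABET)
        model[context] = {
            b: math.log((nxt_counts.get(b, 0.0) + pseudocount) / total)
            for b in ALPHABET
        }
    return model


def score_read(read, model, k=5):
    """Sum the log transition probabilities along the read."""
    # Contexts never seen in training fall back to a uniform distribution
    # (an assumption of this sketch).
    unseen = math.log(1.0 / len(ALPHABET))
    score = 0.0
    for i in range(len(read) - k):
        context, nxt = read[i:i + k], read[i + k]
        if nxt not in ALPHABET:
            continue
        score += model.get(context, {}).get(nxt, unseen)
    return score


def classify(read, models, k=5):
    """Return the name of the genome model with the highest score."""
    return max(models, key=lambda name: score_read(read, models[name], k))


if __name__ == "__main__":
    # Toy sequences standing in for training genomes; not real data.
    models = {
        "at_rich": train_markov_model("AT" * 200, k=5),
        "gc_rich": train_markov_model("GC" * 200, k=5),
    }
    print(classify("ATATATATATAT", models, k=5))
```

Scoring each read against each genome model is independent of every other read and model, which is the kind of workload that maps naturally onto many parallel GPU threads; the sketch above runs the same computation serially.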

Results

Conclusion