Abstract

Metagenomic sequence classification assigns sequences to their source genomes and is an important step in metagenomic sequence data analysis. Although many methods exist, classifying high-throughput metagenomic sequence data within a limited time remains a challenge. We present an ultra-fast metagenomic sequence classification system, MetaBinG, that uses graphics processing units (GPUs). The accuracy of MetaBinG is comparable to that of the best existing systems, and it can classify one million 454 reads within five minutes, which is more than two orders of magnitude faster than existing systems. MetaBinG is publicly available at http://cbb.sjtu.edu.cn/~ccwei/pub/software/MetaBinG/MetaBinG.php.

Highlights

  • Culture-independent metagenomics methods aim to sequence all genetic material recovered directly from an environment

  • We present a fast metagenomic sequence classification system (MetaBinG) that uses the power of graphics processing units (GPUs)

  • To compare the performance of MetaBinG and Phymm, 1212 fully sequenced bacterial genomes were downloaded from the NCBI FTP site on 14 December 2010

Summary

Introduction

Culture-independent metagenomics methods aim to sequence all genetic material recovered directly from an environment. When the source genome has been fully sequenced, alignment-based methods are generally accurate. The test datasets were classified using MetaBinG trained on the 468 training genomes with K = 5 (we observed that a 5th-order Markov model was sufficient for accurate results).
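To make the role of K concrete, the following is a minimal CPU-only sketch of classification with a K-th order Markov model: one model is trained per genome, and a read is assigned to the genome whose model gives it the highest log-likelihood. It is not MetaBinG's GPU implementation. The function names (`train_markov`, `score_read`, `classify`), the add-one pseudocount, and the toy sequences are all assumptions made for illustration.

```python
import math
from collections import defaultdict
from itertools import product

ALPHABET = "ACGT"


def train_markov(genome, k, pseudocount=1.0):
    """Estimate log P(next base | preceding k bases) from one genome.

    The pseudocount is an illustrative smoothing choice, not taken
    from the paper.
    """
    counts = defaultdict(lambda: defaultdict(float))
    for i in range(len(genome) - k):
        context = genome[i:i + k]
        nxt = genome[i + k]
        # Skip windows containing ambiguous bases such as N.
        if set(context + nxt) <= set(ALPHABET):
            counts[context][nxt] += 1

    model = {}
    for ctx_tuple in product(ALPHABET, repeat=k):
        ctx = "".join(ctx_tuple)
        total = sum(counts[ctx].values()) + pseudocount * len(ALPHABET)
        model[ctx] = {
            base: math.log((counts[ctx][base] + pseudocount) / total)
            for base in ALPHABET
        }
    return model


def score_read(read, model, k):
    """Sum the log transition probabilities along the read."""
    score = 0.0
    for i in range(len(read) - k):
        ctx = read[i:i + k]
        nxt = read[i + k]
        # Windows with ambiguous bases are ignored rather than penalized.
        if ctx in model and nxt in model[ctx]:
            score += model[ctx][nxt]
    return score


def classify(read, models, k):
    """Return the name of the genome whose model scores the read highest."""
    return max(models, key=lambda name: score_read(read, models[name], k))


# Toy usage with k = 2 so the example stays tiny; the text above uses K = 5.
K = 2
toy_models = {
    "genome_A": train_markov("ACGT" * 50, K),
    "genome_B": train_markov("AATT" * 50, K),
}
print(classify("ACGTACGTACGT", toy_models, K))
```

With K = 5 each model holds 4^5 = 1024 contexts with four transition probabilities each, and every read is scored against every genome independently of all other reads, which is the kind of workload that parallelizes naturally.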

