Abstract

Background: Many methods have been developed for metagenomic sequence classification, and most of them depend heavily on the genome sequences of known organisms. A large portion of sequencing reads may be classified as unknown, which greatly limits our understanding of the sample as a whole.

Result: Here we present MetaBinG2, a fast method for metagenomic sequence classification, especially for samples containing a large number of unknown organisms. MetaBinG2 is based on sequence composition and uses GPUs for acceleration: one million 100 bp Illumina reads can be classified in about 1 min on a computer with a single GPU card. We evaluated MetaBinG2 by comparing it with several popular existing methods. We then applied MetaBinG2 to the MetaSUB Inter-City Challenge dataset provided by the CAMDA data analysis contest and compared the community composition structures of environmental samples from different public places across cities.

Conclusion: Compared with existing methods, MetaBinG2 is fast and accurate, especially for samples with significant proportions of unknown organisms.

Reviewers: This article was reviewed by Drs. Eran Elhaik, Nicolas Rascovan, and Serghei Mangul.
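The speed figure in the abstract can be restated as per-second throughput. This is plain arithmetic on the quoted numbers, assuming exactly 10^6 reads of 100 bp each classified in 60 s; actual throughput will depend on the hardware and read length.

```latex
\frac{10^{6}\ \text{reads}}{60\ \text{s}} \approx 1.67 \times 10^{4}\ \text{reads/s},
\qquad
\frac{10^{6} \times 100\ \text{bp}}{60\ \text{s}} = \frac{10^{8}\ \text{bp}}{60\ \text{s}} \approx 1.67 \times 10^{6}\ \text{bp/s}
```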

Highlights

  • Many methods have been developed for metagenomic sequence classification, and most of them depend heavily on the genome sequences of known organisms

  • We have implemented the MetaBinG2 program for metagenomic sequence classification

  • The performance of MetaBinG2 was more robust than that of existing methods for samples with various proportions of unknown genomes, and it improved as the length of the sequencing reads increased


Summary

Introduction

Many methods have been developed for metagenomic sequence classification, and most of them depend heavily on the genome sequences of known organisms. Metagenomics provides a culture-independent way to study an environment by sequencing its genetic material directly. With the progress of sequencing technologies, some environments, such as the gut microbiome, have been well studied. In most environments, however, the majority of microbes are unknown and have been ignored in current studies [2]. Metagenomic analysis of unknown environments may give us …

Sequence classification is a crucial step in metagenomic sequence analysis. Methods for metagenomic sequence classification fall into two categories: (1) alignment-based methods and (2) composition-based methods. Alignment-based methods can be further divided into seed-and-extend alignment-based methods, mapping-based methods, and k-mer alignment-based methods.
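Composition-based methods, the category MetaBinG2 belongs to, assign a read by comparing its local base statistics with statistics learned from reference genomes rather than by aligning it. The sketch below illustrates one generic form of this idea: a fixed-order Markov model per genome, with each read assigned to the model under which it is most likely. The model order, the add-one smoothing, the uniform fallback for unseen contexts, and the names `train_markov`, `log_likelihood` and `classify` are all illustrative assumptions. This is not MetaBinG2's actual model, and it has no GPU acceleration.

```python
import math
from collections import defaultdict

ALPHABET = "ACGT"


def train_markov(genome: str, order: int) -> dict[str, dict[str, float]]:
    """Estimate P(next base | previous `order` bases) from one reference genome.

    Every seen context starts each next-base count at 1 (add-one smoothing),
    so bases never observed after that context keep a small nonzero probability.
    """
    counts: dict[str, dict[str, int]] = defaultdict(lambda: {b: 1 for b in ALPHABET})
    for i in range(order, len(genome)):
        context, base = genome[i - order:i], genome[i]
        # Skip windows containing ambiguous bases such as N.
        if base in ALPHABET and set(context) <= set(ALPHABET):
            counts[context][base] += 1
    return {
        context: {b: n / sum(nexts.values()) for b, n in nexts.items()}
        for context, nexts in counts.items()
    }


def log_likelihood(read: str, model: dict[str, dict[str, float]], order: int) -> float:
    """Sum of log transition probabilities of `read` under one genome model."""
    score = 0.0
    for i in range(order, len(read)):
        context, base = read[i - order:i], read[i]
        if base not in ALPHABET:
            continue  # ignore ambiguous positions
        probs = model.get(context)
        # Contexts never seen in training fall back to a uniform 1/4.
        score += math.log(probs[base] if probs else 0.25)
    return score


def classify(read: str, models: dict[str, dict], order: int) -> str:
    """Assign the read to the genome model with the highest log-likelihood."""
    return max(models, key=lambda name: log_likelihood(read, models[name], order))


# Toy usage with two made-up "genomes" of very different composition.
ORDER = 2
genome_a = "ATTATAAT" * 20  # AT-rich toy sequence
genome_b = "GCCGGCGC" * 20  # GC-rich toy sequence
models = {
    "genome_a": train_markov(genome_a, ORDER),
    "genome_b": train_markov(genome_b, ORDER),
}
print(classify(genome_a[5:15], models, ORDER))  # genome_a
print(classify(genome_b[3:13], models, ORDER))  # genome_b
```

Because scoring needs only the per-genome transition tables, no alignment to a reference is required. This is part of why composition-based approaches can still say something about reads from organisms whose exact genomes are absent from the database, although real tools use far larger models and reference sets than this toy example.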
