Abstract

Background: Metagenomics is the study of environmental samples using sequencing. Rapid advances in sequencing technology are fueling a vast increase in the number and scope of metagenomics projects. Most metagenome sequencing projects so far have been based on Sanger or Roche-454 sequencing, as only these technologies provide long enough reads, while Illumina sequencing has not been considered suitable for metagenomic studies due to a short read length of only 35 bp. However, now that reads of length 75 bp can be sequenced in pairs, Illumina sequencing has become a viable option for metagenome studies.

Results: This paper addresses the problem of taxonomical analysis of paired reads. We describe a new feature of our metagenome analysis software MEGAN that allows one to process sequencing reads in pairs and makes assignments of such reads based on the combined bit scores of their matches to reference sequences. Using this new software in a simulation study, we investigate the use of Illumina paired-sequencing in taxonomical analysis and compare the performance of single reads, short clones and long clones. In addition, we also compare against simulated Roche-454 sequencing runs.

Conclusion: This work shows that paired reads perform better than single reads, as expected, but also, perhaps slightly less obviously, that long clones allow more specific assignments than short ones. A new version of the program MEGAN that explicitly takes paired reads into account is available from our website.
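The idea of assigning a read pair from the combined bit scores of its two mates can be illustrated with a minimal sketch. The abstract does not specify the exact scoring rule, so the code below simply assumes that the scores of both mates against the same reference are summed. The function names, the `top_percent` threshold, and the example reference IDs are all hypothetical and are not taken from MEGAN.

```python
from collections import defaultdict


def combine_pair_scores(hits_mate1, hits_mate2):
    """Sum bit scores per reference over the two mates of a read pair.

    Each argument maps a reference ID to the bit score of that mate's
    match against the reference. Summation is an assumption made for
    illustration; the abstract only says the scores are "combined".
    """
    combined = defaultdict(float)
    for hits in (hits_mate1, hits_mate2):
        for ref, score in hits.items():
            combined[ref] += score
    return dict(combined)


def top_references(combined, top_percent=10.0):
    """Keep references whose combined score lies within top_percent of the best.

    The threshold is a hypothetical filter, not a documented MEGAN parameter.
    """
    if not combined:
        return []
    best = max(combined.values())
    cutoff = best * (1.0 - top_percent / 100.0)
    return sorted(ref for ref, score in combined.items() if score >= cutoff)


# Toy example: only refA is matched by both mates of the pair.
mate1 = {"refA": 60.0, "refB": 58.0}
mate2 = {"refA": 55.0, "refC": 57.0}
combined = combine_pair_scores(mate1, mate2)
print(combined)
print(top_references(combined))
```

In this toy case each mate on its own matches two references with nearly equal scores, while the pair as a whole clearly favors the one reference matched by both mates. That is the intuition behind paired reads allowing more specific assignments than single reads.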

Introduction

Rapid advances in sequencing technology are fueling a vast increase in the number and scope of metagenomics projects. The analysis of metagenomic datasets is an immense conceptual and computational challenge, and there is a great need for new bioinformatics tools and methods. This need has so far largely escaped the notice of the bioinformatics community. A further task is to compare the contents of different metagenomic datasets. The difficulty of these challenges stems from the huge amounts of data to be processed, the poor sampling of reference sequences, the lack of adequate models for data acquisition and the demands of statistical analysis.
