Abstract

There is increasing interest in employing shotgun sequencing, rather than amplicon sequencing, to analyze microbiome samples. Typical projects may involve hundreds of samples and billions of sequencing reads. Comparing such samples against a protein reference database generates billions of alignments, and analyzing these data is computationally challenging. To address this, we have substantially rewritten and extended our widely used microbiome analysis tool MEGAN to facilitate the interactive analysis of the taxonomic and functional content of very large microbiome datasets. Other new features include a functional classifier called InterPro2GO, gene-centric read assembly, principal coordinate analysis of taxonomy and function, and support for metadata. The new program is called MEGAN Community Edition (CE) and is open source. By integrating MEGAN CE with our high-throughput DNA-to-protein alignment tool DIAMOND, and by providing a new program, MeganServer, that allows access to metagenome analysis files hosted on a server, we provide a straightforward yet powerful and complete pipeline for the analysis of metagenome shotgun sequences. We illustrate how to perform a full-scale computational analysis of a metagenomic sequencing project, involving 12 samples and 800 million reads, in less than three days on a single server. All source code is available at https://github.com/danielhuson/megan-ce

Highlights

  • In microbiome analysis, 16S rRNA amplicon sequencing [1] is often used when a high-level analysis of taxonomic content suffices, and/or computational resources are limited

  • With MEGAN Community Edition (CE), we provide a highly efficient program for the interactive analysis and comparison of shotgun metagenomic data, allowing one to explore hundreds of samples and billions of reads

  • While taxonomic profiling is based on the NCBI taxonomy, we provide several functional profiling approaches: SEED, eggNOG, KEGG, and a new classifier, InterPro2GO
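
Comparing taxonomic or functional profiles across samples, for example by principal coordinate analysis, starts from a matrix of pairwise dissimilarities between samples. The following is a minimal sketch of that first step using the Bray-Curtis index on normalized taxon counts. The sample names, counts, and helper functions are hypothetical, and the text above does not state which dissimilarity measure MEGAN CE itself applies, so this is an illustration rather than the program's implementation.

```python
from itertools import combinations


def normalize(profile):
    """Convert raw read counts per taxon into relative abundances."""
    total = sum(profile.values())
    if total == 0:
        return dict(profile)
    return {taxon: count / total for taxon, count in profile.items()}


def bray_curtis(a, b):
    """Bray-Curtis dissimilarity between two profiles (dicts taxon -> abundance)."""
    taxa = set(a) | set(b)
    shared = sum(min(a.get(t, 0), b.get(t, 0)) for t in taxa)
    total = sum(a.values()) + sum(b.values())
    if total == 0:
        return 0.0
    return 1.0 - 2.0 * shared / total


def distance_matrix(profiles):
    """Return sorted sample names and a dict mapping (name, name) -> dissimilarity."""
    names = sorted(profiles)
    dist = {(n, n): 0.0 for n in names}
    for x, y in combinations(names, 2):
        d = bray_curtis(profiles[x], profiles[y])
        dist[(x, y)] = dist[(y, x)] = d
    return names, dist


# Hypothetical toy profiles: reads assigned to each phylum in three samples.
profiles = {
    "sample_A": {"Bacteroidetes": 60, "Firmicutes": 30, "Proteobacteria": 10},
    "sample_B": {"Bacteroidetes": 55, "Firmicutes": 35, "Proteobacteria": 10},
    "sample_C": {"Actinobacteria": 80, "Proteobacteria": 20},
}

normalized = {name: normalize(p) for name, p in profiles.items()}
names, dist = distance_matrix(normalized)

for x, y in combinations(names, 2):
    print(f"{x} vs {y}: {dist[(x, y)]:.2f}")
```

Normalizing before comparison keeps differences in sequencing depth from dominating the result. An ordination method would then take `dist` as its input.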


Summary

Introduction

16S rRNA amplicon sequencing [1] is often used when a high-level analysis of taxonomic content suffices, and/or computational resources are limited. We recently published a new alignment tool called DIAMOND [10] that is able to align short metagenomic sequencing reads against the NCBI-nr database at 20,000 times the speed of BLASTX with a similar degree of sensitivity. This makes it possible to analyze large metagenome samples with modest computational effort. Alignment of the permafrost data against the NCBI-nr database (containing over 60 million reference sequences) takes about one day on a single server with 32 cores.
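
To put the alignment time in perspective, the figures can be turned into a rough throughput estimate. The sketch below assumes that the permafrost data is the 12-sample, 800-million-read project mentioned in the abstract, which this section does not state explicitly. It also treats "about one day" as exactly 24 hours. The resulting numbers are therefore order-of-magnitude estimates, not measurements.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def reads_per_second(total_reads, days):
    """Average number of reads processed per second over the given duration."""
    return total_reads / (days * SECONDS_PER_DAY)


# Assumption: the permafrost dataset is the 800-million-read project
# described in the abstract.
total_reads = 800_000_000
alignment_days = 1.0  # "about one day", taken as exactly 24 hours
cores = 32            # single server with 32 cores

alignment_rate = reads_per_second(total_reads, alignment_days)
per_core_rate = alignment_rate / cores
core_hours = cores * alignment_days * 24

print(f"Alignment throughput: about {alignment_rate:,.0f} reads/s")
print(f"Per core:             about {per_core_rate:,.0f} reads/s")
print(f"Compute budget:       {core_hours:.0f} core-hours")
```

Under these assumptions the alignment step averages roughly 9,000 reads per second across the server, or close to 300 reads per second per core, for a total of 768 core-hours.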

