Abstract
The metagenomic method directly sequences and analyses genome information from microbial communities. The main computational tasks in metagenomic analysis are taxonomical and functional structure analysis of all genomes in a microbial community (also referred to as a metagenomic sample). With the advancement of Next Generation Sequencing (NGS) techniques, both the number of metagenomic samples and the data size of each sample are increasing rapidly. Current metagenomic analysis is both data- and computation-intensive, especially when a metagenomic sample contains many species and each species has a large number of sequences. As such, metagenomic analyses require extensive computational power, and growing analytical requirements further compound the challenges of computational analysis. In this work, we propose Parallel-META 2.0, a metagenomic analysis software package that addresses the need for fast and efficient analysis of the taxonomical and functional structures of microbial communities. Parallel-META 2.0 is an extended and improved version of Parallel-META 1.0: it enhances taxonomical analysis by using multiple databases, improves computational efficiency through optimized parallel computing, and supports interactive visualization of results in multiple views. Furthermore, it enables functional analysis of metagenomic samples, including short-read assembly, gene prediction and functional annotation. It could therefore provide accurate taxonomical and functional analyses of metagenomic samples in a high-throughput manner and at large scale.
Highlights
The total number of microbial cells on earth is huge, estimated at approximately 10^30 [1], and the genomes of these largely unknown microbes might contain a large number of novel genes with very important functions
In this work we have developed the Parallel-META 2.0 package to analyze metagenomic samples by incorporating several novel functions, including an improved computational engine based on High-Performance Computing (HPC), enhanced methods for taxonomical structure interpretation, functional analysis, and data visualization techniques
The Parallel-META [14] software performs metagenomic taxonomical analysis of shotgun sequencing data in 3 steps: (1) 16S rRNA extraction: predict and extract 16S rRNA fragments from the input data using a Hidden Markov Model (HMM) algorithm [21,22], with the model built from all 16S rRNA sequences in the Silva database [23]. (2) 16S rRNA mapping: identify each component of the microbial community by mapping all 16S rRNA fragments to a reference database using parallel computing
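The mapping step described above can be illustrated with a minimal sketch. This is not Parallel-META's implementation: the tiny `REFERENCE` dictionary, the k-mer overlap score, and the function names `map_fragment` and `map_fragments_parallel` are all hypothetical, chosen only to show how independent fragment-to-reference lookups can be distributed across workers and then tallied into a taxonomical profile.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor

K = 4  # k-mer length; an arbitrary choice for this toy example

# Hypothetical miniature reference "database": taxon name -> 16S-like sequence.
# A real pipeline would load a curated 16S rRNA database instead.
REFERENCE = {
    "taxon_A": "ACGTACGTTAGCCGATAGCT",
    "taxon_B": "TTGACCGGTAACCGGTTAAC",
    "taxon_C": "GGGCCCAAATTTGGGCCCAA",
}


def kmers(seq, k=K):
    """Return the set of all length-k substrings of seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


# Pre-compute reference k-mer sets once so every worker can reuse them.
REFERENCE_KMERS = {name: kmers(seq) for name, seq in REFERENCE.items()}


def map_fragment(fragment):
    """Return the taxon sharing the most k-mers with fragment, or None if none match."""
    frag_kmers = kmers(fragment)
    best_name, best_score = None, 0
    for name, ref_kmers in REFERENCE_KMERS.items():
        score = len(frag_kmers & ref_kmers)
        if score > best_score:
            best_name, best_score = name, score
    return best_name


def map_fragments_parallel(fragments, workers=4):
    """Map every fragment concurrently and count assignments per taxon."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        assignments = list(pool.map(map_fragment, fragments))
    # Fragments with no match (None) are left out of the profile.
    return Counter(a for a in assignments if a is not None)


if __name__ == "__main__":
    reads = ["ACGTACGTTAGC", "GGTAACCGGTT", "GGGCCCAAATT", "CGATAGCT", "AAAAAAAA"]
    print(map_fragments_parallel(reads))
```

Because each fragment is mapped independently, the work divides cleanly across workers. Threads are used here only to keep the sketch simple; for CPU-bound scoring in CPython, swapping in `ProcessPoolExecutor` (same `map` interface) would be the more likely route to an actual speed-up.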
Summary
The total number of microbial cells on earth is huge, estimated at approximately 10^30 [1], and the genomes of these largely unknown microbes might contain a large number of novel genes with very important functions. Early metagenomic surveys of microbial communities focused on 16S ribosomal RNA sequences, which are relatively short and are often conserved within a species while differing between species. 16S rRNA-based metagenomic surveys have already produced data for the analysis of microbial communities in the Sargasso Sea [4], acid mine drainage biofilm [5], the human gut microbiome [6] and so on. The increasing number of metagenome data analysis projects demands more and more computing power, which is becoming an increasingly large hurdle to the efficient processing of metagenome datasets by current pipelines. The functional analysis of metagenomic data is based on shotgun sequencing data, which could elucidate the gene-set, pathway and even regulation network properties of microbial communities, as well as their dynamics.