Abstract

Metagenomics is a segment of conventional microbial genomics dedicated to the sequencing and analysis of the combined genomic DNA of entire environmental samples. The most critical step of metagenomic data analysis is the reconstruction of the individual genes and genomes of the microorganisms in a community using metagenomic assemblers, computational programs that piece together the small fragments of sequenced DNA produced by sequencing instruments. Here, we describe the challenges of metagenomic assembly, survey a wide spectrum of applications in which metagenomic assemblies were used to better understand the ecology and evolution of microbial ecosystems, and present one of the most efficient microbial assemblers, SPAdes, which was upgraded to become applicable to metagenomics.

Highlights

  • The increased scientific and practical interest in the microbial world that surrounds us, together with the emergence of new molecular-biological and bioinformatic approaches for analyzing the diversity and genetic potential of microbial communities from diverse environments, gave rise to what is known as metagenomics. While partial sequences of microbial DNA are sufficient to assess the diversity of the sampled community, uncovering its genetic potential requires analyzing extended genomic regions or, even better, fully restored genomes from the microbiome.

  • We describe the challenges of metagenomic assembly (MA) and the role MA has played in the revolutionary discoveries of recent years, which have expanded our knowledge of the microbial world of our planet.

  • Contigs and/or scaffolds cannot represent the result of an assembly in its entirety, since they do not capture important contiguity information. Metagenome assemblers do use read pairs to resolve possible repeats, but the repeat structure of a metagenome is usually very complex: in addition to within-species repeats, an assembler has to deal with interspecies repeats caused by horizontal gene transfer, conserved regions, and closely related strains.


Summary

INTRODUCTION

The increased scientific and practical interest in the microbial world that surrounds us, together with the emergence of new molecular-biological and bioinformatic approaches for analyzing the diversity and genetic potential of microbial communities from diverse environments, gave rise to what is known as metagenomics.

Contigs and/or scaffolds cannot represent the result of an assembly in its entirety, since they do not capture important contiguity information. Metagenome assemblers do use read pairs to resolve possible repeats, but the repeat structure of a metagenome is usually very complex: in addition to within-species repeats, an assembler has to deal with interspecies repeats caused by horizontal gene transfer, conserved regions, and closely related strains.

Low-coverage edges, which are typically generated by sequencing errors, are removed from the assembly during the graph simplification step using a global threshold on read depth coverage. This approach is not as effective for metagenomic assemblies, because the selected threshold would need to remove both sequencing artifacts and the edges corresponding to rare strains while, at the same time, preserving the edges corresponding to rare species.

Contigs generated prior to masking strain differences in an assembly graph represent genomic fragments of individual strains (strain contigs) and often provide additional long-range information for the reconstruction of a strain-mixture consensus backbone. metaSPAdes uses the following pipeline (see Figure 1; Nurk et al., 2017), which includes two launches of the exSPAnder repeat resolution algorithm (Prjibelski et al., 2014).
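The coverage-threshold problem described above can be made concrete with a toy example. The sketch below is illustrative only: the `Edge` type, the edge names, and the coverage values are invented for the example, and the local coverage-ratio rule is an assumed stand-in for a coverage-aware filter, not a description of how metaSPAdes itself simplifies the graph. It shows that no single global cutoff can remove the error edge and the rare-strain edge while keeping the rare-species edge, whereas comparing each edge with its neighbors can.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Edge:
    """Hypothetical assembly-graph edge used only for this illustration."""
    name: str
    coverage: float           # mean read depth along the edge
    neighbor_coverage: float  # highest read depth among adjacent edges


# Invented toy data: one abundant species (with an error edge and a
# rare-strain variant attached to it) and one uniformly rare species.
EDGES = [
    Edge("abundant_backbone", 100.0, 100.0),
    Edge("sequencing_error", 2.0, 100.0),
    Edge("rare_strain_variant", 5.0, 100.0),
    Edge("rare_species_backbone", 3.0, 3.0),
]

# What we would like to keep after simplification.
WANTED = {"abundant_backbone", "rare_species_backbone"}


def global_filter(edges, threshold):
    """Keep edges whose coverage reaches one global cutoff."""
    return [e.name for e in edges if e.coverage >= threshold]


def local_ratio_filter(edges, max_ratio):
    """Keep edges whose coverage is within max_ratio of their neighbors.

    Assumed rule for illustration: an edge is dropped when an adjacent
    edge is more than max_ratio times deeper than it.
    """
    return [e.name for e in edges
            if e.coverage * max_ratio >= e.neighbor_coverage]


def any_global_threshold_works(edges, wanted):
    """Try every distinct coverage value as the global cutoff."""
    cutoffs = sorted({e.coverage for e in edges})
    return any(set(global_filter(edges, t)) == wanted for t in cutoffs)
```

With a global cutoff of 4, the rare species (depth 3) is discarded while the rare strain (depth 5) survives, which is the opposite of the desired outcome; raising the cutoff above 5 still loses the rare species. The ratio rule with `max_ratio=10` keeps exactly the two backbone edges in this toy data.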

Generation of strain-contigs
Findings
CONCLUSION