Abstract

Real-world evaluations of metagenomic reconstructions are challenged by distinguishing reconstruction artifacts from genes and proteins present in situ. Here, we evaluate short-read-only, long-read-only and hybrid assembly approaches on four different metagenomic samples of varying complexity. We demonstrate how different assembly approaches affect gene and protein inference, which is particularly relevant for downstream functional analyses. For a human gut microbiome sample, we use complementary metatranscriptomic and metaproteomic data to assess the metagenomic data-based protein predictions. Our findings pave the way for critical assessments of metagenomic reconstructions. We propose a reference-independent solution, which exploits the synergistic effects of multi-omic data integration for the in situ study of microbiomes using long-read sequencing data.

Highlights

  • Third-generation, single-molecule, long-read (LR) sequencing is considered to be the frontier of genomics [1], especially in the context of studying microbial populations [2, 3]

  • Due to the differences in annotations, which we found to be exclusive to individual assembly approaches, we subsequently studied the effect of assembler choice on two well-defined, functionally relevant classes of genes: ribosomal RNA and antimicrobial resistance (AMR) genes

  • In addition to our newly generated human-borne multi-omic data (GDB), we used publicly available SR and LR metagenomic data originating from the same respective sample (Zymo, natural whey starter culture (NWC) or rumen sample (Rumen))


Summary

Introduction

Third-generation, single-molecule, long-read (LR) sequencing is considered to be the frontier of genomics [1], especially in the context of studying microbial populations [2, 3]. Stewart et al. [10] were recently among the first to demonstrate the utility of LRs for improving upon existing protein databases, owing to the large collection of novel proteins and enzymes they identified, thereby hinting at the benefits of LRs for functional microbiome studies. Single-base accuracy of raw LRs remains lower than that of short-read (SR) methodologies [11], although Nanopore LR quality is steadily increasing. The impact of remnant errors in LR assemblies on gene calling, and thereby protein prediction, was recently highlighted by Watson et al. [14]. Although they showed that insertions/deletions (indels) play a critical role in microbial protein identification, an assessment of the overall impact of assembly methods on understanding the functional potential of microbial communities is lacking.
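To make the effect of indels on protein prediction concrete, the following minimal sketch translates a toy coding sequence before and after a single-base insertion and a single-base deletion. The sequence, the variable names and the `translate` helper are illustrative assumptions and are not taken from the study; the codon table is the standard genetic code.

```python
from itertools import product

# Standard genetic code, with codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(dna: str) -> str:
    """Translate from the first base, stopping at the first stop codon."""
    protein = []
    for i in range(0, len(dna) - 2, 3):
        aa = CODON_TABLE[dna[i:i + 3]]
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)


# Hypothetical toy gene (not from the paper): ATG GCT AAA GGT GAA CTG TAA
reference = "ATGGCTAAAGGTGAACTGTAA"

# A single inserted T after the second codon creates a premature stop (TAA).
insertion = reference[:6] + "T" + reference[6:]

# A single deleted base in the second codon shifts the frame for every
# downstream codon, and the original stop codon is no longer read in frame.
deletion = reference[:4] + reference[5:]

print(translate(reference))  # MAKGEL
print(translate(insertion))  # MA
print(translate(deletion))   # MVKVNC
```

In this toy case a single erroneous base either truncates the predicted protein or changes every residue after the error, which is why uncorrected indels in an assembly can propagate into gene calls and protein predictions.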
