Abstract
Targeted metagenomics, also known as metagenetics, is a high-throughput sequencing application focusing on a nucleotide target in a microbiome to describe its taxonomic content. A wide range of bioinformatics pipelines is available to analyze sequencing outputs, and choosing an appropriate tool is both crucial and non-trivial. No standard evaluation method exists for estimating the accuracy of a pipeline for targeted metagenomics analyses. This article proposes an evaluation protocol containing real and simulated targeted metagenomics datasets, together with adequate metrics, allowing us to study the impact of different variables on the biological interpretation of results. This protocol was used to compare six different bioinformatics pipelines in a basic-user context: three common ones (mothur, QIIME and BMP) based on a clustering-first approach and three emerging ones (Kraken, CLARK and One Codex) using an assignment-first approach. Surprisingly, this study reveals that sequencing errors have a bigger impact on the results than the choice of amplified region. Moreover, increasing sequencing throughput increases richness overestimation, even more so for microbiota of high complexity. Finally, the choice of the reference database has a bigger impact on richness estimation for clustering-first pipelines, and on correct taxa identification for assignment-first pipelines. Using emerging assignment-first pipelines is a valid approach for targeted metagenomics analyses, with a quality of results comparable to popular clustering-first pipelines, even with an error-prone sequencing technology like Ion Torrent. However, those pipelines are highly sensitive to the quality of databases and their annotations, which makes clustering-first pipelines still the only reliable approach for studying microbiomes that are not well described.
Highlights
Metagenomics based on high-throughput sequencing (HTS) helps biologists unveil a large part of the constitutive microorganisms of a microbiota
Computational approaches to analyze targeted metagenomics data have been developed in parallel with the popularization of this new application
The first tools, such as DOTUR (Schloss, 2005), clustered sequences into Operational Taxonomic Units (OTUs) based on the genetic distances between sequences
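To make the clustering-first idea concrete, the following is a minimal Python sketch of distance-based OTU clustering. It is not DOTUR's algorithm, which works from a precomputed distance matrix and hierarchical clustering; it is a simplified greedy, seed-based variant shown only for illustration. The function names, the 3% distance threshold, the use of a plain mismatch fraction on pre-aligned equal-length reads, and the toy reads are all assumptions introduced here, not details from the article.

```python
def hamming_fraction(a: str, b: str) -> float:
    """Fraction of mismatching positions between two equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must be pre-aligned to equal length")
    if not a:
        return 0.0
    return sum(x != y for x, y in zip(a, b)) / len(a)


def greedy_otu_clustering(seqs: list[str], threshold: float = 0.03) -> list[list[str]]:
    """Assign each sequence to the first OTU whose seed lies within `threshold`.

    Hypothetical helper: each OTU is a list whose first member is its seed.
    A sequence that matches no existing seed starts a new OTU.
    """
    otus: list[list[str]] = []
    for seq in seqs:
        for otu in otus:
            if hamming_fraction(seq, otu[0]) <= threshold:
                otu.append(seq)
                break
        else:
            # No seed was close enough, so this sequence seeds a new OTU.
            otus.append([seq])
    return otus


# Toy 100 bp reads (made up for this example).
reads = [
    "ACGTACGTACGTACGTACGT" * 5,                           # seed of the first OTU
    "ACGTACGTACGTACGTACGT" * 4 + "ACGTACGTACGTACGTACGA",  # 1 mismatch in 100 bp
    "TTTTACGTACGTACGTACGT" * 5,                           # 15 mismatches in 100 bp
]
otus = greedy_otu_clustering(reads)
richness = len(otus)  # observed richness = number of OTUs
```

In this sketch, a read carrying a handful of sequencing errors can exceed the threshold and seed its own spurious OTU, which is one intuitive way to see how errors may inflate richness estimates.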
Summary
Metagenomics based on high-throughput sequencing (HTS) helps biologists unveil a large part of the constitutive microorganisms of a microbiota. Shotgun metagenomics usually considers the entire genomic content of a sample, by extracting and sequencing the total DNA. As a result, this comprehensive approach offers a rich picture of a microbiota, and provides the opportunity to simultaneously explore the taxonomic and functional diversity of microbial communities [6]. However, shotgun metagenomics is still very expensive, and the data analysis is a challenging task, due both to the size and to the complex structure of the data [7]. This is a significant obstacle to common applications.