Concatenation of paired-end reads improves taxonomic classification of amplicons for profiling microbial communities

Daniel P Dacey, Frédéric J J Chain

doi:10.1186/s12859-021-04410-2

Abstract

Background: Taxonomic classification of genetic markers for microbiome analysis is affected by the numerous choices made from sample preparation to bioinformatics analysis. Paired-end read merging is routinely used to capture the entire amplicon sequence when the read ends overlap. However, the exclusion of unmerged reads from further analysis can result in underestimating the diversity in the sequenced microbial community and is influenced by bioinformatic processes such as read trimming and the choice of reference database. A potential solution to overcome this is to concatenate (join) reads that do not overlap and keep them for taxonomic classification. The use of concatenated reads can outperform taxonomic recovery from single-end reads, but it remains unclear how their performance compares to merged reads. Using various sequenced mock communities with different amplicons, read length, read depth, taxonomic composition, and sequence quality, we tested how merging and concatenating reads performed for genus recall and precision in bioinformatic pipelines combining different parameters for read trimming and taxonomic classification using different reference databases.

Results: The addition of concatenated reads to merged reads always increased pipeline performance. The top two performing pipelines both included read concatenation, with variable strengths depending on the mock community. The pipeline that combined merged and concatenated reads that were quality-trimmed performed best for mock communities with larger amplicons and higher average quality sequences. The pipeline that used length-trimmed concatenated reads outperformed quality trimming in mock communities with lower quality sequences but lost a significant amount of input sequences for taxonomic classification during processing. Genus level classification was more accurate using the SILVA reference database compared to Greengenes.

Conclusions: Merged sequences with the addition of concatenated sequences that were unable to be merged increased performance of taxonomic classifications. This was especially beneficial in mock communities with larger amplicons. We have shown for the first time, using an in-depth comparison of pipelines containing merged vs concatenated reads combined with different trimming parameters and reference databases, the potential advantages of concatenating sequences in improving resolution in microbiome investigations.
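
The difference between merging and concatenating a read pair can be illustrated with a minimal Python sketch. It is not the pipeline used in the study: the exact-match overlap search, the `min_overlap` threshold, and the spacer of ten `N` characters are all assumptions made for illustration, and real read mergers tolerate mismatches and use base qualities.

```python
from typing import Tuple

_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence (A, C, G, T, N only)."""
    return seq.translate(_COMPLEMENT)[::-1]


def merge_or_concatenate(
    fwd: str,
    rev: str,
    min_overlap: int = 12,   # hypothetical threshold, chosen for illustration
    spacer: str = "N" * 10,  # hypothetical spacer placed between joined reads
) -> Tuple[str, str]:
    """Merge a read pair if the ends overlap exactly, otherwise concatenate.

    Returns the resulting sequence and a label, "merged" or "concatenated".
    """
    if min_overlap < 1:
        raise ValueError("min_overlap must be at least 1")
    # Bring the reverse read into the same orientation as the forward read.
    rev_rc = reverse_complement(rev)
    longest = min(len(fwd), len(rev_rc))
    # Try the longest candidate overlap first: suffix of fwd == prefix of rev_rc.
    for k in range(longest, min_overlap - 1, -1):
        if fwd[-k:] == rev_rc[:k]:
            return fwd + rev_rc[k:], "merged"
    # No acceptable overlap: keep both reads by joining them with a spacer.
    return fwd + spacer + rev_rc, "concatenated"


# Toy amplicon (made-up sequence) to show both outcomes.
amplicon = "ACGTACGTAAGGCCTTGGAACCTTAGCA"

# Reads that overlap by 12 bases are merged into the full amplicon.
print(merge_or_concatenate(amplicon[:20], reverse_complement(amplicon[8:])))

# Reads that do not reach each other are concatenated instead of discarded.
print(merge_or_concatenate(amplicon[:10], reverse_complement(amplicon[18:])))
```

In the second case a merge-only pipeline would drop the pair, which is the loss of information that concatenation is meant to avoid.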

Highlights

Taxonomic classification of genetic markers for microbiome analysis is affected by the numerous choices made from sample preparation to bioinformatics analysis
Each pipeline generated a list of amplicon sequence variants (ASVs) for each mock, which were taxonomically identified to the bacterial genus level
Using the list of genera detected from each pipeline, the presence and absence of each mock community bacterial member were used to determine the number of genera that were true positives (TPs), false positives (FPs), and false negatives (FNs)
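
The scoring described in the last highlight can be sketched as follows. The text here does not spell out the formulas, so the sketch assumes the standard definitions, precision = TP / (TP + FP) and recall = TP / (TP + FN). The function name and the genus lists are hypothetical examples, not data from the study.

```python
from typing import Dict, Iterable, Union


def score_genera(
    detected: Iterable[str], expected: Iterable[str]
) -> Dict[str, Union[int, float]]:
    """Compare detected genera with the known mock community members."""
    # Normalise names so that case and stray whitespace do not create mismatches.
    detected_set = {g.strip().lower() for g in detected}
    expected_set = {g.strip().lower() for g in expected}

    tp = len(detected_set & expected_set)  # expected genera that were found
    fp = len(detected_set - expected_set)  # found genera that are not in the mock
    fn = len(expected_set - detected_set)  # expected genera that were missed

    # Guard against empty inputs to avoid division by zero.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return {"TP": tp, "FP": fp, "FN": fn, "precision": precision, "recall": recall}


# Hypothetical mock community and pipeline output.
mock_members = ["Bacillus", "Escherichia", "Listeria", "Pseudomonas"]
pipeline_genera = ["Bacillus", "Escherichia", "Listeria", "Staphylococcus"]
print(score_genera(pipeline_genera, mock_members))
```

Working on sets of genus names matches the presence/absence framing above: read counts and relative abundances play no role in this score.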

Summary

Introduction

Taxonomic classification of genetic markers for microbiome analysis is affected by the numerous choices made from sample preparation to bioinformatics analysis. Trimming 16S rRNA gene sequences by length might be preferable over quality trimming for unbiased clustering of operational taxonomic units (OTUs) [17], and it reduces the number of amplicon sequence variants (ASVs) by eliminating length variation. In both cases, though, trimming might cause a loss of informative base pairs that are necessary for merging paired reads or important for distinguishing between closely related taxa.
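
The two trimming strategies can be contrasted with a short Python sketch. It is a simplified illustration, not the implementation of any particular tool: the function names, the rule of discarding reads shorter than the target length, and the rule of cutting at the first base below the threshold are all assumptions.

```python
from typing import List, Optional, Tuple

Read = Tuple[str, List[int]]  # (bases, per-base Phred quality scores)


def trim_by_length(seq: str, quals: List[int], length: int) -> Optional[Read]:
    """Truncate a read to a fixed length; discard it (None) if it is shorter."""
    if len(seq) < length:
        return None
    return seq[:length], quals[:length]


def trim_by_quality(seq: str, quals: List[int], min_q: int) -> Read:
    """Truncate a read at the first base whose quality falls below min_q."""
    for i, q in enumerate(quals):
        if q < min_q:
            return seq[:i], quals[:i]
    return seq, quals


# Hypothetical 10-base read with a quality dip at position 7.
seq = "ACGTACGTAC"
quals = [35, 35, 34, 33, 30, 28, 12, 30, 29, 11]

print(trim_by_length(seq, quals, 8))    # every kept read ends up 8 bases long
print(trim_by_length(seq, quals, 12))   # too short for the target: discarded
print(trim_by_quality(seq, quals, 20))  # cut before the low-quality base
```

Length trimming gives reads of uniform length but discards short reads entirely, while quality trimming keeps reads of varying length. Either way, bases removed from the 3' end may be the ones that would have formed the overlap needed to merge the pair.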

Journal: BMC Bioinformatics
Publication Date: Oct 12, 2021
Citations: 21
License type: open-access
