Abstract
High-throughput sequencing (HTSeq) of small ribosomal subunit amplicons has the potential for a comprehensive characterization of microbial community compositions, down to rare species. However, the error-prone nature of the multi-step experimental process requires that the resulting raw sequences be subjected to quality control procedures. These procedures often involve an abundance cutoff for rare sequences or clustering of sequences, both of which limit genetic resolution. Here we propose a simple experimental protocol that retains the high genetic resolution granted by HTSeq methods while effectively removing many low-abundance sequences that are likely due to PCR and sequencing errors. According to this protocol, we split samples and submit both halves to independent PCR and sequencing runs. The resulting sequence data are graphically and quantitatively characterized by the discordance between the two experimental branches, allowing for a quick identification of problematic samples. Further, we discard sequences that are not found in both branches (“AmpliconDuo filter”). We show that the majority of sequences removed in this way, mostly low-abundance but also some higher-abundance sequences, show features expected from random modifications of true sequences as introduced by PCR and sequencing errors. On the other hand, the filter retains many low-abundance sequences observed in both branches and thus provides a more reliable census of the rare biosphere. We find that the AmpliconDuo filter increases biological resolution: it increases apparent community similarity between biologically similar communities, while it does not affect apparent community similarities between biologically dissimilar communities. The filter does not distort overall apparent community compositions. Finally, we quantitatively explain the effect of the AmpliconDuo filter by a simple mathematical model.
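The filtering rule described in the abstract (keep a sequence only if it is found in both experimental branches) can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `amplicon_duo_filter`, the representation of each branch as a mapping from sequence to read count, and the toy sequences and counts are all assumptions made for illustration.

```python
from collections import Counter


def amplicon_duo_filter(branch_a, branch_b):
    """Keep only sequences observed in both branches (hypothetical helper).

    Each branch is assumed to map a sequence string to its read count.
    Returns a dict mapping each shared sequence to (count_a, count_b).
    """
    # Set intersection of the keys: sequences present in both branches.
    shared = branch_a.keys() & branch_b.keys()
    return {seq: (branch_a[seq], branch_b[seq]) for seq in shared}


# Toy data (invented): read counts from the two halves of one split sample.
branch_a = Counter({"ACGT": 120, "ACGA": 1, "TTGC": 3})
branch_b = Counter({"ACGT": 110, "TTGC": 2, "GGCA": 1})

kept = amplicon_duo_filter(branch_a, branch_b)
# Sequences seen in only one branch are dropped.
discarded = (branch_a.keys() | branch_b.keys()) - kept.keys()
```

In this toy example the low-abundance sequence "TTGC" is retained because it appears in both branches, whereas the singletons "ACGA" and "GGCA" are discarded. A fixed abundance cutoff would instead have treated all three by read count alone.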
Summary
Amplicon-based environmental high-throughput sequencing (HTSeq) of markers such as SSU rRNA [1] has become a standard in biodiversity research. These methods have the potential to settle fundamental controversies about microbial diversity and distribution, including those resulting from the key problem of massive under-sampling of diversity, especially of the rare biosphere [2,3,4,5]. Several problems remain, such as the relatively short read lengths and non-negligible error rates in the PCR and sequencing steps. The latter can lead to overestimation and distortion of microbial biodiversity [10,11,12].