Abstract

Metabarcoding has the potential to become a rapid, sensitive, and effective approach for identifying species in complex environmental samples. Accurate molecular identification of species depends on the ability to generate operational taxonomic units (OTUs) that correspond to biological species. Because this method sometimes yields enormous estimates of biodiversity, there is a great need to test the efficacy of the data analysis methods used to derive OTUs. Here, we evaluate the performance of various methods for clustering length-variable 18S amplicons from complex samples into OTUs, using a mock community and a natural community of zooplankton species. We compare analytic procedures consisting of a combination of (1) stringent and relaxed data filtering, (2) singleton sequences included and removed, (3) three commonly used clustering algorithms (mothur, UCLUST, and UPARSE), and (4) three methods of treating alignment gaps when calculating sequence divergence. Depending on the combination of methods used, the number of OTUs varied by nearly two orders of magnitude for the mock community (60–5068 OTUs) and by three orders of magnitude for the natural community (22–22191 OTUs). Relaxed filtering and the inclusion of singletons greatly inflated OTU numbers without improving the ability to recover species. Our results also suggest that the method used to treat gaps when calculating sequence divergence can strongly affect the number of OTUs. Our findings are particularly relevant to studies that cover taxonomically diverse species and employ markers, such as rRNA genes, in which length variation is extensive.
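To make the singleton comparison concrete, here is a minimal Python sketch of what "singletons removed" means. It dereplicates reads into unique sequences with abundances and then discards sequences observed only once. The function names and toy reads are hypothetical. This is not the workflow used in the study, and real pipelines perform these steps with their own tools and options.

```python
from collections import Counter
from typing import Dict, List


def dereplicate(reads: List[str]) -> Counter:
    """Collapse identical reads into unique sequences with their abundances."""
    return Counter(reads)


def remove_singletons(abundances: Counter) -> Dict[str, int]:
    """Keep only unique sequences observed more than once (drop abundance == 1)."""
    return {seq: n for seq, n in abundances.items() if n > 1}


# Hypothetical toy reads: two of the four unique sequences are singletons.
reads = ["ACGTACGT", "ACGTACGT", "ACGTACGA",
         "TTGCAACG", "TTGCAACG", "TTGCAACG", "GGGCAACG"]
uniques = dereplicate(reads)
kept = remove_singletons(uniques)
print(len(uniques), len(kept))  # 4 2
```

Each singleton that survives into clustering can seed its own OTU if it is divergent enough. That is one way their inclusion can inflate OTU counts.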

Highlights

  • Metabarcoding is a rapidly growing approach that provides promising opportunities to explore biological diversity in great depth

  • The most stringent workflow (USEARCH filtering, singletons removed, UPARSE clustering) recovered 60 operational taxonomic units (OTUs), whereas the most relaxed combination (RDP filtering, singletons included, UCLUST clustering with the each-gap identity definition) recovered 5068 OTUs

  • This workflow recovered the highest OTU numbers (262 and 263 OTUs, respectively) because it combined RDP filtering, which does not trim sequences to a uniform length, with the each-gap definition, in which every nucleotide in a gap contributes to sequence divergence during clustering
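The highlights above turn on how alignment gaps enter the divergence calculation. This page does not spell out the three gap treatments compared. The Python sketch below assumes they resemble the each-gap, one-gap, and no-gap schemes; mothur's `dist.seqs` exposes calculation options with these names. The function `divergence` is hypothetical, and terminal gaps are treated like internal gaps for simplicity.

```python
def divergence(a: str, b: str, gap_mode: str = "eachgap", gap_char: str = "-") -> float:
    """Pairwise divergence of two aligned sequences under a chosen gap treatment.

    eachgap: every gapped column counts as one difference.
    onegap:  a run of consecutive gapped columns counts as a single difference.
    nogaps:  gapped columns are ignored entirely.
    """
    if gap_mode not in ("eachgap", "onegap", "nogaps"):
        raise ValueError(f"unknown gap_mode: {gap_mode}")
    if len(a) != len(b):
        raise ValueError("aligned sequences must have equal length")
    diffs = 0        # substitutions plus counted gap events
    compared = 0     # positions contributing to the denominator
    in_gap = False   # True while inside a run of gapped columns
    for x, y in zip(a, b):
        gx, gy = x == gap_char, y == gap_char
        if gx and gy:
            continue  # gap in both sequences: uninformative, skipped
        if gx or gy:
            # Column with a gap in exactly one sequence.
            if gap_mode == "eachgap" or (gap_mode == "onegap" and not in_gap):
                diffs += 1
                compared += 1
            # "nogaps": gapped columns add nothing to either count.
            in_gap = True
        else:
            in_gap = False
            compared += 1
            if x != y:
                diffs += 1
    return diffs / compared if compared else 0.0


a = "ACGTACGTAC"
b = "ACG---GTTC"  # a 3-nt indel plus one substitution
for mode in ("eachgap", "onegap", "nogaps"):
    print(mode, round(divergence(a, b, mode), 3))
# eachgap 0.4
# onegap 0.25
# nogaps 0.143
```

The same pair looks about three times more divergent under the each-gap scheme than under the no-gap scheme. Against a fixed clustering threshold, this is consistent with length-variable amplicons splitting into more OTUs.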



Introduction

Metabarcoding is a rapidly growing approach that provides promising opportunities to explore biological diversity in great depth. The technique combines taxonomic identification via DNA barcoding (Hebert et al 2003) with high-throughput sequencing technology to identify multiple taxa in complex biological assemblages. Data processing for a metabarcoding study can be a daunting task for ecologists who wish to identify the species present in a sample, and even for bioinformaticians trying to validate their methods (McPherson 2009). To estimate species diversity in a complex sample, sequences are clustered into operational taxonomic units (OTUs), which are used as a proxy for species. Diversity estimates can vary greatly depending on the methods used (Bachy et al 2013; Egge et al 2013), and robust assessments of the available methods are valuable for guiding the selection of optimal procedures for a particular study.
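Clustering sequences into OTUs can be illustrated with a simplified greedy, abundance-ordered centroid scheme. It is in the spirit of, but not identical to, the UCLUST-style clustering named in this study. The sketch is hypothetical and makes several assumptions: sequences are pre-aligned to equal length, similarity is plain percent identity with no gap handling, and the 0.97 default threshold is a common convention rather than a value taken from this study.

```python
from typing import List, Tuple


def identity(a: str, b: str) -> float:
    """Fraction of matching positions between two aligned, equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to equal length")
    return sum(x == y for x, y in zip(a, b)) / len(a)


def greedy_cluster(uniques: List[Tuple[str, int]], threshold: float = 0.97) -> List[List[str]]:
    """Process sequences from most to least abundant; join the first centroid
    with identity >= threshold, otherwise found a new OTU."""
    ordered = sorted(uniques, key=lambda item: item[1], reverse=True)
    centroids: List[str] = []
    otus: List[List[str]] = []
    for seq, _count in ordered:
        for i, centroid in enumerate(centroids):
            if identity(seq, centroid) >= threshold:
                otus[i].append(seq)
                break
        else:  # no centroid was close enough
            centroids.append(seq)
            otus.append([seq])
    return otus


# Hypothetical (sequence, abundance) pairs; 10-nt toys, so the threshold is 0.9.
uniques = [("AAAAAAAAAA", 10), ("AAAAAAAAAT", 3),
           ("CCCCCCCCCC", 5), ("CCCCCCCCGG", 1)]
otus = greedy_cluster(uniques, threshold=0.9)
print(len(otus))  # 3
print(otus)  # [['AAAAAAAAAA', 'AAAAAAAAAT'], ['CCCCCCCCCC'], ['CCCCCCCCGG']]
```

In this toy run the singleton `CCCCCCCCGG` is 80% identical to its nearest centroid, so it founds its own OTU. Processing abundant sequences first makes it more likely that centroids are genuine sequences rather than rare variants.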

