Abstract

Metabarcoding studies provide a powerful approach to estimate the diversity and abundance of organisms in mixed communities in nature. While strategies exist for optimizing sample and sequence library preparation, best practices for bioinformatic processing of amplicon sequence data are lacking in animal diet studies. Here we evaluate how decisions made in core bioinformatic processes, including sequence filtering, database design, and classification, can influence animal metabarcoding results. We show that denoising methods have lower error rates compared to traditional clustering methods, although these differences are largely mitigated by removing low‐abundance sequence variants. We also found that available reference datasets from GenBank and BOLD for the animal marker gene cytochrome oxidase I (COI) can be complementary, and we discuss methods to improve existing databases to include versioned releases. Taxonomic classification methods can dramatically affect results. For example, the commonly used Barcode of Life Database (BOLD) Classification API assigned fewer names to samples from order through species levels using both a mock community and bat guano samples compared to all other classifiers (vsearch‐SINTAX and q2‐feature‐classifier's BLAST + LCA, VSEARCH + LCA, and Naive Bayes classifiers). The lack of consensus on bioinformatics best practices limits comparisons among studies and may introduce biases. Our work suggests that biological mock communities offer a useful standard to evaluate the myriad computational decisions impacting animal metabarcoding accuracy. Further, these comparisons highlight the need for continual evaluations as new tools are adopted to ensure that the inferences drawn reflect meaningful biology instead of digital artifacts.

Highlights

  • Metabarcoding of animal diets has fundamentally changed our insights into what species are eating, expanding our understanding of dietary diversity and food web complexity (Clare, 2014; Pompanon et al., 2012; Symondson, 2002; Valentini, Pompanon, & Taberlet, 2009)

  • We found that available reference datasets from GenBank and Barcode of Life Database (BOLD) for the animal marker gene cytochrome oxidase I (COI) can be complementary, and we discuss methods to improve existing databases to include versioned releases

  • We explored how the BOLD classification engine would compare to the other classifiers, but this comparison was limited because their classification parameters are not publicly documented, nor is the specific database used for classification defined

| INTRODUCTION

Metabarcoding of animal diets has fundamentally changed our insights into what species are eating, expanding our understanding of dietary diversity and food web complexity (Clare, 2014; Pompanon et al., 2012; Symondson, 2002; Valentini, Pompanon, & Taberlet, 2009). Few studies have used real data (i.e., actual diet samples) to offer insights into the effects of sequencing platforms (Divoll, Brown, Kinne, McCracken, & O'Keefe, 2018) or abundance filtering parameters (Alberdi et al., 2018). We build upon these analytical considerations by using real and biological mock data to illustrate how both software choice and subsequent filtering criteria impact the interpretation of community richness and composition, a common focus of diet analyses. Mock communities can provide an empirically derived filtering strategy and assess the likelihood and relative abundances of unexpected sequences (Palmer et al., 2018). We assessed these sequence processing and classification methods using four libraries of COI data generated from an ongoing bat diet study that included a biological mock community sample and hundreds of bat guano samples for each sequencing run. Fasta files containing the sequence and taxonomic information for each mock sample are available in our GitHub repository: https://github.com/devonorourke/tidybug/tree/master/data/mock_community
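The low-abundance filtering discussed above can be illustrated with a minimal sketch. This is not the study's own pipeline; the function name, the 0.1% threshold, and the toy count table are all illustrative assumptions, shown only to make the per-sample relative-abundance filter concrete.

```python
# Hypothetical sketch of a per-sample low-abundance filter, as commonly
# applied to sequence-variant (ASV/OTU) count tables in metabarcoding.
# The threshold and counts below are illustrative, not from the study.

def filter_low_abundance(counts, min_frac=0.001):
    """Drop sequence variants whose per-sample relative abundance falls
    below min_frac. `counts` maps sample -> {variant: read count}."""
    filtered = {}
    for sample, variants in counts.items():
        total = sum(variants.values())
        # Keep a variant only if it reaches the relative-abundance floor.
        filtered[sample] = {v: n for v, n in variants.items()
                            if total > 0 and n / total >= min_frac}
    return filtered

mock = {"mock1": {"ASV_1": 9500, "ASV_2": 480, "ASV_3": 5}}
print(filter_low_abundance(mock))  # ASV_3 (~0.05% of reads) is removed
```

A mock community with known members makes it possible to tune `min_frac` empirically: the threshold is raised until spurious variants disappear while expected taxa are retained.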

| MATERIALS AND METHODS
[Figure residue: bar chart comparing unique order names recovered under the Palmer and tidybug filtering methods (y-axis: Unique Order names, 0–90).]
| DISCUSSION
| CONCLUSIONS