Abstract

Discovery of novel diversity in high-throughput sequencing studies is an important aspect of environmental microbial ecology. To evaluate how amplicon clustering methods affect the discovery of novel diversity, we clustered an environmental marine high-throughput sequencing dataset of protist amplicons together with reference sequences from the taxonomically curated Protist Ribosomal Reference (PR2) database using three de novo approaches: sequence similarity networks, USEARCH, and Swarm. The potentially novel diversity uncovered by each clustering approach differed drastically, both in the number of operational taxonomic units (OTUs) and in the number of environmental amplicons within these novel diversity OTUs. Global pairwise alignment comparisons revealed that numerous amplicons classified as potentially novel by USEARCH and Swarm were more than 97% similar to PR2 references. Using shortest path analyses on sequence similarity network OTUs and Swarm OTUs, we found additional novel diversity within OTUs that would have gone unnoticed without further exploiting their underlying network topologies. These results demonstrate that graph theory provides powerful tools for microbial ecology and the analysis of environmental high-throughput sequencing datasets. Furthermore, sequence similarity networks were most accurate in delineating novel diversity from previously discovered diversity.
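
The shortest path analysis mentioned in the abstract can be illustrated with a minimal sketch. It assumes, as a simplification not taken from the paper, that an OTU is an undirected graph whose nodes are amplicons and that "distance" is the number of edges separating an environmental amplicon from its nearest reference. The function name `hops_to_nearest_reference` and the toy node labels are hypothetical.

```python
from collections import deque


def hops_to_nearest_reference(edges, references):
    """Return, for every reachable node, the number of edges to the closest reference.

    edges: iterable of (node_a, node_b) pairs describing one OTU's network.
    references: set of node names that are reference sequences.
    Assumption: unweighted, undirected edges; a multi-source breadth-first search.
    """
    adjacency = {}
    for a, b in edges:
        adjacency.setdefault(a, set()).add(b)
        adjacency.setdefault(b, set()).add(a)

    # Start the search from all reference nodes at once (distance 0).
    distance = {ref: 0 for ref in references if ref in adjacency}
    queue = deque(distance)
    while queue:
        node = queue.popleft()
        for neighbour in adjacency[node]:
            if neighbour not in distance:
                distance[neighbour] = distance[node] + 1
                queue.append(neighbour)
    return distance


# Toy OTU: one reference and a chain of environmental amplicons.
toy_edges = [("ref1", "env1"), ("env1", "env2"), ("env2", "env3"), ("ref1", "env4")]
hops = hops_to_nearest_reference(toy_edges, {"ref1"})
```

In this toy network, `env3` sits three edges away from the only reference although it shares an OTU with it. Amplicons like this, far from any reference along the network, are the kind of within-OTU candidates that a topology-aware analysis could flag; the cutoff for "far" is left open here.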

Highlights

  • High-throughput sequencing technologies have fundamentally changed our perceptions and concepts of environmental protist diversity (Amaral-Zettler et al, 2009; De Vargas et al, 2015; Logares et al, 2014; Massana et al, 2015; Stoeck et al, 2009)

  • Sequence similarity networks produced the fewest operational taxonomic units (OTUs) containing both environmental and reference amplicons (n = 1,619), containing exclusively reference amplicons (n = 3,138), and containing exclusively environmental amplicons (n = 3,445). This approach was especially effective in linking environmental and reference amplicons: it had the most amplicons in OTUs containing both types (n = 253,965 environmental and n = 54,988 reference). This led to fewer amplicons in exclusively environmental OTUs (n = 47,116), meaning that sequence similarity networks reported the least novel diversity in terms of both amplicons and OTUs

  • These differences in OTU numbers may be due in part to how the two methods use their global clustering values: while connected components in sequence similarity networks grow iteratively, OTUs in USEARCH are restricted to a maximum radius
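
The contrast drawn in the last highlight, between connected components that grow link by link and OTUs limited to a maximum radius around a centroid, can be sketched on toy data. This is an illustration under stated assumptions, not the USEARCH algorithm or the authors' network pipeline: sequences are equal-length strings, dissimilarity is a plain mismatch count, and the greedy routine simply takes sequences in input order. All function names are hypothetical.

```python
from itertools import combinations


def mismatches(a, b):
    """Count differing positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def connected_components(seqs, max_diff):
    """Link every pair within max_diff, then return the connected components."""
    parent = list(range(len(seqs)))

    def find(i):
        # Union-find lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i, j in combinations(range(len(seqs)), 2):
        if mismatches(seqs[i], seqs[j]) <= max_diff:
            parent[find(i)] = find(j)

    groups = {}
    for i, seq in enumerate(seqs):
        groups.setdefault(find(i), []).append(seq)
    return sorted(groups.values())


def greedy_centroid(seqs, max_diff):
    """Assign each sequence to the first centroid within max_diff, else open a new cluster."""
    clusters = []  # list of (centroid, members)
    for seq in seqs:
        for centroid, members in clusters:
            if mismatches(seq, centroid) <= max_diff:
                members.append(seq)
                break
        else:
            clusters.append((seq, [seq]))
    return [members for _, members in clusters]


# A chain in which each sequence differs from the next by one position.
chain = ["AAAAAAAAAA", "AAAAAAAAAC", "AAAAAAAACC", "AAAAAAACCC"]
network_otus = connected_components(chain, max_diff=1)
radius_otus = greedy_centroid(chain, max_diff=1)
```

With the same threshold, the chain forms a single connected component because each neighbouring pair is linked, while the radius-limited routine splits it into two clusters once a sequence falls outside the first centroid's radius. This is one way the same global clustering value can yield fewer, larger OTUs in a network approach.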


Summary

Introduction

High-throughput sequencing technologies have fundamentally changed our perceptions and concepts of environmental protist diversity (Amaral-Zettler et al, 2009; De Vargas et al, 2015; Logares et al, 2014; Massana et al, 2015; Stoeck et al, 2009). The detection of novel diversity, in particular, is often based on sequence similarity: novel reads are identified by their low similarity to previously sequenced reference taxa (e.g., Berney et al, 2013; Dunthorn et al, 2014b; Edgcomb et al, 2011b; Filker et al, 2014; Gimmler & Stoeck, 2015; Hartikainen et al, 2014). Following this strategy, groups of sequences that contain both environmental reads and. While the detection and description of novel protists is a central task, our ability to detect novel diversity in molecular environmental studies is affected by the way reads are clustered into operational taxonomic units (OTUs).
