Abstract

Background: Reducing the effects of sequencing errors and PCR artifacts has emerged as an essential component in amplicon-based metagenomic studies. Denoising algorithms have been designed that can reduce error rates in mock community data, but they change the sequence data in a manner that can be inconsistent with the process of removing errors in studies of real communities. In addition, they are limited by the size of the dataset and the sequencing technology used.

Results: FlowClus uses a systematic approach to filter and denoise reads efficiently. When denoising real datasets, FlowClus provides feedback about the process that can be used as the basis to adjust the parameters of the algorithm to suit the particular dataset. When used to analyze a mock community dataset, FlowClus produced a lower error rate compared to other denoising algorithms, while retaining significantly more sequence information. Among its other attributes, FlowClus can analyze longer reads being generated from all stages of 454 sequencing technology, as well as from Ion Torrent. It has processed a large dataset of 2.2 million GS-FLX Titanium reads in twelve hours; using its more efficient (but less precise) trie analysis option, this time was further reduced to seven minutes.

Conclusions: Many of the amplicon-based metagenomics datasets generated over the last several years have been processed through a denoising pipeline that likely caused deleterious effects on the raw data. By using FlowClus, one can avoid such negative outcomes while maintaining control over the filtering and denoising processes. Because of its efficiency, FlowClus can be used to re-analyze multiple large datasets together, thereby leading to more standardized conclusions.
FlowClus is freely available on GitHub (jsh58/FlowClus); it is written in C and supported on Linux.

Electronic supplementary material: The online version of this article (doi:10.1186/s12859-015-0532-1) contains supplementary material, which is available to authorized users.

Highlights

  • Reducing the effects of sequencing errors and polymerase chain reaction (PCR) artifacts has emerged as an essential component in amplicon-based metagenomic studies

  • FlowClus allows one to choose from a number of criteria based on sequences, quality scores, and flowgrams, all of which are in the sff.txt file that FlowClus requires as an input

  • It is important to note that this error rate was artificially deflated, because of the positive 3’ gap of PyroNoise [19], as shown by the increase in sequence information (Figure 3). We analyzed this mock community dataset with the QIIME denoising pipeline, and we found that the error rates through each step were similar to those of FlowClus (Additional file 7)



Introduction

Reducing the effects of sequencing errors and PCR artifacts has emerged as an essential component of amplicon-based metagenomic studies. Denoising algorithms have been designed that can reduce error rates in mock community data, but they change the sequence data in a manner that can be inconsistent with the process of removing errors in studies of real communities, and they are limited by the size of the dataset and the sequencing technology used. The combined technologies of PCR and next-generation sequencing have allowed for the study of the rare biosphere by obviating the need for culturing or cloning. These same advances, however, confound subsequent analysis of the sequence data. Two of the most widely used denoising approaches are AmpliconNoise [6] and the denoising pipeline in QIIME [3], the microbial ecology analysis package [8].

