Implications of Pyrosequencing Error Correction for Biological Data Interpretation

Matthew G Bakker, Zheng J Tu, James M Bradeen, Linda L Kinkel

doi:10.1371/journal.pone.0044357


Open Access

https://doi.org/10.1371/journal.pone.0044357


Journal: PLoS ONE	Publication Date: Aug 30, 2012
Citations: 40	License type: CC BY 4.0

Affiliation: University of Minnesota

Abstract

There has been a rapid proliferation of approaches for processing and manipulating second-generation DNA sequence data. However, users are often left uncertain about how the choice of processing method may affect the biological interpretation of the data. In this report, we probe differences in output between two processing pipelines: a de-noising approach using the AmpliconNoise algorithm for error correction, and a standard approach using quality filtering and preclustering to reduce error. There was a large overlap between the reads culled by each method, although AmpliconNoise removed a greater net number of reads. Most OTUs produced by one method had a clearly corresponding partner in the other. Although each method produced OTUs consisting entirely of reads that were culled by the other method, many more such OTUs were formed in the standard pipeline. Total OTU richness was reduced by AmpliconNoise processing, but per-sample OTU richness, diversity and evenness were increased. Increases in per-sample richness and diversity may result from AmpliconNoise processing producing a more even OTU rank-abundance distribution. Because communities were randomly subsampled to equalize sample size across communities, and because rare sequence variants are less likely to be selected during subsampling, fewer OTUs were lost from individual communities when subsampling AmpliconNoise-processed data. In contrast to taxon-based diversity estimates, phylogenetic diversity was reduced even on a per-sample basis by de-noising, and samples switched widely in diversity rankings. This work illustrates the significant impact of processing pipelines on the biological interpretations that can be drawn from pyrosequencing surveys. It provides important cautions for analyses of contemporary data, for requisite data archiving (processed vs. non-processed data), and for drawing comparisons among studies performed using distinct data processing pipelines.
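The abstract's argument about subsampling and evenness can be illustrated with a minimal Python sketch. It is not the pipeline used in the study: it assumes each sample is a simple mapping from a hypothetical OTU label to a read count, subsamples reads at random without replacement, and computes richness, Shannon diversity (H') and Pielou's evenness (J'). It also includes Hurlbert's analytic expectation of rarefied richness, which shows directly why OTUs represented by only a few reads are the ones most often lost. All function names and the example communities are illustrative assumptions.

```python
import math
import random
from collections import Counter

# Hypothetical representation: one sample = mapping of OTU label -> read count.
Sample = dict[str, int]


def rarefy(sample: Sample, depth: int, seed: int = 0) -> Sample:
    """Randomly subsample `depth` reads without replacement."""
    # Expand the count table into one label per read, then draw without replacement.
    reads = [otu for otu, n in sample.items() for _ in range(n)]
    if depth > len(reads):
        raise ValueError("depth exceeds the number of reads in the sample")
    rng = random.Random(seed)
    return dict(Counter(rng.sample(reads, depth)))


def richness(sample: Sample) -> int:
    """Number of OTUs with at least one read."""
    return sum(1 for n in sample.values() if n > 0)


def shannon(sample: Sample) -> float:
    """Shannon diversity H' = -sum(p_i * ln p_i) over OTUs with reads."""
    total = sum(sample.values())
    return -sum((n / total) * math.log(n / total) for n in sample.values() if n > 0)


def pielou_evenness(sample: Sample) -> float:
    """Pielou's evenness J' = H' / ln(S); returns 0.0 when S <= 1."""
    s = richness(sample)
    return shannon(sample) / math.log(s) if s > 1 else 0.0


def expected_rarefied_richness(sample: Sample, depth: int) -> float:
    """Hurlbert's expected richness after drawing `depth` reads:
    E[S] = sum_i [1 - C(N - N_i, depth) / C(N, depth)]."""
    total = sum(sample.values())
    denom = math.comb(total, depth)
    # math.comb returns 0 when depth > N - N_i, i.e. OTU i is certain to be drawn.
    return sum(1 - math.comb(total - n, depth) / denom for n in sample.values() if n > 0)


if __name__ == "__main__":
    # Two hypothetical communities, each with 100 OTUs and 1,000 reads.
    skewed = {"otu0": 901, **{f"otu{i}": 1 for i in range(1, 100)}}
    even = {f"otu{i}": 10 for i in range(100)}
    for name, sample in [("skewed", skewed), ("even", even)]:
        sub = rarefy(sample, depth=200, seed=1)
        print(name, richness(sample), round(pielou_evenness(sample), 3),
              richness(sub), round(expected_rarefied_richness(sample, 200), 1))
```

In this example both communities start with the same richness, but in the skewed one 99 OTUs are singletons. At a depth of 200 reads each singleton survives with probability 0.2, so about 21 OTUs are expected to remain, versus about 89 for the perfectly even community, mirroring the abstract's point that a more even rank-abundance distribution loses fewer OTUs to subsampling.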

Highlights

Current DNA sequencing capacity offers the opportunity to study microbial communities in unprecedented detail
Quality standards often lag behind technical innovation, and many early studies of microbial communities using second-generation sequencing appear to have substantially overestimated microbial diversity [1]
One approach to dealing with the problem of sequence error has been to shed detail from a dataset until there is a high probability that the influence of PCR or sequencing errors has been removed, for example by using broad criteria for delimiting operational taxonomic units (OTUs) or by discarding all of the least frequently occurring sequence variants [5]
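The two detail-shedding strategies named in the last highlight can be sketched in Python under simplifying assumptions: sequence variants are already aligned to equal length, identity is the fraction of matching positions, and OTUs are formed by a simple greedy, abundance-ordered centroid clustering. This is an illustration only, not the method of reference [5] or of any particular pipeline; the function names, the toy sequences and the 97% default threshold (a commonly used OTU cutoff) are illustrative choices.

```python
# Hypothetical representation: aligned sequence variant -> read count.
Variants = dict[str, int]


def drop_rare_variants(variants: Variants, min_count: int = 2) -> Variants:
    """Discard variants seen fewer than min_count times (min_count=2 drops singletons)."""
    return {seq: n for seq, n in variants.items() if n >= min_count}


def identity(a: str, b: str) -> float:
    """Fraction of matching positions between two aligned, equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to equal length")
    return sum(x == y for x, y in zip(a, b)) / len(a)


def greedy_otus(variants: Variants, threshold: float = 0.97) -> Variants:
    """Greedy centroid clustering in order of decreasing abundance.

    Each variant joins the first existing centroid it matches at >= threshold
    identity; otherwise it becomes a new centroid. Returns centroid -> total reads.
    """
    otus: Variants = {}
    for seq, n in sorted(variants.items(), key=lambda kv: (-kv[1], kv[0])):
        for centroid in otus:
            if identity(seq, centroid) >= threshold:
                otus[centroid] += n
                break
        else:
            otus[seq] = n
    return otus


if __name__ == "__main__":
    variants = {"AAAAAAAAAA": 50, "AAAAAAAAAT": 5, "TTTTTTTTTT": 20, "TTTTTTTTTA": 1}
    print(greedy_otus(drop_rare_variants(variants)))  # strict cutoff, singleton removed
    print(greedy_otus(variants, threshold=0.9))       # broad cutoff merges near-identical variants
```

Lowering the identity threshold or raising min_count both reduce the number of OTUs, but they do so indiscriminately: genuine low-abundance variants are merged or discarded along with erroneous ones, which is the loss of detail the highlight describes.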

Summary

Introduction

Current DNA sequencing capacity offers the opportunity to study microbial communities in unprecedented detail. Although high sequencing accuracy can be achieved by removing the reads most likely to contain errors, low error rates may still accumulate to substantial effect in datasets with hundreds of thousands (or more) of sequence reads. The AmpliconNoise program [6] was reported as a method for pyrosequencing error detection and correction and was quickly incorporated into the major processing pipelines for pyrosequencing data [7,8]. Much of the effort devoted to evaluating sequence-based methodologies has focused on reducing OTU inflation and forming the correct number of OTUs. However, most experimental studies aim to do much more than derive simple richness estimates, and the interpretive impacts of the choice of data processing pipeline are likely to be broader than this single, predominant criterion. Compared to the simple constructed communities often used in evaluating new methods, there are many more opportunities for interpretations to shift when the work concerns
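The point that low per-base error rates still add up across large datasets can be made concrete with a back-of-the-envelope Python calculation. It assumes errors occur independently at each base with a fixed probability, which real pyrosequencing errors (concentrated in homopolymer runs) do not strictly satisfy; the read count, read length and error rate in the example are hypothetical, not values from this study.

```python
def expected_erroneous_reads(n_reads: int, read_length: int, per_base_error: float) -> float:
    """Expected number of reads with at least one error, assuming each base is
    miscalled independently with probability `per_base_error`."""
    if not 0.0 <= per_base_error <= 1.0:
        raise ValueError("per_base_error must be a probability")
    # Probability that all read_length bases in one read are called correctly.
    p_error_free = (1.0 - per_base_error) ** read_length
    return n_reads * (1.0 - p_error_free)


if __name__ == "__main__":
    # Hypothetical run: 100,000 reads of 250 bases at a 0.1% per-base error rate.
    print(round(expected_erroneous_reads(100_000, 250, 0.001)))
```

Under these assumptions roughly 22% of reads, about 22,000 of the 100,000, would be expected to carry at least one error, even though 99.9% of individual bases are correct.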

Methods

Results

Conclusion


Similar Papers

Efficient computation of Faith's phylogenetic diversity with applications in characterizing microbiomes
Genome Research | VOL. 31 | 03 Sep 2021

Untangling the assembly of littoral macroinvertebrate communities through measures of functional and phylogenetic alpha diversity
Jani Heino ... Kimmo T Tolonen
Freshwater Biology | VOL. 62 | 20 Apr 2017

Enhancing Interpretability of Gene Signatures with Prior Biological Knowledge
Margherita Squillario ... Annalisa Barla
Microarrays | VOL. 5 | 08 Jun 2016

Building a Robust, Densely-Sampled Spider Tree of Life for Ecosystem Research
Nuria Macías-Hernández ... Jesús Lozano-Fernandez
Diversity | VOL. 12 | 23 Jul 2020
