DUDE-Seq: Fast, flexible, and robust denoising for targeted amplicon sequencing.

Byunghan Lee, Tsachy Weissman, Sungroh Yoon, Taesup Moon

doi:10.1371/journal.pone.0181463

Open Access

https://doi.org/10.1371/journal.pone.0181463

Abstract

We consider the correction of errors in nucleotide sequences produced by next-generation targeted amplicon sequencing. Next-generation sequencing (NGS) platforms provide large volumes of sequencing data thanks to their high throughput, but the associated error rates tend to be high. Denoising in high-throughput sequencing has thus become a crucial step for improving the reliability of downstream analyses. Our methodology, named DUDE-Seq, is derived from a general setting of reconstructing finite-valued source data corrupted by a discrete memoryless channel, and it effectively corrects substitution and homopolymer indel errors, the two major types of sequencing errors in most high-throughput targeted amplicon sequencing platforms. Our experimental studies with real and simulated datasets suggest that the proposed DUDE-Seq not only outperforms existing alternatives in terms of error-correction capability and time efficiency, but also boosts the reliability of downstream analyses. Further, the flexibility of DUDE-Seq enables its robust application to different sequencing platforms and analysis pipelines through simple updates of the noise model. DUDE-Seq is available at http://data.snu.ac.kr/pub/dude-seq.
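The general setting mentioned in the abstract can be illustrated with a minimal sketch of the generic two-pass DUDE rule for substitution errors only. This is not the DUDE-Seq implementation: the function names (`invert`, `symmetric_channel`, `dude_denoise`), the symmetric channel with a single error rate `delta`, the Hamming loss, and the context size `k` are all assumptions chosen for illustration, and homopolymer indel correction is not covered.

```python
from collections import defaultdict

ALPHABET = "ACGT"
IDX = {c: i for i, c in enumerate(ALPHABET)}


def invert(matrix):
    """Invert a small square matrix by Gauss-Jordan elimination with partial pivoting."""
    n = len(matrix)
    # Augment each row with the matching row of the identity matrix.
    aug = [list(row) + [float(i == j) for j in range(n)] for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("channel matrix is singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        scale = aug[col][col]
        aug[col] = [v / scale for v in aug[col]]
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [v - factor * p for v, p in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def symmetric_channel(delta):
    """Assumed noise model: Pi[x][z] = P(read z | true x), total substitution rate delta."""
    return [[1 - delta if x == z else delta / 3 for z in range(4)] for x in range(4)]


def dude_denoise(noisy, channel, k=2):
    """Two-pass DUDE with Hamming loss and k context symbols on each side."""
    n = len(noisy)
    inv = invert(channel)
    # Pass 1: count the centre symbols observed inside each two-sided context.
    counts = defaultdict(lambda: [0] * 4)
    for i in range(k, n - k):
        ctx = (noisy[i - k:i], noisy[i + 1:i + k + 1])
        counts[ctx][IDX[noisy[i]]] += 1
    # Pass 2: choose the symbol minimising the estimated expected loss.
    out = list(noisy)  # the first and last k symbols are left unchanged
    for i in range(k, n - k):
        ctx = (noisy[i - k:i], noisy[i + 1:i + k + 1])
        m = counts[ctx]
        z = IDX[noisy[i]]
        # q[x] = (m^T Pi^{-1})[x]: estimated count of clean symbol x in this context.
        q = [sum(m[a] * inv[a][x] for a in range(4)) for x in range(4)]
        # Under Hamming loss, score(xhat) = sum over x != xhat of q[x] * Pi[x][z].
        weights = [q[x] * channel[x][z] for x in range(4)]
        total = sum(weights)
        scores = [total - weights[xhat] for xhat in range(4)]
        out[i] = ALPHABET[min(range(4), key=lambda s: scores[s])]
    return "".join(out)
```

As a toy usage example, take a read made of `"ACGT"` repeated 40 times with a single G-to-T substitution at index 42. Calling `dude_denoise(noisy, symmetric_channel(0.05), k=2)` restores the original base, because the context (`AC`, `TA`) is overwhelmingly observed around `G` elsewhere in the read. Swapping in a different channel matrix is the only change needed to model another substitution profile, which loosely mirrors the abstract's point about updating the noise model.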

Highlights

A new generation of high-throughput, low-cost sequencing technologies, referred to as next-generation sequencing (NGS) technologies [1], is reshaping biomedical research, including large-scale comparative and evolutionary studies [2,3,4].
Our experiments show that the Discrete Universal DEnoiser (DUDE) algorithm outperforms other state-of-the-art schemes when applied to targeted amplicon sequencing.
Our experimental results show that DUDE-Seq can robustly outperform k-mer-based, MSA-based, and statistical error model-based schemes for targeted amplicon sequencing, on both real-world datasets, such as 454 pyrosequencing and Illumina data, and simulated datasets.

Summary

Introduction

A new generation of high-throughput, low-cost sequencing technologies, referred to as next-generation sequencing (NGS) technologies [1], is reshaping biomedical research, including large-scale comparative and evolutionary studies [2,3,4]. Compared with automated Sanger sequencing, NGS platforms produce significantly shorter reads in large quantities, posing various new computational challenges [5]. Several DNA sequencing methodologies use NGS [6, 7], including whole genome sequencing (WGS), chromatin immunoprecipitation (ChIP) sequencing, and targeted sequencing. WGS is used to analyze the genome of an organism to capture all variants and identify potential causative variants; it is also used for de novo genome assembly. ChIP sequencing identifies genome-wide DNA binding sites for transcription factors and other proteins.

Methods

Results

Conclusion

Journal: PLOS ONE	Publication Date: Jul 27, 2017
Citations: 57	License type: CC BY 4.0


