ADEPT, a dynamic next generation sequencing data error-detection program with trimming.

Shihai Feng, Chien-Chi Lo, Po-E Li, Patrick S G Chain

doi:10.1186/s12859-016-0967-z

Abstract

Background: Illumina is the most widely used next generation sequencing technology and produces millions of short reads that contain errors. These sequencing errors constitute a major problem in applications such as de novo genome assembly, metagenomics analysis and single nucleotide polymorphism discovery.

Results: In this study, we present ADEPT, a dynamic error detection method that considers the quality scores of each nucleotide and its neighboring nucleotides, together with their positions within the read, and compares these to the position-specific quality score distribution of all bases within the sequencing run. This method greatly improves upon other available methods in terms of the true positive rate of error discovery without affecting the false positive rate, particularly within the middle of reads.

Conclusions: ADEPT is the only tool to date that dynamically assesses errors within reads by comparing position-specific and neighboring base quality scores with the distribution of quality scores for the dataset being analyzed. The result is a method that is less prone to position-dependent under-prediction, which is one of the most prominent issues in error prediction. The outcome is that ADEPT improves upon prior efforts in identifying true errors, primarily within the middle of reads, while reducing the false positive rate.

Electronic supplementary material: The online version of this article (doi:10.1186/s12859-016-0967-z) contains supplementary material, which is available to authorized users.

Highlights

We present A Dynamic Error-detection Program with Trimming (ADEPT), a dynamic error detection method that considers the quality scores of each nucleotide and its neighboring nucleotides, together with their positions within the read, and compares these to the position-specific quality score distribution of all bases within the sequencing run
Illumina is the most widely used next generation sequencing technology and produces millions of short reads that contain errors
Most sequencing technologies come with software that assigns a quality score to each nucleotide as an estimate of the probability of an error at that position, derived from measurements made on the platform
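
To make the link between a quality score and an error probability concrete, the sketch below uses the standard Phred relationship P = 10^(-Q/10). It assumes the Phred+33 ASCII encoding commonly used in FASTQ files; the function names and the example quality string are illustrative and are not taken from ADEPT.

```python
from typing import List


def phred_to_error_prob(q: int) -> float:
    """Convert a Phred quality score Q to an error probability 10^(-Q/10)."""
    return 10 ** (-q / 10)


def decode_quality_string(qual: str, offset: int = 33) -> List[int]:
    """Decode a FASTQ quality string into integer Phred scores.

    The offset of 33 (Phred+33) is an assumption; older Illumina
    pipelines used an offset of 64.
    """
    return [ord(ch) - offset for ch in qual]


# Hypothetical quality string for a four-base read.
scores = decode_quality_string("II5#")
probs = [phred_to_error_prob(q) for q in scores]

for q, p in zip(scores, probs):
    print(f"Q={q:2d}  estimated error probability={p:.4f}")
```

A score of 40 corresponds to an estimated error probability of 1 in 10,000, while a score of 20 corresponds to 1 in 100, which is why a fixed cutoff treats all positions in a read alike regardless of how quality typically behaves at that position.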

Summary

Results

We present ADEPT, a dynamic error detection method that considers the quality scores of each nucleotide and its neighboring nucleotides, together with their positions within the read, and compares these to the position-specific quality score distribution of all bases within the sequencing run. This method greatly improves upon other available methods in terms of the true positive rate of error discovery without affecting the false positive rate, particularly within the middle of reads.
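
The general idea of judging a base against the quality distribution at its own read position, and against its neighbors, can be sketched as follows. This is not ADEPT's actual algorithm: the z-score rule, the `z_cut` threshold, the `window` size, and the toy quality values are all assumptions chosen only to illustrate the concept, and all reads are assumed to have equal length.

```python
from statistics import mean, pstdev
from typing import List, Tuple


def position_stats(quals: List[List[int]]) -> List[Tuple[float, float]]:
    """Per-position mean and standard deviation of quality across all reads."""
    columns = zip(*quals)  # one column of scores per read position
    return [(mean(col), pstdev(col)) for col in columns]


def flag_suspect_bases(
    read_q: List[int],
    stats: List[Tuple[float, float]],
    z_cut: float = -1.5,   # hypothetical threshold
    window: int = 1,       # hypothetical neighborhood size
) -> List[int]:
    """Return positions whose quality is low for that position in the run
    and also lower than the average of the neighboring bases."""
    flagged = []
    n = len(read_q)
    for i, q in enumerate(read_q):
        mu, sd = stats[i]
        if sd == 0:
            continue  # no spread at this position, nothing to compare against
        z = (q - mu) / sd
        neighbors = [
            read_q[j]
            for j in range(max(0, i - window), min(n, i + window + 1))
            if j != i
        ]
        if z < z_cut and neighbors and q < mean(neighbors):
            flagged.append(i)
    return flagged


# Toy run of five reads, four bases each (made-up quality scores).
reads = [
    [40, 40, 40, 40],
    [40, 38, 40, 40],
    [40, 40, 12, 40],
    [38, 40, 40, 38],
    [40, 40, 38, 40],
]
stats = position_stats(reads)
print(flag_suspect_bases(reads[2], stats))
```

Because the comparison is made per position, a score that would be unremarkable near the end of a read, where quality usually drops, can still stand out in the middle of a read, where most bases score high. A real tool would need thresholds calibrated on data rather than the arbitrary values used here.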



Journal: BMC Bioinformatics	Publication Date: Feb 29, 2016
Citations: 14	License type: CC BY 4.0

Similar Papers

Application of SNP technologies in medicine: lessons learned and future challenges.
Eric Lai
Genome Research | VOL. 11
01 Jun 2001

CnvScan: a CNV screening and annotation tool to improve the clinical utility of computational CNV prediction from exome sequencing data.
Pubudu Saneth Samarakoon ... Torbjørn Rognes
BMC Genomics | VOL. 17
14 Jan 2016

Single nucleotide polymorphism analysis of Korean native chickens using next generation sequencing data.
Dong-Won Seo ... Shil Jin
Molecular Biology Reports | VOL. 42
11 Oct 2014

Erratum to: Single nucleotide polymorphism analysis of Korean native chickens using next generation sequencing data.
Dong-Won Seo ... Hee-Bok Park
Molecular Biology Reports | VOL. 42
15 Jan 2015
