Effects of error-correction of heterozygous next-generation sequencing data.

M Stanley Fujimoto, Mark J Clement, Nozomu Okuda, Quinn Snell, Paul M Bodily

doi:10.1186/1471-2105-15-s7-s3


Open Access

https://doi.org/10.1186/1471-2105-15-s7-s3


Journal: BMC Bioinformatics	Publication Date: May 1, 2014
Citations: 18	License type: CC BY 2.0

Affiliation: Brigham Young University

Abstract

Background: Error correction is an important step in increasing the quality of next-generation sequencing data for downstream analysis and use. Polymorphic datasets are a challenge for many bioinformatic software packages that are designed for, or assume, homozygosity of the input dataset. This assumption ignores the true genomic composition of many organisms that are diploid or polyploid. In this survey, two error correction packages, Quake and ECHO, are examined to see how they perform on next-generation sequence data from heterozygous genomes.

Results: Quake and ECHO perform well and were able to correct many errors found within the data. However, errors that occur at heterozygous positions showed distinct trends. Errors at these positions were sometimes corrected incorrectly, introducing errors into the dataset with the possibility of creating a chimeric read. Quake was much less likely to create chimeric reads. Quake's read trimming removed a large portion of the original data and often left reads with few heterozygous markers. ECHO resulted in more chimeric reads and introduced more errors than Quake but preserved heterozygous markers. With real E. coli sequencing data, assembly statistics improved after error correction. It was also found that segregating reads by haplotype can improve the quality of an assembly.

Conclusions: These findings suggest that Quake and ECHO both have strengths and weaknesses when applied to heterozygous data. With the increased interest in haplotype-specific analysis, new haplotype-aware tools that do not share the weaknesses of Quake and ECHO are necessary.
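To make the notion of a chimeric read concrete, the following is a minimal sketch, not the evaluation code used in the paper. It assumes two known haplotype sequences of equal length and a read with a known start position; the function name `classify_read` and the toy sequences are hypothetical. A read is labeled chimeric when it carries alleles from both haplotypes at heterozygous positions.

```python
def classify_read(read: str, start: int, hap_a: str, hap_b: str) -> str:
    """Label a read by the haplotype(s) it matches at heterozygous positions.

    Assumption: the read aligns without gaps to both haplotypes at `start`.
    """
    votes_a = 0
    votes_b = 0
    for offset, base in enumerate(read):
        pos = start + offset
        allele_a, allele_b = hap_a[pos], hap_b[pos]
        if allele_a == allele_b:
            continue  # homozygous position: carries no haplotype information
        if base == allele_a:
            votes_a += 1
        elif base == allele_b:
            votes_b += 1
        # a base matching neither allele is a plain error and is ignored here

    if votes_a and votes_b:
        return "chimeric"
    if votes_a:
        return "haplotype A"
    if votes_b:
        return "haplotype B"
    return "uninformative"


# Toy haplotypes that differ at positions 2 (G/C) and 7 (T/A).
HAP_A = "ACGTACGTAC"
HAP_B = "ACCTACGAAC"

print(classify_read("CGTACGT", 1, HAP_A, HAP_B))  # read drawn from haplotype A
print(classify_read("CGTACGA", 1, HAP_A, HAP_B))  # last base "corrected" to B's allele
print(classify_read("TACG", 3, HAP_A, HAP_B))     # covers no heterozygous position
```

The same labeling could, under the same assumptions, be used to segregate reads by haplotype before assembly; reads that span no heterozygous marker remain unassigned.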

Highlights

Error correction is an important step in increasing the quality of next-generation sequencing data for downstream analysis and use
Synthetic datasets, Quake: both the haploid and the diploid genome sizes were used when calculating the appropriate k for Quake
When correcting the datasets with all reads for a particular error and heterozygosity rate combined and using the diploid and the haploid genome sizes as parameters, the error-corrected reads showed several of the same general trends for the first three error rates for errors at heterozygous positions
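The dependence of k on genome size can be sketched as follows. This assumes the rule of thumb from the Quake documentation, which picks k so that a random k-mer is expected to occur in the genome about 0.01 times, giving k ≈ log4(200·G) for genome size G. The function name `quake_k`, the rounding to the nearest integer, and the example genome sizes are assumptions for illustration, not values taken from the paper.

```python
import math


def quake_k(genome_size: int) -> int:
    """Suggest a k-mer size from genome size G using k ~ log4(200 * G).

    Assumption: rounding to the nearest integer; the rule of thumb itself
    only gives an approximate value.
    """
    return round(math.log(200 * genome_size, 4))


haploid_size = 5_000_000          # hypothetical E. coli-sized haploid genome
diploid_size = 2 * haploid_size   # both haplotypes counted as distinct sequence

print("haploid k:", quake_k(haploid_size))
print("diploid k:", quake_k(diploid_size))
```

Because k grows only logarithmically with G, doubling the genome size raises the unrounded value by half a unit (log4 of 2), so the haploid and diploid parameters may round to the same k or to adjacent values.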

Summary

Introduction

Error correction is an important step in increasing the quality of next-generation sequencing data for downstream analysis and use. Polymorphic datasets are a challenge for many bioinformatic software packages that are designed for, or assume, homozygosity of the input dataset. This assumption ignores the true genomic composition of many organisms that are diploid or polyploid. In this survey, two error correction packages, Quake and ECHO, are examined to see how they perform on next-generation sequence data from heterozygous genomes. The prevalence of next-generation sequencing (NGS) has increased throughput for generating genomic data and our ability to perform genomic analysis. A fragment with all bases identified, whether correctly or incorrectly, forms a read [1].

