ESREEM: Efficient Short Reads Error Estimation Computational Model for Next-generation Genome Sequencing

Muhammad Tahir, Muhammad Saud Khan, Zahid Mehmood, Muhammad Sardaraz

doi:10.2174/1574893615999200614171832

Abstract

Aims: To assess the error profile of NGS data generated by high-throughput sequencing machines.

Background: Short-read sequencing data from Next Generation Sequencing (NGS) are currently being generated by a number of research projects. Characterizing the errors produced by NGS platforms and accurately inferring genetic variation from reads are two interdependent tasks. Both are highly significant for analyses such as genome sequence assembly, SNP calling, evolutionary studies, and haplotype inference. Systematic and random errors show a characteristic incidence profile for each sequencing platform, i.e., Illumina sequencing, Pacific Biosciences, 454 pyrosequencing, Complete Genomics DNA nanoball sequencing, Ion Torrent sequencing, and Oxford Nanopore sequencing. Advances in NGS deliver enormous volumes of data, accompanied by errors. A proportion of these errors may mimic genuine biological signals, e.g., mutations, and may consequently invalidate the results. Various independent applications have been proposed to correct sequencing errors. A systematic analysis of these algorithms shows that state-of-the-art error estimation models are still missing.

Objective: In this paper, an efficient error estimation computational model called ESREEM is proposed to assess error rates in NGS data.

Methods: The proposed model is built on the premise that there is a true linear regression relationship between the number of reads containing errors and the number of reads sequenced. The model is based on a probabilistic error model integrated with a Hidden Markov Model (HMM).

Result: The proposed model is evaluated on several benchmark datasets, and the results obtained are compared with state-of-the-art algorithms.

Conclusions: Analysis of the experimental results shows that the proposed model estimates errors efficiently and runs in less time than the compared methods.
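The linear relationship stated under Methods can be illustrated with a small sketch. It assumes, purely for illustration, that base-call errors occur independently with a per-base rate p on reads of fixed length L; under that assumption the expected number of error-containing reads among N sequenced reads is N(1 - (1 - p)^L), which is linear in N. Fitting a least-squares line to (reads sequenced, reads with errors) pairs then gives a slope that estimates the per-read error probability, and inverting it recovers p. This is not ESREEM's published model: it omits the HMM component entirely, and all function names are hypothetical.

```python
"""Illustrative sketch (NOT the ESREEM implementation): estimate a per-base
error rate from the linear relationship between reads sequenced and reads
containing at least one error, ASSUMING independent per-base errors and a
fixed read length. The HMM component of ESREEM is not modelled here."""

from typing import Sequence, Tuple


def prob_read_has_error(per_base_error: float, read_length: int) -> float:
    """P(read has >= 1 error), assuming each base is miscalled
    independently with probability `per_base_error`."""
    if not 0.0 <= per_base_error <= 1.0 or read_length < 1:
        raise ValueError("invalid error rate or read length")
    return 1.0 - (1.0 - per_base_error) ** read_length


def fit_line(x: Sequence[float], y: Sequence[float]) -> Tuple[float, float]:
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need at least two paired observations")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("x values must not all be equal")
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def per_base_error_from_slope(slope: float, read_length: int) -> float:
    """Invert slope = 1 - (1 - p)^L to recover the per-base rate p
    (valid only under the independent-error assumption above)."""
    if not 0.0 <= slope < 1.0 or read_length < 1:
        raise ValueError("slope must lie in [0, 1) and read_length >= 1")
    return 1.0 - (1.0 - slope) ** (1.0 / read_length)


if __name__ == "__main__":
    # Hypothetical, noise-free counts generated with p = 0.001 and L = 100.
    read_length = 100
    q = prob_read_has_error(0.001, read_length)
    reads_sequenced = [10_000.0, 20_000.0, 40_000.0, 80_000.0]
    reads_with_errors = [n * q for n in reads_sequenced]

    slope, intercept = fit_line(reads_sequenced, reads_with_errors)
    p_hat = per_base_error_from_slope(slope, read_length)
    print(f"slope={slope:.4f} intercept={intercept:.2f} p_hat={p_hat:.6f}")
```

With these noise-free counts the fitted slope equals 1 - 0.999^100 (about 0.0952), and inverting it returns p = 0.001. Real data would add sampling noise and platform-specific, position-dependent errors that this independent-error sketch deliberately ignores.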

Journal: Current Bioinformatics
Publication Date: Feb 1, 2021
Citations: 3

Similar Papers

Detection of FLT3 Internal Tandem Duplication in Targeted, Short-Read-Length, Next-Generation Sequencing Data
David H Spencer ... Eric J Duncavage
The Journal of Molecular Diagnostics | VOL. 15
14 Nov 2012

Don't just dump your data and run: Authors should submit as much experimental information as possible when uploading sequence data.
Matheus Sanitá Lima ... David Roy Smith
EMBO reports | VOL. 18
27 Oct 2017

Abstract 1660: Identification of allelic imbalance utilizing heterozygous genotype allele frequencies and intensities
Kyle Chang ... Smruthy Sivakumar
Cancer Research | VOL. 79
01 Jul 2019

Benchmarking variant callers in next-generation and third-generation sequencing analysis.
Surui Pei ... Xue Ren
Briefings in Bioinformatics | VOL. 22
23 Jul 2020
