An Empirical Evaluation of Error Correction Methods and Tools for Next Generation Sequencing Data

Atif Mehmood, Muhammad Usman, Javed Ferzund, Abbas Rehman, Imran Ahmad, Shahzad Ahmed

doi:10.14569/ijacsa.2018.090158

Open Access

https://doi.org/10.14569/ijacsa.2018.090158

Abstract

Next Generation Sequencing (NGS) technologies produce massive amounts of low-cost data that are highly useful in genomic research. However, data produced by NGS is affected by errors such as substitutions, deletions, and insertions. For accurate downstream analysis, it is essential to distinguish true biological variants from alterations caused by sequencing errors. Many methods and tools have been developed for NGS error correction. Some of them correct only substitution errors, whereas others correct multiple types of errors. This article presents a comprehensive evaluation of three types of methods (k-mer spectrum based, multiple sequence alignment based, and hybrid) as implemented by different tools. Experiments were conducted to compare the tools on runtime and error correction rate, using two different computing platforms to assess how the platform affects both measures. The aim of this comparative evaluation is to provide recommendations for selecting tools suited to the specific needs of users and practitioners. The k-mer spectrum based methodology produced better results than the other methods. Among all the tools evaluated, Racer showed the best performance in terms of error correction rate and execution time for both small and large data sets. Among the multiple sequence alignment based tools, Karect achieved an excellent error correction rate, whereas Coral had the better execution time on all data sets. Among the hybrid tools, Jabba showed a better error correction rate and execution time than Brownie. Computing platforms mostly affect execution time but have no general effect on error correction rate.
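To make the k-mer spectrum idea mentioned in the abstract more concrete, the following is a minimal sketch of the general technique, not the algorithm of Racer or of any other tool evaluated in the paper. It counts every k-mer in the reads, treats k-mers seen fewer than `min_count` times as likely errors, and tries single-base substitutions that turn such a k-mer into a frequently seen one. The function names, the threshold `min_count`, and the toy reads are assumptions made for illustration only; like the simpler tools described above, the sketch handles substitution errors but not insertions or deletions.

```python
from collections import Counter


def kmer_counts(reads, k):
    """Count every k-mer occurring in the reads (the 'k-mer spectrum')."""
    counts = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    return counts


def correct_read(read, counts, k, min_count=2):
    """Replace rare ('weak') k-mers by a frequent k-mer one substitution away.

    min_count is a hypothetical threshold: k-mers seen fewer times than
    this are assumed to contain a sequencing error.
    """
    bases = "ACGT"
    chars = list(read)
    for i in range(len(chars) - k + 1):
        kmer = "".join(chars[i:i + k])
        if counts[kmer] >= min_count:
            continue  # frequent k-mer, assumed correct
        best = None  # (candidate k-mer, offset in k-mer, new base)
        for j in range(k):
            for b in bases:
                if b == kmer[j]:
                    continue
                cand = kmer[:j] + b + kmer[j + 1:]
                if counts[cand] >= min_count and (
                    best is None or counts[cand] > counts[best[0]]
                ):
                    best = (cand, j, b)
        if best is not None:
            chars[i + best[1]] = best[2]  # apply the substitution
    return "".join(chars)


if __name__ == "__main__":
    # Toy data: five error-free copies and one read with a substitution.
    reads = ["ACGTACGT"] * 5 + ["ACGTTCGT"]
    spectrum = kmer_counts(reads, 4)
    print(correct_read("ACGTTCGT", spectrum, 4))
```

Real tools choose the threshold from the observed k-mer frequency distribution and use far more compact data structures; the sketch only shows why high coverage makes rare k-mers a useful error signal.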

Highlights

Next generation sequencing technologies generate enormous amounts of data at low cost and high throughput
Next Generation Sequencing (NGS) demands high-power CPUs and algorithms that can run in parallel for bioinformatics studies
Most tools and methods focus on removing substitution errors [3]

Summary

Introduction

Next generation sequencing technologies generate enormous amounts of data at low cost and high throughput. Compared with first-generation sequencing technology such as Sanger, NGS data has a higher error rate. NGS demands high-power CPUs and algorithms that can run in parallel for bioinformatics studies. Processing the full data requires large amounts of memory and long execution times, which may cause data management issues. NGS platforms such as Illumina and SOLiD mainly introduce substitution errors, whereas Roche 454 and Ion Torrent mainly introduce insertion and deletion errors. Removing these errors is a key step before any analysis, because they reduce the accuracy of downstream algorithms; correcting the data first leads to better results in downstream analysis [4].

Journal: International Journal of Advanced Computer Science and Applications
Publication Date: Jan 1, 2018
Citations: 1
License type: cc-by
