SMusket: Spark-based DNA error correction on distributed-memory systems

Roberto R Expósito, Jorge González-Domínguez, Juan Touriño

doi:10.1016/j.future.2019.10.038

Open Access

https://doi.org/10.1016/j.future.2019.10.038

Journal: Future Generation Computer Systems	Publication Date: Oct 31, 2019
Citations: 7	License type: other-oa

Affiliation: University of A Coruña

Abstract

Next-Generation Sequencing (NGS) technologies have revolutionized genomics research over the last decade, bringing new opportunities for scientists to perform groundbreaking biological studies. Error correction in NGS datasets is considered an important preprocessing step in many workflows, as sequencing errors can severely affect the quality of downstream analysis. Although current error correction approaches provide reasonably high accuracy, their computational cost can still be unacceptable when processing large datasets. In this paper we propose SparkMusket (SMusket), a Big Data tool built upon the open-source Apache Spark cluster computing framework to boost the performance of Musket, one of the most widely adopted and top-performing multithreaded correctors. Our tool efficiently exploits Spark features to implement a scalable error correction algorithm intended for distributed-memory systems built using commodity hardware. The experimental evaluation on a 16-node cluster using four publicly available datasets has shown that SMusket is up to 15.3 times faster than previous state-of-the-art MPI-based tools, while also providing a maximum speedup of 29.8 over its multithreaded counterpart. SMusket is publicly available under an open-source license at https://github.com/rreye/smusket.
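To make the idea of k-mer-based error correction more concrete, the following is a minimal single-process sketch in plain Python. It is not SMusket's or Musket's implementation, and it does not use Spark. It only illustrates the general k-spectrum approach that correctors of this family build on: count k-mers across all reads, treat frequent k-mers as trusted, and repair a read by the single-base substitution that leaves the fewest untrusted k-mers. The function names, the value of k, and the `min_count` threshold are all assumptions chosen for illustration.

```python
from collections import Counter


def kmers(read, k):
    """Return every length-k substring of a read, in order."""
    return [read[i:i + k] for i in range(len(read) - k + 1)]


def build_spectrum(reads, k):
    """Count k-mer occurrences across all reads (the k-mer spectrum)."""
    counts = Counter()
    for read in reads:
        counts.update(kmers(read, k))
    return counts


def trusted_kmers(counts, min_count):
    """K-mers seen at least min_count times are assumed to be error-free."""
    return {km for km, n in counts.items() if n >= min_count}


def untrusted_count(read, k, trusted):
    """Number of k-mers in the read that are not in the trusted set."""
    return sum(1 for km in kmers(read, k) if km not in trusted)


def correct_read(read, k, trusted):
    """Try every single-base substitution and keep the candidate
    with the fewest untrusted k-mers (hypothetical greedy strategy)."""
    best, best_bad = read, untrusted_count(read, k, trusted)
    if best_bad == 0:
        return read  # nothing to fix
    for i, base in enumerate(read):
        for alt in "ACGT":
            if alt == base:
                continue
            candidate = read[:i] + alt + read[i + 1:]
            bad = untrusted_count(candidate, k, trusted)
            if bad < best_bad:
                best, best_bad = candidate, bad
    return best


# Toy dataset: four correct copies of a read and one copy with a C->G error.
reads = ["ACGTTGCAAG"] * 4 + ["ACGTTGGAAG"]
spectrum = build_spectrum(reads, k=4)
trusted = trusted_kmers(spectrum, min_count=2)
corrected = [correct_read(r, 4, trusted) for r in reads]
```

In a distributed setting such as the one the abstract describes, the reads would be partitioned across cluster nodes and each partition corrected in parallel. How SMusket actually organizes that work is described in the paper and the repository, not in this sketch.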
