Parallel and Scalable Short-Read Alignment on Multi-Core Clusters Using UPC++

Jorge González-Domínguez, Yongchao Liu, Bertil Schmidt

doi:10.1371/journal.pone.0145490


Open Access

https://doi.org/10.1371/journal.pone.0145490


Abstract

The growth of next-generation sequencing (NGS) datasets poses a challenge to the alignment of reads to reference genomes in terms of alignment quality and execution speed. Some available aligners have been shown to obtain high quality mappings at the expense of long execution times. Finding fast yet accurate software solutions is of high importance to research, since availability and size of NGS datasets continue to increase. In this work we present an efficient parallelization approach for NGS short-read alignment on multi-core clusters. Our approach takes advantage of a distributed shared memory programming model based on the new UPC++ language. Experimental results using the CUSHAW3 aligner show that our implementation based on dynamic scheduling obtains good scalability on multi-core clusters. Through our evaluation, we are able to complete the single-end and paired-end alignments of 246 million reads of length 150 base-pairs in 11.54 and 16.64 minutes, respectively, using 32 nodes with four AMD Opteron 6272 16-core CPUs per node. In contrast, the multi-threaded original tool needs 2.77 and 5.54 hours to perform the same alignments on the 64 cores of one node. The source code of our parallel implementation is publicly available at the CUSHAW3 homepage (http://cushaw3.sourceforge.net).
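The dynamic scheduling mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' UPC++ code: it uses standard C++ threads as a stand-in for UPC++ processes, and a shared atomic counter as one common way to hand out work on demand. The names `align_read`, `align_dynamic`, and `block_size` are hypothetical and chosen only for this example.

```cpp
#include <algorithm>
#include <atomic>
#include <cassert>
#include <cstddef>
#include <string>
#include <thread>
#include <vector>

// Hypothetical placeholder for aligning one read. A real aligner would
// return a mapping; here the read length stands in for the result.
std::size_t align_read(const std::string& read) {
    return read.size();
}

// Dynamic scheduling sketch: each worker repeatedly claims the next
// unprocessed block of reads from a shared counter, so workers that
// finish early simply claim more blocks.
std::vector<std::size_t> align_dynamic(const std::vector<std::string>& reads,
                                       unsigned num_workers,
                                       std::size_t block_size) {
    assert(block_size > 0 && "block_size must be positive");
    std::vector<std::size_t> results(reads.size());
    std::atomic<std::size_t> next_block{0};

    auto worker = [&]() {
        for (;;) {
            // Atomically claim the start index of the next block.
            const std::size_t begin = next_block.fetch_add(block_size);
            if (begin >= reads.size()) break;
            const std::size_t end = std::min(begin + block_size, reads.size());
            // Blocks are disjoint, so writes to results never overlap.
            for (std::size_t i = begin; i < end; ++i) {
                results[i] = align_read(reads[i]);
            }
        }
    };

    std::vector<std::thread> pool;
    for (unsigned w = 0; w < num_workers; ++w) pool.emplace_back(worker);
    for (auto& t : pool) t.join();
    return results;
}
```

Calling `align_dynamic(reads, 4, 1024)` would process the reads with four workers in blocks of 1024. Smaller blocks balance load better when reads differ in alignment cost, at the price of more contention on the shared counter.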

Highlights

The application of next-generation sequencing (NGS) technologies has led to an explosion of short-read sequence datasets
merAligner [29] is a parallel Unified Parallel C (UPC) short-read aligner for distributed-memory architectures which obtains good scalability on multi-core clusters. merAligner optimizes the distribution of the reference genome index when the index is too large to fit in one node
We have analyzed the scalability of our UPC++ implementation by aligning four Illumina short-read datasets to the human genome hg38
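As a rough reading of the timings quoted in the abstract, the ratio between the runtime of the original multi-threaded tool on one node and the runtime of the UPC++ implementation on 32 nodes can be worked out as follows. The symbol $S$ is introduced here only for illustration.

```latex
\begin{align*}
S_{\text{single-end}} &= \frac{2.77 \times 60\ \text{min}}{11.54\ \text{min}}
                       = \frac{166.2}{11.54} \approx 14.4 \\
S_{\text{paired-end}} &= \frac{5.54 \times 60\ \text{min}}{16.64\ \text{min}}
                       = \frac{332.4}{16.64} \approx 20.0
\end{align*}
```

Divided by the 32 nodes used, these ratios come to about 0.45 and 0.62 per node. Because the baseline is a different implementation (the original multi-threaded tool), these values compare the two tools and should not be read as the parallel efficiency of the UPC++ code itself.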

Summary

Introduction

The application of next-generation sequencing (NGS) technologies has led to an explosion of short-read sequence datasets. The alignment of produced sequences to a given reference genome, i.e. short-read alignment (SRA), is one of the most important basic operations required for further downstream analysis. Aligners for this task are based on the seed-and-extend approach, using different seeding policies. This approach maps a given read by first identifying seeds on the genome using efficient indexing data structures.

merAligner [29] is a parallel UPC short-read aligner for distributed-memory architectures which obtains good scalability on multi-core clusters.

The execution model of UPC++ is single program multiple data (SPMD). As this language is able to work on both shared-memory and distributed-memory systems, each independent execution unit (UPC++ process) can be implemented as an OS process or a POSIX thread (Pthread). UPC++ takes advantage of C++ language features, such as templates, object-oriented design, operator overloading, and lambda functions (in C++11) to provide advanced PGAS features.
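The seed-and-extend idea described above can be sketched in a few lines. The example below makes several simplifying assumptions that do not come from the source: the index is a plain hash map of fixed-length k-mers, the seed is the first k bases of the read, and the extension step counts mismatches without gaps. Production aligners use more compact indexes and gapped extension. All names (`Hit`, `KmerIndex`, `build_kmer_index`, `seed_and_extend`) are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <unordered_map>
#include <vector>

struct Hit {
    std::size_t ref_pos;     // start of the read on the reference
    std::size_t mismatches;  // mismatches found while extending past the seed
};

using KmerIndex = std::unordered_map<std::string, std::vector<std::size_t>>;

// Index every k-mer of the reference by its start positions.
KmerIndex build_kmer_index(const std::string& ref, std::size_t k) {
    assert(k > 0 && "k must be positive");
    KmerIndex index;
    for (std::size_t i = 0; i + k <= ref.size(); ++i) {
        index[ref.substr(i, k)].push_back(i);
    }
    return index;
}

// Seed: look up the first k bases of the read in the index.
// Extend: compare the rest of the read against the reference at each
// seed position and keep candidates within the mismatch budget.
std::vector<Hit> seed_and_extend(const std::string& ref,
                                 const KmerIndex& index,
                                 const std::string& read,
                                 std::size_t k,
                                 std::size_t max_mismatches) {
    std::vector<Hit> hits;
    if (read.size() < k) return hits;
    const auto it = index.find(read.substr(0, k));
    if (it == index.end()) return hits;
    for (std::size_t pos : it->second) {
        if (pos + read.size() > ref.size()) continue;  // read would overrun
        std::size_t mismatches = 0;
        for (std::size_t j = k; j < read.size(); ++j) {
            if (ref[pos + j] != read[j]) ++mismatches;
        }
        if (mismatches <= max_mismatches) hits.push_back({pos, mismatches});
    }
    return hits;
}
```

The index is built once per reference and reused for every read, which is why the choice of index structure dominates memory use for a genome-sized reference.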


Journal: PLOS ONE
Publication Date: Jan 5, 2016
Citations: 10
License type: CC BY 4.0


Similar Papers

Accelerating Alignment for Short Reads Allowing Insertion of Gaps on Multi-Core Cluster
Yongjie Yang ... Danyang Chen
01 Dec 2019

Enhancing miRNA annotation confidence in miRBase by continuous cross dataset analysis
Thomas B Hansen ... Jesper B Bramsen
RNA Biology | VOL. 8
01 May 2011

Performance Optimization of a Parallel Error Correction Tool
Marco Martínez-Sánchez ... Roberto R Expósito
15 Oct 2021

Leveraging The Old With The New: Exploring and Integrating Historic Microarray Studies With Next Generation Sequencing For Multiple Myeloma
Michael A Bauer ... Donald Johann
Blood | VOL. 122
15 Nov 2013
