A high performance multiple sequence alignment system for pyrosequencing reads from multiple reference genomes

Fahad Saeed, Alan Perez-Rathke, Jaroslaw Gwarnicki, Tanya Berger-Wolf, Ashfaq Khokhar

doi:10.1016/j.jpdc.2011.08.001

Abstract

Genome resequencing with short reads generated from pyrosequencing generally relies on mapping the short reads against a single reference genome. However, mapping of reads from multiple reference genomes is not possible using a pairwise mapping algorithm. Existing multiple sequence alignment (MSA) methods cannot be used to align the reads with respect to each other and the reference genomes, because they do not take into account the position of these short reads with respect to the genome and are highly inefficient for a large number of sequences. In this paper, we develop a highly scalable parallel algorithm based on domain decomposition, referred to as P-Pyro-Align, to align such a large number of reads from single or multiple reference genomes. The proposed alignment algorithm accurately aligns the erroneous reads and has been implemented on a cluster of workstations using the MPI library. Experimental results for different problem sizes are analyzed in terms of execution time, quality of the alignments, and the ability of the algorithm to handle reads from multiple haplotypes. We report high-quality multiple alignment of up to 0.5 million reads. The algorithm is shown to be highly scalable and exhibits super-linear speedups with an increasing number of processors.
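The abstract does not spell out how the domain decomposition is performed, so the following is only a minimal sketch of one plausible reading: reads that already carry a start position on the reference are binned into contiguous genome intervals, one interval per processor. The function name `partition_reads`, the `(start, sequence)` tuple layout, and the equal-width intervals are all assumptions made for illustration and are not taken from the paper.

```python
from collections import defaultdict


def partition_reads(reads, genome_length, num_procs):
    """Bin reads into contiguous genome intervals, one interval per processor.

    reads: iterable of (start, sequence) tuples, where start is a 0-based
           position on the reference (hypothetical layout, for illustration).
    Returns a dict mapping processor rank -> list of reads sorted by start.
    """
    if genome_length < 1 or num_procs < 1:
        raise ValueError("genome_length and num_procs must be positive")

    # Ceiling division so that num_procs intervals cover the whole genome.
    width = -(-genome_length // num_procs)

    domains = defaultdict(list)
    for start, seq in reads:
        # Clamp so a read starting at the very end still lands on the last rank.
        rank = min(start // width, num_procs - 1)
        domains[rank].append((start, seq))

    # Every rank gets an entry, even if no read falls in its interval.
    return {rank: sorted(domains[rank]) for rank in range(num_procs)}


if __name__ == "__main__":
    example = [(12, "TTAG"), (0, "ACGT"), (19, "CCAT"), (5, "GGCA")]
    for rank, chunk in partition_reads(example, genome_length=20, num_procs=4).items():
        print(rank, chunk)
```

In an MPI setting each rank would then align only its own bin; reads that straddle an interval boundary would need extra handling that this sketch leaves out.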

Highlights

For over a decade, Sanger sequencing has been the cornerstone of genome sequencing, including that of microbial genomes.
The pyrosequencing-based platform is non-cloning and is capable of generating 1 million overlapping reads in a single run. A multitude of factors, such as relatively short read lengths compared to Sanger, lack of a paired-end protocol, and limited accuracy of individual reads for repetitive DNA, particularly in the case of monopolymer repeats, present many computational challenges [1] in making pyrosequencing useful for biology and bioinformatics applications.
We present a solution to the problem of aligning pyroreads from multiple genomes using a multiple alignment methodology on multiprocessor platforms.

Summary

Introduction

Sanger sequencing has been the cornerstone of genome sequencing, including that of microbial genomes. One of the most widely employed preprocessing steps for many applications, including haplotype reconstruction [2] [3], microbial community analysis [4], and analysis of genes for diseases [5], is the alignment of these reads with the wild type. For important applications such as viral population estimation or haplotype reconstruction of various viruses, e.g., HIV, in a population, scientists usually have information about the wild-type genome of the virus. To date, numerous tools have been suggested for mapping short reads to a single reference genome. These strategies usually come at the cost of simplifying the mapping problem and not allowing complex alignments, including gaps or alignment with multiple reference genomes.
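To illustrate what a simplified mapping without gaps looks like, here is a minimal sketch of gapless mapping by mismatch count against a single reference. It is not any particular tool's method; the function name `best_gapless_hit` and the toy sequences are assumptions made for illustration.

```python
def best_gapless_hit(read, reference):
    """Slide the read along the reference without gaps.

    Returns (position, mismatches) for the window with the fewest mismatches,
    or (None, None) if the read is longer than the reference.
    """
    best_pos, best_mismatches = None, None
    for pos in range(len(reference) - len(read) + 1):
        window = reference[pos:pos + len(read)]
        mismatches = sum(a != b for a, b in zip(read, window))
        if best_mismatches is None or mismatches < best_mismatches:
            best_pos, best_mismatches = pos, mismatches
    return best_pos, best_mismatches


if __name__ == "__main__":
    reference = "ACGTTTGACCA"
    # A clean read is placed exactly.
    print(best_gapless_hit("GTTTGA", reference))  # (2, 0)
    # The same region with one extra T in the repeat run: no gapless
    # placement is mismatch-free, although a single gap would explain it.
    print(best_gapless_hit("GTTTTGAC", reference))
```

A read carrying an inserted base cannot be placed without mismatches by this kind of scheme, which is the limitation the paragraph above points to when it mentions alignments with gaps.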

Journal: Journal of Parallel and Distributed Computing
Publication Date: Sep 16, 2011
Citations: 11
License type: cc-by
