A Parallel Algorithm for Finding All Pairs κ-Mismatch Maximal Common Substrings

Sriram P Chockalingam, Sharma V Thankachan, Srinivas Aluru

doi:10.1109/sc.2016.66

Abstract

We present an efficient parallel algorithm for the following problem: given an input collection D of n sequences of total length N, a length threshold f, and a mismatch threshold κ, report all κ-mismatch maximal common substrings of length at least f over all pairs of strings in D. This problem is motivated by clustering and assembly applications in computational biology, where D is a collection of millions of short DNA sequences. Sequencing errors and the massive size of these datasets necessitate efficient parallel approximate sequence matching algorithms. We present a novel distributed-memory parallel algorithm that solves this approximate sequence matching problem in O(((N/p) log N + occ) log^κ N) expected time and takes only O(log^(κ+1) N) expected rounds of global communication, under some realistic assumptions, where p is the number of processors and occ is the output size. To our knowledge, this is the first provably sub-quadratic time algorithm for solving this problem. We demonstrate the performance and scalability of our algorithm using large high-throughput sequencing data sets.
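To make the problem statement concrete, here is a minimal brute-force sketch in Python. It is not the parallel algorithm described in the abstract; it is a naive quadratic-per-pair baseline that only illustrates what gets reported. It assumes the usual reading of the definition: two equal-length substrings with at most κ mismatches (Hamming distance) form a κ-mismatch common substring, and the pair is maximal if it cannot be extended to the left or right without exceeding κ mismatches. The function names `kmismatch_maximal_pairs` and `all_pairs` are hypothetical and chosen for this illustration.

```python
from itertools import combinations


def kmismatch_maximal_pairs(x, y, kappa, min_len):
    """Return (i, j, length) triples such that x[i:i+length] and
    y[j:j+length] differ in at most kappa positions, cannot be extended
    on either side without exceeding kappa mismatches, and have
    length >= min_len. Naive baseline, assumed definition (see lead-in).
    """
    results = []
    # Each diagonal d = i - j is one alignment of x against y.
    for d in range(-(len(y) - 1), len(x)):
        i0, j0 = (d, 0) if d >= 0 else (0, -d)
        diag_len = min(len(x) - i0, len(y) - j0)
        # Offsets along the diagonal where the two strings disagree.
        mism = [t for t in range(diag_len) if x[i0 + t] != y[j0 + t]]
        if len(mism) <= kappa:
            # The whole diagonal fits within the mismatch budget.
            windows = [(0, diag_len)]
        else:
            # Sentinels mark the diagonal boundaries as "mismatches".
            bounds = [-1] + mism + [diag_len]
            # A maximal window spans exactly kappa mismatches and is
            # blocked on both sides by a mismatch or the diagonal's end.
            windows = [
                (bounds[s] + 1, bounds[s + kappa + 1])
                for s in range(len(mism) - kappa + 1)
            ]
        for start, end in windows:
            if end - start >= min_len:
                results.append((i0 + start, j0 + start, end - start))
    return results


def all_pairs(seqs, kappa, min_len):
    """Apply the baseline to every unordered pair of sequences."""
    return {
        (a, b): kmismatch_maximal_pairs(seqs[a], seqs[b], kappa, min_len)
        for a, b in combinations(range(len(seqs)), 2)
    }
```

For example, `all_pairs(["ACGTACGT", "ACGAACGT"], kappa=1, min_len=4)` reports, among others, the full-length alignment of the two strings, which differ in a single position. A baseline like this does work proportional to the product of the two string lengths for every pair, which is the quadratic behavior the abstract's sub-quadratic bound is contrasted with.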

Published: 13 Nov 2016