The Longest Common Subsequence (LCS) problem is a well-known and widely studied problem in computer science and bioinformatics. It consists of finding the longest subsequence that is common to two or more given sequences. In this paper, we address the problem of finding all LCSs for the Sequential Substring Constrained-LCS (SSCLCS) problem, called the Multiple SSCLCS problem. To solve this problem, we first propose a dominant point-based sequential algorithm, designed on a new Leveled Directed Acyclic Graph (DAG) that gives the correct evaluation order of subproblems and avoids the redundancy caused by their overlap. Depending on whether or not the constraints may overlap, it requires \(O(S|\Sigma|^{K+1}+r+n|\Sigma|)\) or \(O(S|\Sigma|^{K+1}+n|\Sigma|)\) time, respectively, with \(O(Max\_level+n|\Sigma|)\) space. Here \(S\) is the number of partial SSCLCSs in a node, \(K\) is the number of DAG levels, \(n\) is the length of the sequences, \(r\) is the total length of the constraints, \(Max\_level\) is the number of nodes in the largest level of the DAG, and \(|\Sigma|\) is the size of the alphabet. Then, we derive a coarse-grained multicomputer parallel solution requiring \(O\left(\frac{S|\Sigma|^{K+1}+r+n|\Sigma|}{p}\right)\) and \(O\left(\frac{S|\Sigma|^{K+1}+n|\Sigma|}{p}\right)\) execution time for the same two cases, \(O(Max\_level+n|\Sigma|)\) memory space, and \(O(K)\) communication rounds, where \(p\) is the number of processors. Experimental results showed that the parallel algorithm is 14.43× and 19.19× faster than the sequential algorithm on 32 and 64 processors, respectively.
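
To make the notion of "finding all LCSs" concrete, the following is a minimal sketch of the classical, unconstrained two-sequence case using textbook dynamic programming. It is not the dominant point-based algorithm of this paper and ignores the substring constraints entirely; the function name `all_lcs` is hypothetical and chosen only for illustration.

```python
from functools import lru_cache


def all_lcs(a: str, b: str) -> set:
    """Return every distinct longest common subsequence of a and b.

    Illustrative textbook DP only (assumption: two sequences, no
    constraints); not the SSCLCS algorithm described in the abstract.
    """
    n, m = len(a), len(b)

    # dp[i][j] = length of an LCS of the suffixes a[i:] and b[j:]
    dp = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n - 1, -1, -1):
        for j in range(m - 1, -1, -1):
            if a[i] == b[j]:
                dp[i][j] = dp[i + 1][j + 1] + 1
            else:
                dp[i][j] = max(dp[i + 1][j], dp[i][j + 1])

    @lru_cache(maxsize=None)
    def collect(i: int, j: int) -> frozenset:
        # All LCS strings of a[i:] and b[j:]; memoised so that each
        # overlapping subproblem is expanded only once.
        if i == n or j == m:
            return frozenset({""})
        if a[i] == b[j]:
            return frozenset(a[i] + s for s in collect(i + 1, j + 1))
        out = set()
        if dp[i + 1][j] == dp[i][j]:
            out |= collect(i + 1, j)
        if dp[i][j + 1] == dp[i][j]:
            out |= collect(i, j + 1)
        return frozenset(out)

    return set(collect(0, 0))


if __name__ == "__main__":
    print(sorted(all_lcs("ABCBDAB", "BDCABA")))
```

The memoised `collect` step mirrors, in a very reduced form, the concern raised in the abstract: subproblems overlap, so they should be evaluated in an order that avoids recomputing them.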