Abstract

The Smith-Waterman algorithm is a classical dynamic programming algorithm for pairwise sequence alignment. It computes the maximum score over alignments built from insertions, deletions, and substitutions, with no consideration given to the composition of the alignments. Biologists, however, often want to incorporate their knowledge of common structures or functions into the alignment process. For protein sequences, several methods have been suggested that use motifs (restricted regular expressions) from the PROSITE database to guide alignments. One method modifies the Smith-Waterman dynamic programming solution to reward alignments that contain matching motifs. Another introduces the regular expression constrained sequence alignment problem, in which pairwise alignments are constrained to contain a given regular expression. This latter method constructs a weighted finite automaton from the regular expression and presents a dynamic programming solution that simulates copies of this automaton to find a maximum-score alignment containing the regular expression. We generalize this approach in two ways: 1) we introduce a variation of the problem for multiple sequences, namely regular expression constrained multiple sequence alignment, and present an algorithm for it; 2) we develop an algorithm for the case in which the alignments sought are required to contain a given sequence of regular expressions.
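
For orientation, the unconstrained Smith-Waterman recurrence that the abstract takes as its starting point can be sketched as follows. This is a minimal illustration only, not the regular expression constrained algorithm of the paper: the function name `smith_waterman_score`, the scoring values (match 2, mismatch -1, gap -1), and the linear gap penalty are all assumptions chosen for the example.

```python
def smith_waterman_score(a, b, match=2, mismatch=-1, gap=-1):
    """Return the best local alignment score between strings a and b.

    Scoring values and the linear gap penalty are illustrative assumptions,
    not parameters taken from the paper.
    """
    # prev holds row i-1 of the DP table, curr holds row i.
    prev = [0] * (len(b) + 1)
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            curr[j] = max(
                0,                  # start a new local alignment here
                prev[j - 1] + sub,  # substitution or match
                prev[j] + gap,      # gap in b (a[i-1] aligned to nothing)
                curr[j - 1] + gap,  # gap in a (b[j-1] aligned to nothing)
            )
            best = max(best, curr[j])
        prev = curr
    return best
```

With these assumed defaults, `smith_waterman_score("ACGT", "ACGT")` evaluates to 8 (four matches at 2 each). The constrained variants described above extend the state of each table cell with automaton states, which this sketch does not attempt.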

