Abstract

Local multiple sequence alignment is a basic tool for extracting functionally important regions shared by a family of protein sequences. We present an effectively polynomial-time algorithm for rigorously solving the local multiple alignment problem. The algorithm is based on the dead-end elimination procedure, which makes it possible to avoid an exhaustive search. Within the sum-of-pairs scoring system, we derive rejection criteria that eliminate those sequence segments and segment pairs that can be shown mathematically to be inconsistent with the globally optimal alignment (dead-ending). Iterative application of the elimination criteria rapidly reduces the number of combinatorial possibilities without considering them explicitly. In the vast majority of cases, the procedure converges to a unique globally optimal solution. In contrast to the exhaustive search, whose computational complexity is combinatorial, the algorithm is computationally feasible because the number of operations required to eliminate the dead-ending segments and segment pairs grows quadratically and cubically, respectively, with the total number of sequence elements. The method is illustrated on a set of protein families for which the globally optimal alignments are well recognized. The source code of the program implementing the algorithm is available upon request from the authors. Contact: alex_lukashin@biogen.com.
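
The elimination idea can be illustrated with a minimal sketch. It is not the authors' program, and it does not reproduce the specific criteria derived in the paper. It assumes ungapped segments of a fixed width, a toy match/mismatch pair score, and a generic single-segment dominance test of the kind commonly used in dead-end elimination: a segment is rejected if some competing segment from the same sequence scores strictly better against every surviving choice of segments in the other sequences. The names `segment_score`, `dee_singles` and `brute_force` are hypothetical and exist only for this illustration.

```python
from itertools import product


def segment_score(a, b, match=1, mismatch=-1):
    """Toy pair score (assumption): column-wise comparison of two ungapped segments."""
    return sum(match if x == y else mismatch for x, y in zip(a, b))


def dee_singles(seqs, width):
    """Iteratively reject segment start positions that cannot be optimal.

    Assumes every sequence is at least `width` long. Returns one set of
    surviving start positions per sequence.
    """
    n = len(seqs)
    alive = [set(range(len(s) - width + 1)) for s in seqs]

    def pair(i, r, j, s):
        return segment_score(seqs[i][r:r + width], seqs[j][s:s + width])

    def dominated(i, r):
        # r is dead-ending if some competitor t beats it even in r's best case,
        # i.e. the upper bound on (score with r) - (score with t) is negative.
        return any(
            t != r
            and sum(
                max(pair(i, r, j, s) - pair(i, t, j, s) for s in alive[j])
                for j in range(n)
                if j != i
            ) < 0
            for t in alive[i]
        )

    changed = True
    while changed:  # repeat until no further segment can be rejected
        changed = False
        for i in range(n):
            for r in sorted(alive[i]):
                if dominated(i, r):
                    alive[i].discard(r)
                    changed = True
    return alive


def brute_force(seqs, width):
    """Exhaustive sum-of-pairs search, for checking small cases only."""
    n = len(seqs)
    best_total, best_starts = None, None
    for starts in product(*[range(len(s) - width + 1) for s in seqs]):
        total = sum(
            segment_score(
                seqs[i][starts[i]:starts[i] + width],
                seqs[j][starts[j]:starts[j] + width],
            )
            for i in range(n)
            for j in range(i + 1, n)
        )
        if best_total is None or total > best_total:
            best_total, best_starts = total, starts
    return best_total, best_starts
```

Because the test uses a strict inequality, a segment that belongs to an optimal alignment is never rejected, so the exhaustive optimum always lies within the surviving sets. How many candidates are actually removed depends on the scoring system; with this toy score the reduction may be modest, and the sketch makes no claim about the convergence or complexity reported for the paper's method.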
