Abstract
Imposing constraints is an effective means of incorporating biological knowledge into alignment procedures. As in the PROSITE database, functional sites of proteins can be effectively described as regular expressions. In an alignment of protein sequences it is natural to expect that functional motifs should be aligned together. Motivated by this observation, Arslan introduced the regular expression constrained sequence alignment problem and proposed an algorithm which, if implemented naïvely, can take time and space up to $O(|\Sigma|^2 |V|^4 n^2)$ and $O(|\Sigma|^2 |V|^4 n)$, respectively, where $\Sigma$ is the alphabet, $n$ is the sequence length, and $V$ is the set of states in an automaton equivalent to the input regular expression. In this paper we propose a more efficient algorithm for this problem, which takes $O(|V|^3 n^2)$ time and $O(|V|^2 n)$ space in the worst case. If $|V| = O(\log n)$, we propose another algorithm with time complexity $O(|V|^2 \log|V| \, n^2)$. The time complexity of our algorithms is independent of $\Sigma$, which is desirable in protein applications, where the formulation of this problem originates; a factor of $|\Sigma|^2 = 400$ in the time complexity of the previously proposed algorithm would significantly affect its efficiency in practice.
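To make the size of the gap between the two time bounds concrete, the following minimal sketch compares their leading terms. It drops the constant factors hidden by the asymptotic notation, so it shows only how the bounds scale, not measured running times. The function names and the sample values of |V| and n are hypothetical; the alphabet size 20 is the number of standard amino acids, which gives the factor |Σ|² = 400 mentioned above.

```python
def naive_time_bound(sigma: int, v: int, n: int) -> int:
    """Leading term of the naive bound |Sigma|^2 * |V|^4 * n^2 (constants dropped)."""
    return sigma**2 * v**4 * n**2


def improved_time_bound(v: int, n: int) -> int:
    """Leading term of the improved bound |V|^3 * n^2 (constants dropped)."""
    return v**3 * n**2


def speedup_factor(sigma: int, v: int, n: int) -> int:
    """Ratio of the two leading terms; algebraically this is |Sigma|^2 * |V|."""
    return naive_time_bound(sigma, v, n) // improved_time_bound(v, n)


if __name__ == "__main__":
    # Hypothetical sample values: 10 automaton states, sequences of length 1000.
    print(speedup_factor(sigma=20, v=10, n=1000))
```

The ratio does not depend on n: both bounds are quadratic in the sequence length, so the difference comes entirely from the alphabet size and one factor of |V|.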