Abstract

The multiple alignment of the sequences of DNA and proteins is applicable to various important fields in molecular biology. Although the approach based on Dynamic Programming is well-known for this problem, it requires enormous time and space to obtain the optimal alignment. On the other hand, this problem corresponds to the shortest path problem and the A ∗ algorithm, which can efficiently find the shortest path with an estimator, is usable. First, this paper directly applies the A ∗ algorithm to multiple sequence alignment problem with more powerful estimator in more than two-dimensional case and discusses the extensions of this approach utilizing an upper bound of the shortest path length and of modification of network structure. The algorithm to provide the upper bound is also proposed in this paper. The basic part of these results was originally shown in Ikeda and Imai [11]. This part is similar to the branch-and-bound techniques implemented in MSA program in Gupta et al. [6]. Our framework is based on the edge length transformation to reduce the problem to the shortest path problem, which is more suitable to generalizations to enumerating suboptimal alignments and parametric analysis as done in Shibuya and Imai [15–17]. By this enhanced A ∗ algorithm, optimal multiple alignments of several long sequences can be computed in practice, which is shown by computational results. Second, this paper proposes a k-group alignment algorithm for multiple alignment as a practical method for much larger-size problem of, say multiple alignments of 50–100 sequences. A basic part of these results were originally presented in Imai and Ikeda [13]. In existing iterative improvement methods for multiple alignment, the so-called group-to-group two-dimensional dynamic programming has been used, and in this respect our proposal is to extend the ordinary two-group dynamic programming to a k-group alignment programming. This extension is conceptually straightforward, and here our contribution is to demonstrate that the k-group alignment can be implemented so as to run in a reasonable time and space under standard computing environments. This is established by generalizing the above A ∗ search approach. The k-group alignment method can be directly incorporated in existing methods such as iterative improvement algorithms [2, 5] and tree-based (iterative) algorithms [9]. This paper performs computational experiments by applying the k-group method to iterative improvement algorithms, and shows that our approach can find better alignments in reasonable time. For example, through larger-scale computational experiments here, 34 protein sequences with very high homology can be optimally 10-group aligned, and 64 sequences with high homology can be optimally 5-group aligned.

Full Text
Published version (Free)

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call