Abstract
This paper presents a strategy for tackling the Multiple Sequence Alignment (MSA) problem, one of the most important tasks in biological sequence analysis. MSA aligns a set of protein or nucleotide sequences in their entirety to derive relationships and common characteristics among them. The MSA problem has been proved to be NP-hard. The proposed strategy builds on the well-known divide and conquer paradigm: a novel method clusters the sequences as a preliminary step to improve the final alignment. This decomposition can be used as an optimization procedure with any MSA aligner to explore promising alignments in the search space. The clusters are aligned in a parallel and distributed way to benefit from parallel architectures. The strategy was tested on classical benchmarks such as BAliBASE, Sabre, Prefab4 and Oxm, and the experimental results show that it performs well compared with other aligners.
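The cluster-then-align idea can be illustrated with a minimal sketch. The abstract does not say how sequences are clustered or how cluster alignments are combined, so everything here is an assumption made for illustration: the k-mer Jaccard similarity, the greedy clustering rule, the `threshold` value, and the `pad_align` placeholder (which only pads with gaps and stands in for a real MSA aligner) are all hypothetical. A thread pool stands in for the parallel and distributed execution, and the final merge of cluster alignments is left out.

```python
from concurrent.futures import ThreadPoolExecutor


def kmer_set(seq, k=3):
    """Return the set of k-mers of a sequence (assumed similarity feature)."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def jaccard(a, b):
    """Jaccard similarity between two k-mer sets."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def cluster_sequences(seqs, threshold=0.3, k=3):
    """Greedy clustering (an assumption, not the paper's method): a sequence
    joins the first cluster whose representative, its first member, is
    similar enough; otherwise it starts a new cluster."""
    clusters = []  # pairs of (representative k-mer set, member indices)
    for idx, seq in enumerate(seqs):
        kmers = kmer_set(seq, k)
        for rep, members in clusters:
            if jaccard(rep, kmers) >= threshold:
                members.append(idx)
                break
        else:
            clusters.append((kmers, [idx]))
    return [members for _, members in clusters]


def pad_align(seqs):
    """Placeholder aligner: right-pads with gaps to a common length.
    A real MSA aligner would be plugged in here."""
    width = max(len(s) for s in seqs)
    return [s.ljust(width, "-") for s in seqs]


def align_by_clusters(seqs, aligner=pad_align, threshold=0.3, k=3):
    """Divide: cluster the sequences. Conquer: align each cluster concurrently."""
    clusters = cluster_sequences(seqs, threshold, k)
    groups = [[seqs[i] for i in members] for members in clusters]
    with ThreadPoolExecutor() as pool:
        aligned = list(pool.map(aligner, groups))
    return clusters, aligned


if __name__ == "__main__":
    sequences = ["ACGTACGT", "ACGTACG", "TTTTGGGG", "TTTTGGG"]
    for members, block in zip(*align_by_clusters(sequences)):
        print(members, block)
```

Because the aligner is passed in as a callable, the same decomposition can wrap any aligner, which mirrors the claim that the clustering step works as a preliminary stage independent of the alignment tool.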
Highlights
Multiple sequence alignment (MSA) consists of aligning more than two biological sequences, such as DNA or protein sequences, to bring out similar or homologous regions
This paper presents a novel method of clustering sequences as a preliminary step to improve the final alignment; this decomposition can be used as an optimization procedure with any MSA aligner to explore promising alignments in the search space
In this paper, a new strategy to tackle the MSA problem is developed based on the divide and conquer approach
Summary
Multiple sequence alignment (MSA) consists of aligning more than two biological sequences, such as DNA or protein sequences, to bring out similar or homologous regions. MSA plays an important role in bioinformatics and is widely used in, for example, protein analysis, identification of functional sites in genomic sequences, and structural prediction. Finding an optimal MSA has been shown to be NP-hard (Wang & Jiang, 1994). MSA is an optimization problem with high time and space complexity. Several methods have been proposed to solve it; they can be grouped into three classes (Notredame, 2002): exact methods, progressive methods and iterative methods.
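To give a feel for the time and space cost mentioned above, the following sketch computes the optimal global alignment score of just two sequences by dynamic programming (the Needleman-Wunsch recurrence), the kind of technique exact methods typically build on. It is an illustration only and not taken from the paper: the function name and the scoring values (`match`, `mismatch`, `gap`) are assumptions.

```python
def global_alignment_score(a, b, match=1, mismatch=-1, gap=-1):
    """Optimal global alignment score of two sequences.
    Scoring values are illustrative assumptions."""
    rows, cols = len(a) + 1, len(b) + 1
    # score[i][j] = best score for aligning a[:i] with b[:j]
    score = [[0] * cols for _ in range(rows)]
    for i in range(1, rows):
        score[i][0] = i * gap  # a[:i] aligned against gaps only
    for j in range(1, cols):
        score[0][j] = j * gap  # b[:j] aligned against gaps only
    for i in range(1, rows):
        for j in range(1, cols):
            pair = match if a[i - 1] == b[j - 1] else mismatch
            score[i][j] = max(
                score[i - 1][j - 1] + pair,  # align a[i-1] with b[j-1]
                score[i - 1][j] + gap,       # gap in b
                score[i][j - 1] + gap,       # gap in a
            )
    return score[-1][-1]


if __name__ == "__main__":
    print(global_alignment_score("ACGT", "AGT"))
```

For two sequences of lengths n and m the table holds (n+1)(m+1) cells. Extending the same recurrence to k sequences requires a k-dimensional table whose size is the product of all sequence lengths plus one, which is why exact approaches become impractical beyond a handful of sequences.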
More From: International Journal of Cognitive Informatics and Natural Intelligence