Abstract

This paper presents a strategy for the Multiple Sequence Alignment (MSA) problem, one of the most important tasks in biological sequence analysis. MSA aligns sequences in their entirety to derive relationships and common characteristics among a set of protein or nucleotide sequences. The MSA problem has been proved NP-hard. The proposed strategy builds on the well-known divide and conquer paradigm: it introduces a novel method of clustering sequences as a preliminary step to improve the final alignment. This decomposition can be used as an optimization procedure with any MSA aligner to explore promising alignments of the search space. The clusters are aligned in a parallel and distributed way in order to benefit from parallel architectures. The strategy was tested on classical benchmarks such as BAliBASE, Sabre, Prefab4 and Oxm, and the experimental results show that it gives good results in comparison with other aligners.

Highlights

  • Multiple sequence alignment (MSA) consists of aligning more than two biological sequences, such as DNA or protein sequences, to bring out similar or homologous regions

  • This paper presents a novel method of clustering sequences as a preliminary step to improve the final alignment; this decomposition can be used as an optimization procedure with any MSA aligner to explore promising alignments of the search space

  • In this paper, a new strategy to tackle the MSA problem is developed based on the divide and conquer approach
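The divide and conquer idea in the highlights above can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration and not the authors' method: the k-mer Jaccard similarity, the greedy clustering rule, the `threshold` value, and the `pad_align` placeholder (which only right-pads with gaps and stands in for a real MSA aligner) are all hypothetical. A thread pool stands in for the parallel and distributed execution, and the step that merges the cluster alignments into a final alignment is not shown because the text here does not describe it.

```python
from concurrent.futures import ThreadPoolExecutor


def kmer_set(seq, k=3):
    """Return the set of overlapping k-mers of a sequence."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def jaccard(a, b):
    """Jaccard similarity of two sets (1.0 when both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def cluster_sequences(seqs, threshold=0.3, k=3):
    """Greedy clustering (illustrative assumption, not the paper's method).

    Each sequence joins the first cluster whose representative (the
    cluster's first member) is similar enough; otherwise it starts a
    new cluster.
    """
    clusters = []  # list of (representative k-mer set, members)
    for s in seqs:
        ks = kmer_set(s, k)
        for rep, members in clusters:
            if jaccard(rep, ks) >= threshold:
                members.append(s)
                break
        else:
            clusters.append((ks, [s]))
    return [members for _, members in clusters]


def pad_align(cluster):
    """Placeholder aligner: right-pad with gaps to equal length.

    Hypothetical stand-in for any real MSA aligner.
    """
    width = max(len(s) for s in cluster)
    return [s.ljust(width, "-") for s in cluster]


def align_clusters(seqs, aligner=pad_align, threshold=0.3):
    """Divide: cluster the sequences. Conquer: align clusters concurrently."""
    clusters = cluster_sequences(seqs, threshold)
    with ThreadPoolExecutor() as pool:
        # pool.map preserves the order of the input clusters.
        return list(pool.map(aligner, clusters))


if __name__ == "__main__":
    seqs = ["ACGTACGT", "ACGTACGA", "TTTTGGGG", "TTTTGGG"]
    for block in align_clusters(seqs):
        print(block)
```

Because the aligner is passed in as a parameter, the same decomposition could in principle wrap any aligner, which matches the claim that the clustering step is independent of the aligner used.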


Summary

Introduction

Multiple sequence alignment (MSA) consists of aligning more than two biological sequences, such as DNA or protein sequences, to bring out similar or homologous regions. MSA plays an important role in bioinformatics and is widely used, for example in protein analysis, identification of functional sites in genomic sequences, and structural prediction. Finding an optimal MSA has been shown to be NP-hard (Wang & Jiang, 1994). MSA is an optimization problem with high time and space complexity. Several methods have been proposed to solve it. They can be categorized into three classes (Notredame, 2002): exact methods, progressive methods and iterative methods.
