Abstract

In our previous work on multiple sequence alignment with a Genetic Algorithm (GA) [2], we observed that gaps tend to cluster at the rightmost ends of the aligned sequences when the dataset contains very long sequences, similar sequences, or a large number of sequences. This is problematic when the expected gap blocks belong elsewhere in the alignment. The earlier method can be refined in two ways: by improving and extending the genetic operations of the original program, or by improving the initialization process. We opt for the latter: rough multiple alignments are adopted as the initial population, instead of the randomly inserted gaps used in the previous method. Another drawback of the previous method is its long computation time. One way to address this is to divide the query sequences into sub-groups and to perform a multiple alignment on each sub-group. Membrane proteins suit this approach well, since they consist of transmembrane segments (TMS) and loop regions, which provide a natural means of partitioning the sequences. By dividing the sequences by region (TMS and loop) and aligning each region separately, we expect some increase in calculation speed.
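The partitioning step can be illustrated with a minimal sketch. It is not the authors' program: the function names `split_regions` and `group_by_region`, the half-open span convention, and the toy sequences are all assumptions made for illustration, and the TMS boundaries are taken as given (for example from annotation or a prediction tool). The sketch only builds the per-region sub-groups; each sub-group would then be aligned on its own by whatever aligner is in use.

```python
from typing import Dict, List, Tuple

Span = Tuple[int, int]  # hypothetical convention: half-open [start, end) indices of one TMS


def split_regions(seq: str, tms_spans: List[Span]) -> List[Tuple[str, str]]:
    """Cut one sequence into alternating loop and TMS segments, in order."""
    segments = []
    pos = 0
    for start, end in sorted(tms_spans):
        segments.append(("loop", seq[pos:start]))  # loop before this TMS (may be empty)
        segments.append(("TMS", seq[start:end]))
        pos = end
    segments.append(("loop", seq[pos:]))  # trailing loop after the last TMS
    return segments


def group_by_region(
    seqs: Dict[str, str], spans: Dict[str, List[Span]]
) -> List[Tuple[str, Dict[str, str]]]:
    """Collect the k-th segment of every sequence into one sub-group."""
    per_seq = {name: split_regions(s, spans[name]) for name, s in seqs.items()}
    counts = {len(segs) for segs in per_seq.values()}
    if len(counts) != 1:
        # Assumption: every sequence has the same number of TMS.
        raise ValueError("all sequences must have the same number of TMS")
    first = next(iter(per_seq.values()))
    groups = []
    for k in range(counts.pop()):
        kind = first[k][0]  # "loop" or "TMS"
        groups.append((kind, {name: segs[k][1] for name, segs in per_seq.items()}))
    return groups


# Toy input: two made-up sequences with two TMS each.
seqs = {"a": "MKLLVVAAGGSTWWFFIIRK", "b": "MRLIVAAGTSWFFLLKK"}
spans = {"a": [(2, 8), (12, 18)], "b": [(2, 7), (10, 15)]}

for kind, members in group_by_region(seqs, spans):
    print(kind, members)
```

Each sub-group is much shorter than the full sequences, which is where a reduction in calculation time would be expected to come from; the aligned sub-groups would be concatenated in order to recover the full alignment.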
