Abstract

This paper presents a novel genetic algorithm (GA) for multiple sequence alignment in protein analysis. The most significant improvement afforded by this algorithm results from its use of segment profiles to generate the diversified initial population and prevent the destruction of conserved regions by crossover and mutation operations. Segment profiles contain rich local information, thereby speeding up convergence. Secondly, it introduces the use of the norMD function in a genetic algorithm to measure multiple alignment Finally, as an approach to the premature problem, an improved progressive method is used to optimize the highest-scoring individual of each new generation. The new algorithm is compared with the ClustalX and T-Coffee programs on several data cases from the BAliBASE benchmark alignment database. The experimental results show that it can yield better performance on data sets with long sequences, regardless of similarity.

Full Text
Published version (Free)

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call