A sliding window and keyword tree based algorithm for multiple sequence alignment

Jun Wang, Yong Sun

doi:10.1109/fskd.2012.6233880

Abstract

Multiple sequence alignment (MSA) is an important problem in genetic sequence analysis. The increasing volume of genome data requires tools that can compare and align sequences quickly and accurately. The most important step of MSA is determining the reference sequence. Current alignment methods usually need a long time to find the reference sequence among long sequences, and the accuracy of the selected sequence still needs improvement. In this paper, an algorithm based on a sliding window and a keyword tree is employed to match the substring sets of the sequence data and find the reference sequence with the greatest probability. The method accurately finds the center sequence and the completely matching regions. Using these regions, our algorithm aligns the multiple sequences with an improved center star method. Both the running time and the accuracy of the alignment vary with the step size by which the sliding window advances. Experimental results indicate that the improved method is faster and more accurate than other methods.
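
The idea of choosing a center sequence by matching windowed substrings against a keyword tree can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the plain trie (without the failure links of a full Aho-Corasick automaton), and the scoring rule (total number of window hits in the other sequences) are all assumptions made for illustration, and the abstract does not specify how the "greatest probability" is computed.

```python
from typing import Dict, List


def window_substrings(seq: str, width: int, step: int) -> List[str]:
    """Cut seq into substrings of length `width`, advancing by `step`."""
    return [seq[i:i + width] for i in range(0, len(seq) - width + 1, step)]


def build_keyword_tree(words: List[str]) -> Dict:
    """Build a plain trie; the key None marks the end of a stored word."""
    root: Dict = {}
    for word in words:
        node = root
        for ch in word:
            node = node.setdefault(ch, {})
        node[None] = True
    return root


def count_matches(tree: Dict, text: str) -> int:
    """Count the positions in text at which some stored word begins."""
    hits = 0
    for start in range(len(text)):
        node = tree
        j = start
        while j < len(text) and text[j] in node:
            node = node[text[j]]
            j += 1
            if None in node:  # reached the end of a stored word
                hits += 1
                break
    return hits


def pick_center(seqs: List[str], width: int, step: int) -> int:
    """Return the index of the sequence whose windows match the others most.

    Assumed scoring rule: sum of window hits over all other sequences.
    """
    scores = []
    for i, seq in enumerate(seqs):
        tree = build_keyword_tree(window_substrings(seq, width, step))
        scores.append(
            sum(count_matches(tree, other)
                for j, other in enumerate(seqs) if j != i)
        )
    return max(range(len(seqs)), key=scores.__getitem__)
```

In this sketch a larger `step` yields fewer substrings per sequence, so the tree is smaller and cheaper to build but some matching regions may go undetected, which mirrors the trade-off between running time and accuracy described in the abstract.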
