Abstract

Multiple sequence alignment (MSA) is a central problem in genetic sequence analysis. The growing volume of genome data calls for tools that can compare and align sequences quickly and accurately. The most important step of MSA is determining the reference sequence. Current alignment methods usually take a long time to find the reference sequence among long sequences, and the accuracy of the selected sequence still needs improvement. In this paper, an algorithm based on a sliding window and a keyword tree is used to match the substring set of the sequence data and find the most probable reference sequence. The method accurately finds the center sequence and the completely matching regions. Using these regions, our algorithm aligns the multiple sequences with an improved center star method. The step size of the sliding window affects both the running time and the accuracy of the alignment. Experimental results indicate that the improved method is faster and more accurate than the methods it was compared with.
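The idea of selecting a center sequence with a sliding window and a keyword tree can be illustrated with a minimal Python sketch. It is not the paper's implementation: the function names (`windows`, `build_trie`, `count_hits`, `pick_center`), the scoring rule (total number of window matches in the other sequences), and the plain trie walk without Aho-Corasick failure links are all assumptions made for illustration.

```python
END = None  # end-of-word marker key; never collides with a sequence character


def windows(seq, width, step):
    """Fixed-width substrings of seq, taken every `step` positions."""
    return [seq[i:i + width] for i in range(0, len(seq) - width + 1, step)]


def build_trie(words):
    """Keyword tree stored as nested dicts, one level per character."""
    root = {}
    for word in words:
        node = root
        for ch in word:
            node = node.setdefault(ch, {})
        node[END] = True
    return root


def count_hits(trie, text):
    """Count positions in text where some stored word starts.

    Assumption: a simple walk from every start position, not the
    linear-time Aho-Corasick search a real tool would likely use.
    """
    hits = 0
    for start in range(len(text)):
        node = trie
        for i in range(start, len(text)):
            node = node.get(text[i])
            if node is None:
                break
            if END in node:
                hits += 1
                break  # all words share one width, so at most one ends here
    return hits


def pick_center(seqs, width, step):
    """Hypothetical scoring: the sequence whose windows match the others most."""
    scores = []
    for i, seq in enumerate(seqs):
        trie = build_trie(windows(seq, width, step))
        scores.append(sum(count_hits(trie, other)
                          for j, other in enumerate(seqs) if j != i))
    center = max(range(len(seqs)), key=scores.__getitem__)
    return center, scores


if __name__ == "__main__":
    seqs = ["ACGTACGT", "ACGTTCGT", "TTTTTTTT"]
    print(pick_center(seqs, width=4, step=2))
```

In this sketch a larger `step` produces fewer windows per sequence, so the tree is smaller and the search cheaper, but fewer matching regions can be detected. That mirrors the trade-off between running time and accuracy described in the abstract.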
