Abstract

Multiple sequence alignment (MSA) algorithms are used to infer homologous regions in DNA and protein sequences, which provide the basis for many microbiological studies. The center star method is an MSA algorithm that can handle large-scale datasets, but it tends to produce poor results when the set of sequences contains multiple centers. In such cases, partially conserved regions are often hidden in the alignment. We introduce an algorithm that addresses this problem by combining the center star and progressive methods for MSA. In this algorithm, we first identify subsets within the set of sequences by applying the bisecting k-means algorithm, using k-mers as the attributes for clustering. The center star method is then performed separately on each subset. Finally, we merge the resulting alignments by following a progressive alignment approach. We evaluate the algorithm on a set of DNA sequences from HIV-1-infected patients with a known transmission chain. In this evaluation, the new algorithm produces alignments with better sum-of-pairs scores than the center star method, and the final alignment yields a more accurate phylogeny than the alignments produced by the center star and progressive methods.
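
To make two of the quantities named above concrete, the following is a minimal Python sketch, not the authors' implementation. It shows one plausible way to turn a sequence into a k-mer count vector (the kind of attribute that could be passed to a clustering step) and one common form of the sum-of-pairs score for a finished alignment. The function names `kmer_vector` and `sum_of_pairs`, the value of `k`, and the match, mismatch, and gap scores are all assumptions chosen for illustration; the abstract does not specify them.

```python
from itertools import combinations, product


def kmer_vector(seq, k=2, alphabet="ACGT"):
    """Count every possible k-mer in seq using overlapping windows.

    Hypothetical helper: k and the alphabet are illustrative choices.
    """
    kmers = ["".join(p) for p in product(alphabet, repeat=k)]
    counts = dict.fromkeys(kmers, 0)
    for i in range(len(seq) - k + 1):
        window = seq[i:i + k]
        if window in counts:  # skip windows with symbols outside the alphabet
            counts[window] += 1
    return [counts[km] for km in kmers]


def sum_of_pairs(alignment, match=1, mismatch=-1, gap=-2):
    """Sum column scores over all pairs of aligned rows.

    Assumed scoring scheme: gap-gap columns contribute 0.
    All rows are expected to have the same length.
    """
    total = 0
    for a, b in combinations(alignment, 2):
        for x, y in zip(a, b):
            if x == "-" and y == "-":
                continue
            if x == "-" or y == "-":
                total += gap
            elif x == y:
                total += match
            else:
                total += mismatch
    return total
```

Under these assumptions, the vectors returned by `kmer_vector` for each sequence could serve as rows of a feature matrix for clustering, and `sum_of_pairs` could be used to compare alignments of the same sequences produced by different methods.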
