Abstract
Background: Multiple sequence alignment (MSA) is a useful tool in bioinformatics. Although many MSA algorithms have been developed, there is still room for improvement in accuracy and speed. In the alignment of a family of protein sequences, global MSA algorithms perform better than local ones in many cases, while local ones perform better than global ones when some sequences have long insertions or deletions (indels) relative to others. Many recent leading MSA algorithms have incorporated pairwise alignment information obtained from a mixture of sources into their scoring system to improve accuracy of alignment containing long indels.
Results: We propose a novel group-to-group sequence alignment algorithm that uses a piecewise linear gap cost. We developed a program called PRIME, which employs our proposed algorithm to optimize the well-defined sum-of-pairs score. PRIME stands for Profile-based Randomized Iteration MEthod. We evaluated PRIME and some recent MSA programs using BAliBASE version 3.0 and PREFAB version 4.0 benchmarks. The results of benchmark tests showed that PRIME can construct accurate alignments comparable to the most accurate programs currently available, including L-INS-i of MAFFT, ProbCons, and T-Coffee.
Conclusion: PRIME enables users to construct accurate alignments without having to employ pairwise alignment information. PRIME is available at .
Highlights
Multiple sequence alignment (MSA) is a useful tool in bioinformatics
Because only about 20% of the sequences in BAliBASE version 3.0 [23] used for the test are common to those in BAliBASE version 2.01, we do not think that these parameters are over-fitted against BAliBASE version 3.0
The group-to-group sequence alignment algorithm is the key to most heuristic MSA algorithms
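The abstract states that PRIME optimizes a sum-of-pairs score. As a rough illustration only, the sketch below computes a simplified sum-of-pairs score for a small alignment. The function name `sum_of_pairs` and the match, mismatch, and per-residue gap values are assumptions made for this example; PRIME itself uses a piecewise linear gap cost, and the source does not give its scoring parameters.

```python
from itertools import combinations


def sum_of_pairs(alignment, match=1, mismatch=-1, gap=-2):
    """Simplified sum-of-pairs score of a multiple alignment.

    `alignment` is a list of equal-length strings with '-' for gaps.
    The scoring values are illustrative assumptions, not PRIME's parameters.
    """
    if len({len(row) for row in alignment}) > 1:
        raise ValueError("all aligned sequences must have the same length")

    total = 0
    # Walk the alignment column by column and score every pair of rows.
    for column in zip(*alignment):
        for a, b in combinations(column, 2):
            if a == "-" and b == "-":
                continue  # a gap aligned to a gap contributes nothing
            if a == "-" or b == "-":
                total += gap  # residue aligned to a gap
            elif a == b:
                total += match
            else:
                total += mismatch
    return total


if __name__ == "__main__":
    print(sum_of_pairs(["AC-T", "ACGT", "A-GT"]))
```

A per-column penalty like this ignores gap length entirely, which is exactly the part that affine and piecewise linear gap costs refine.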
Summary
We developed a program called PRIME (Profile-based Randomized Iteration MEthod). The results indicate that PRIMEpiecewise is less affected by such regions than PRIMEaffine. This follows the general tendency that terminal gaps reduce the accuracy of global alignment programs, including Prrn, MUSCLE, POA, and ClustalW, more significantly than that of MAFFT, ProbCons, and T-Coffee, which incorporate local alignment information in some way.
In the accompanying figures, the horizontal axis denotes the reference alignment ID, and the vertical axis the difference in sum-of-pairs or column scores between PRIMEpiecewise and PRIMEaffine on the respective alignments, for the full length set and for the homologous region set. In the accompanying table, the Overall and Ranksum columns show, respectively, the average sum-of-pairs scores and the rank sum of the Friedman test using all alignments of the whole homologous region set.
The computational speed would be significantly improved by incorporating anchoring heuristics and refining the source code.
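To make the contrast between PRIMEaffine and PRIMEpiecewise concrete, the following sketch compares an affine gap cost with a piecewise linear one, modeled here as the minimum over several affine pieces. This is one common formulation and is assumed for illustration; the function names and all numeric parameters are hypothetical and are not taken from PRIME.

```python
def affine_gap_cost(length, gap_open=10.0, gap_extend=1.0):
    """Affine cost: a fixed opening penalty plus a constant per-residue penalty."""
    if length <= 0:
        return 0.0
    return gap_open + gap_extend * length


def piecewise_linear_gap_cost(length, pieces=((10.0, 1.0), (16.0, 0.25))):
    """Piecewise linear cost, modeled as the minimum over affine pieces.

    Each piece is (intercept, slope). With the assumed values the first
    piece governs short gaps and the flatter second piece governs long gaps.
    """
    if length <= 0:
        return 0.0
    return min(intercept + slope * length for intercept, slope in pieces)


if __name__ == "__main__":
    for n in (1, 5, 8, 40):
        print(n, affine_gap_cost(n), piecewise_linear_gap_cost(n))
```

With these assumed parameters the two costs agree up to a gap length of 8, after which the piecewise linear cost grows more slowly, so long indels are penalized less heavily than under the affine cost.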