Abstract

Background: Accurate multiple sequence alignments of proteins are very important in computational biology today. Despite numerous efforts in this field, every alignment strategy has shortcomings, so the resulting alignments are not always correct. Refining an existing alignment can be an intelligent choice, given the increasing importance of high-quality alignments in large-scale, high-throughput analysis.

Results: We provide an extensive comparison of the performance of alignment refinement algorithms. The accuracy and efficiency of the refinement programs are compared using the 3D structure-based alignments in the BAliBASE benchmark database as well as manually curated, high-quality alignments from the Conserved Domain Database (CDD).

Conclusion: Comparison of the refined alignments revealed that, despite the absence of dramatic improvements, our refinement method, REFINER, which uses conserved regions as constraints, performs better at improving the alignments generated by different alignment algorithms. In most cases REFINER produces a higher-scoring, modestly improved alignment that does not degrade the well-conserved regions of the original alignment.

Highlights

  • A number of alignment programs apply the progressive strategy [1,2,3] by constructing a global alignment over the entire length of the sequences; they differ mainly in the procedure used to determine the order in which the sequences are aligned

  • Improvement of alignment: Alignments generated by ClustalW version 1.83 [18], Muscle version 3.52 [19], Dialign version 2.3 [20], FFTNSI from the Mafft package version 5.743 [5,21], ProbCons version 1.09 [22] and TCoffee version 3.93 [7] were refined by three different methods: REFINER [15], the Remove First (RF) method [13] and RASCAL [14]

Summary

Introduction

Accurate multiple sequence alignments of proteins are very important in computational biology today. The reliability and accuracy of many bioinformatics methods, such as homolog identification, comparative modeling and phylogenetic analysis, depend heavily on the quality of multiple sequence alignments. Heuristic approaches such as progressive and iterative methods are generally used to obtain multiple sequence alignments in a computationally efficient manner. In the progressive approach, a multiple alignment is built up gradually by aligning the most similar sequences first and successively adding more distant relatives. Iterative algorithms [4,5] generally attempt to improve the overall quality of the alignment by employing an objective function and heuristic measures to obtain an optimal alignment. Alternative approaches that use a co-operative strategy to integrate complementary algorithms [6,7] and/or incorporate additional biological data [8,9] have also been developed.
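
To make the idea of an objective function concrete, the sketch below scores an alignment with a simple sum-of-pairs measure and accepts a candidate refinement only if it scores higher. It is an illustration only, not the scoring scheme of REFINER or of any program cited here: the function names `sum_of_pairs` and `keep_if_better` and the match, mismatch and gap values are assumptions chosen for the example.

```python
from itertools import combinations


def sum_of_pairs(alignment, match=1, mismatch=-1, gap=-2):
    """Sum-of-pairs score of an alignment given as equal-length gapped strings.

    The scoring values are illustrative assumptions, not taken from the paper.
    """
    if len({len(row) for row in alignment}) > 1:
        raise ValueError("all rows of an alignment must have the same length")
    score = 0
    for column in zip(*alignment):
        for a, b in combinations(column, 2):
            if a == "-" and b == "-":
                continue  # a gap paired with a gap contributes nothing
            if a == "-" or b == "-":
                score += gap
            elif a == b:
                score += match
            else:
                score += mismatch
    return score


def keep_if_better(current, candidate):
    """Accept the candidate alignment only if it improves the objective."""
    if sum_of_pairs(candidate) > sum_of_pairs(current):
        return candidate
    return current


# Toy example: moving one gap brings the G residues into a single column.
before = ["ACG-T", "AC-GT", "ACGGT"]
after = ["ACG-T", "ACG-T", "ACGGT"]
refined = keep_if_better(before, after)
```

An iterative method would repeat this accept-or-reject step over many candidate changes; real programs use substitution matrices and affine gap penalties rather than the flat values assumed here.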
