Abstract

Background

Multiple sequence alignment (MSA) is an extremely useful tool for molecular and evolutionary biology, and several programs and algorithms are available for this purpose. Although previous studies have compared the alignment accuracy of different MSA programs, their computational time and memory usage have not been systematically evaluated. Given the unprecedented amount of data produced by next-generation deep sequencing platforms, and the increasing demand for large-scale data analysis, it is imperative to optimize the application of software. Therefore, a balance between alignment accuracy and computational cost has become a critical indicator of the most suitable MSA program. We compared both the accuracy and the cost of nine popular MSA programs, namely CLUSTALW, CLUSTAL OMEGA, DIALIGN-TX, MAFFT, MUSCLE, POA, Probalign, Probcons and T-Coffee, against the benchmark alignment dataset BAliBASE, and we discuss the relevance of some implementations embedded in each program's algorithm. Alignment accuracy was calculated with the two standard scoring functions provided by BAliBASE, the sum-of-pairs and total-column scores, and computational costs were determined by collecting peak memory usage and execution time.

Results

Our results indicate that, in most cases, the consistency-based programs Probcons, T-Coffee, Probalign and MAFFT outperformed the other programs in accuracy. Whenever sequences with large N/C-terminal extensions were present in the BAliBASE suite, Probalign, MAFFT and also CLUSTAL OMEGA outperformed Probcons and T-Coffee. The drawback of these programs is that they use more memory and are slower than POA, CLUSTALW, DIALIGN-TX, and MUSCLE. CLUSTALW and MUSCLE were the fastest programs, with CLUSTALW requiring the least RAM.

Conclusions

Based on the results presented herein, all four programs Probcons, T-Coffee, Probalign and MAFFT are recommended when better accuracy of multiple sequence alignments is the priority. T-Coffee and recent versions of MAFFT can deliver faster and reliable alignments, which makes them especially suited to datasets larger than those encountered in the BAliBASE suite, provided multi-core computers are available. Parallelization of alignments for multi-core computers should probably be addressed by more programs in the near future, which is likely to improve performance significantly.
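To make the two accuracy measures concrete, the following is a minimal Python sketch of sum-of-pairs (SP) and total-column (TC) scoring. It is an illustration under stated assumptions, not the BAliBASE scoring program: the function names are hypothetical, alignments are assumed to be lists of equal-length gapped strings with `-` as the gap character and the sequences in the same order in both alignments, and every reference column containing at least two residues is scored (no restriction to selected regions of the reference is modelled here).

```python
from itertools import combinations


def column_residues(alignment):
    """Describe each column as a tuple of residue indices, one per sequence (None = gap)."""
    counters = [0] * len(alignment)
    columns = []
    for col in zip(*alignment):
        entry = []
        for seq_idx, ch in enumerate(col):
            if ch == "-":
                entry.append(None)
            else:
                entry.append(counters[seq_idx])
                counters[seq_idx] += 1
        columns.append(tuple(entry))
    return columns


def aligned_pairs(columns):
    """Collect every pair of residues that share a column."""
    pairs = set()
    for col in columns:
        present = [(s, r) for s, r in enumerate(col) if r is not None]
        pairs.update(combinations(present, 2))
    return pairs


def sp_score(reference, test):
    """Fraction of residue pairs aligned in the reference that are also aligned in the test."""
    ref_pairs = aligned_pairs(column_residues(reference))
    test_pairs = aligned_pairs(column_residues(test))
    return len(ref_pairs & test_pairs) / len(ref_pairs) if ref_pairs else 0.0


def tc_score(reference, test):
    """Fraction of reference columns (with at least two residues) reproduced exactly in the test."""
    ref_cols = [c for c in column_residues(reference)
                if sum(r is not None for r in c) >= 2]
    test_cols = set(column_residues(test))
    if not ref_cols:
        return 0.0
    return sum(c in test_cols for c in ref_cols) / len(ref_cols)


# Toy example: the test alignment misplaces the first residue of the third sequence.
reference = ["ACGT", "ACGT", "A-GT"]
test = ["ACGT", "ACGT", "-AGT"]
print(sp_score(reference, test), tc_score(reference, test))
```

In this toy case the misplaced residue breaks two of the ten reference pairs and two of the four reference columns, which shows why TC is the stricter of the two measures: a single misaligned residue invalidates the whole column.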

Highlights

  • Alignment accuracy (sum-of-pairs and total-column scores): for Reference datasets 1 to 5, the accuracy of the alignments produced by Probcons, T-Coffee, Probalign, and MAFFT was consistently higher than that of the other programs (Figure 1)

  • In these five reference test cases, all four programs had Z-scores above the average, and in some cases they were statistically superior to MUSCLE, CLUSTALW, CLUSTAL OMEGA, DIALIGN-TX and the Partial Order Alignment algorithm (POA) (see Additional file 1)

Summary

Introduction

Multiple sequence alignment (MSA) is an extremely useful tool for molecular and evolutionary biology, and several programs and algorithms are available for this purpose. Although previous studies have compared the alignment accuracy of different MSA programs, their computational time and memory usage have not been systematically evaluated. A balance between alignment accuracy and computational cost has become a critical indicator of the most suitable MSA program. Although dynamic programming guarantees a mathematically optimal alignment of sequences, heuristics-based algorithms are preferred because they require less computational capacity, which makes them suitable for studies involving multiple sequences. The vast majority of heuristics-based MSA programs align sequences using the progressive approach, combining global and/or local methods [3]. This type of algorithm builds an MSA through a series of consecutive pairwise alignments, following the branching order of a guide tree. One of the first MSA programs combining progressive and global pairwise alignment is CLUSTALW [4].
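The global pairwise alignment that a progressive method repeats along its guide tree can be sketched with textbook Needleman-Wunsch dynamic programming. This is a minimal illustration, not the implementation used by CLUSTALW or any other program named here: the function name, the match/mismatch scores and the linear gap penalty are assumptions chosen for brevity, whereas real MSA programs typically use substitution matrices and more elaborate gap penalties.

```python
def needleman_wunsch(a, b, match=1, mismatch=-1, gap=-1):
    """Return (optimal score, aligned a, aligned b) for a global pairwise alignment."""
    n, m = len(a), len(b)

    # score[i][j] holds the best score for aligning a[:i] with b[:j].
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap

    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            score[i][j] = max(
                score[i - 1][j - 1] + sub,  # align a[i-1] with b[j-1]
                score[i - 1][j] + gap,      # gap in b
                score[i][j - 1] + gap,      # gap in a
            )

    # Trace back from the bottom-right cell to recover one optimal alignment.
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        sub = match if i > 0 and j > 0 and a[i - 1] == b[j - 1] else mismatch
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub:
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i -= 1
            j -= 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1

    return score[n][m], "".join(reversed(out_a)), "".join(reversed(out_b))


best, row_a, row_b = needleman_wunsch("GATTACA", "GCATGCU")
print(best)
print(row_a)
print(row_b)
```

The table has (n + 1) × (m + 1) cells for two sequences of lengths n and m, and extending the same exact recurrence to many sequences at once grows exponentially with the number of sequences. That cost is the reason heuristic strategies such as the progressive approach are preferred for multiple alignments.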
