Abstract
Background: Multiple sequence alignment (MSA) is one of the most important research topics in bioinformatics, and a number of MSA programs have emerged. The accuracy of an MSA program depends strongly on its parameter settings, chiefly the gap open penalty (GOP), the gap extension penalty (GEP) and the substitution matrix (SM). This research seeks the optimal GOP, GEP and SM for MAFFT instead of relying on its default parameters.
Results: The MAFFT program is benchmarked on the BAliBASE 3.0 database, and optimal MAFFT parameters are obtained that perform better than the default parameters of both CLUSTALW and MAFFT.
Conclusions: The optimal parameters improve the results of multiple sequence alignment, and the approach is feasible and efficient.
Highlights
Multiple sequence alignment (MSA) is one of the most important research topics in bioinformatics
MAFFT (MAFFT-7.220-WIN64 version) offers various multiple alignment strategies. They fall into three types: (a) the progressive method, (b) the iterative refinement method with the weighted sum-of-pairs (WSP) score, and (c) the iterative refinement method using both the WSP and consistency scores
The sum-of-pairs score (SPS) is calculated such that the score increases with the number of sequences correctly aligned (Thompson et al 1999)
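To make the SPS concrete, the sketch below follows the usual BAliBASE-style reading of the score: the fraction of residue pairs aligned in the reference alignment that are also aligned in the test alignment. The function names `aligned_pairs` and `sum_of_pairs_score`, the `-` gap character and the toy sequences are illustrative assumptions, not taken from the paper or from any benchmark tool.

```python
from itertools import combinations


def aligned_pairs(alignment, gap="-"):
    """Return the set of residue pairs that share a column.

    Each residue is identified as (sequence index, ungapped position),
    so the same pair can be recognised in two different alignments.
    """
    if not alignment:
        return set()
    if len({len(row) for row in alignment}) > 1:
        raise ValueError("all aligned rows must have the same length")

    next_position = [0] * len(alignment)  # ungapped position counter per sequence
    pairs = set()
    for col in range(len(alignment[0])):
        residues = []
        for seq_index, row in enumerate(alignment):
            if row[col] != gap:
                residues.append((seq_index, next_position[seq_index]))
                next_position[seq_index] += 1
        # Every pair of residues in this column counts as aligned.
        pairs.update(combinations(residues, 2))
    return pairs


def sum_of_pairs_score(test, reference, gap="-"):
    """Fraction of reference residue pairs reproduced by the test alignment."""
    reference_pairs = aligned_pairs(reference, gap)
    if not reference_pairs:
        return 0.0
    shared = aligned_pairs(test, gap) & reference_pairs
    return len(shared) / len(reference_pairs)


# Toy example (hypothetical sequences): the test alignment misplaces one residue.
reference = ["AC-G", "ACTG", "A-TG"]
test = ["ACG-", "ACTG", "A-TG"]
print(sum_of_pairs_score(test, reference))  # 0.75
```

In this toy case the reference contains eight aligned residue pairs and the test alignment reproduces six of them, so the score is 0.75; a test alignment identical to the reference scores 1.0.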
Summary
Multiple sequence alignment (MSA) is one of the most important research topics in bioinformatics. The accuracy of an MSA program depends strongly on its parameter settings, chiefly the gap open penalty (GOP), the gap extension penalty (GEP) and the substitution matrix (SM). This research seeks the optimal GOP, GEP and SM for MAFFT instead of relying on its default parameters. Open source online alignment tools such as CLUSTALW, T-COFFEE and MAFFT have been developed (Thompson et al 1994; Notredame et al 2000; Katoh et al 2002; Katoh and Toh 2008). These tools produce alignments quickly and are therefore widely used for MSA. Residue-specific gap penalties and locally reduced gap penalties in hydrophilic regions encourage new gaps in potential loop regions rather than in regular secondary structure.
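A parameter search of this kind can be sketched as a grid over GOP, GEP and SM values. The snippet below only builds MAFFT command lines; it does not run them and is not the authors' procedure. It assumes the `--op`, `--ep` and `--bl` options as described in the MAFFT manual (where `--ep` is an offset that works like a gap extension penalty), which should be checked against the installed version. The helper name `mafft_commands`, the input file name and the grid values are hypothetical.

```python
from itertools import product


def mafft_commands(infile, gops, geps, matrices, exe="mafft"):
    """Yield one MAFFT command line per (GOP, GEP, BLOSUM matrix) combination.

    Assumed option meanings (verify against your MAFFT version):
      --op  gap opening penalty
      --ep  offset value, which acts like a gap extension penalty
      --bl  BLOSUM matrix number
    """
    for gop, gep, matrix in product(gops, geps, matrices):
        yield [exe, "--op", str(gop), "--ep", str(gep), "--bl", str(matrix), infile]


# Hypothetical grid; the values here are placeholders, not the paper's settings.
commands = list(
    mafft_commands("family.fasta", gops=[1.53, 2.0], geps=[0.0, 0.123], matrices=[45, 62])
)
for command in commands:
    print(" ".join(command))
```

Each command could then be executed (for example with `subprocess.run`, capturing the alignment from standard output) and the resulting alignment scored against its BAliBASE reference, keeping the combination with the highest score.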