Abstract
Multiple sequence alignment (MSA) is the basis for almost all sequence comparison and molecular phylogenetic inferences. Large-scale genomic analyses are typically associated with automated progressive MSA without subsequent manual adjustment, which itself is often error-prone because of the lack of a consistent and explicit criterion. Here, I outlined several commonly encountered alignment errors that cannot be avoided by progressive MSA for nucleotide, amino acid, and codon sequences. Methods that could be automated to fix such alignment errors were then presented. I emphasized the utility of position weight matrix as a new tool for MSA refinement and illustrated its usage by refining the MSA of nucleotide and amino acid sequences. The main advantages of the position weight matrix approach include (1) its use of information from all sequences, in contrast to other commonly used methods based on pairwise alignment scores and inconsistency measures, and (2) its speedy computation, making it suitable for a large number of long viral genomic sequences.
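To make the position weight matrix idea concrete, the following is a minimal sketch and not the author's implementation. It assumes a log2-odds matrix built column by column from an existing nucleotide MSA, with a pseudocount and with background frequencies estimated from the alignment itself. The names `position_weight_matrix` and `score_segment` are hypothetical and chosen only for illustration.

```python
import math
from collections import Counter

ALPHABET = "ACGT"  # assumption: nucleotide sequences; gaps ('-') are ignored


def position_weight_matrix(aligned, alphabet=ALPHABET, pseudocount=1.0):
    """Build a log2-odds position weight matrix from aligned sequences.

    Returns one dict per alignment column, mapping each symbol to
    log2(column frequency / background frequency). Every sequence in the
    alignment contributes to every column it covers.
    """
    if not aligned or len({len(s) for s in aligned}) != 1:
        raise ValueError("sequences must be non-empty and of equal length")
    seqs = [s.upper() for s in aligned]
    k = len(alphabet)

    # Background frequencies estimated from the whole alignment (assumption).
    totals = Counter(ch for s in seqs for ch in s if ch in alphabet)
    n_total = sum(totals.values())
    background = {
        a: (totals[a] + pseudocount) / (n_total + pseudocount * k)
        for a in alphabet
    }

    pwm = []
    for column in zip(*seqs):
        counts = Counter(ch for ch in column if ch in alphabet)
        n = sum(counts.values())
        pwm.append({
            a: math.log2(
                ((counts[a] + pseudocount) / (n + pseudocount * k))
                / background[a]
            )
            for a in alphabet
        })
    return pwm


def score_segment(pwm, segment, start):
    """Sum the PWM scores of an ungapped segment placed at column `start`.

    The segment must contain only symbols from the PWM alphabet.
    """
    return sum(pwm[start + i][ch] for i, ch in enumerate(segment.upper()))


if __name__ == "__main__":
    msa = ["ACGT", "ACGT", "ACGA"]
    pwm = position_weight_matrix(msa)
    # Compare two candidate placements of the same segment.
    for start in (0, 2):
        print(start, round(score_segment(pwm, "GT", start), 3))
```

In a refinement step, a segment flanked by gaps could be scored at each candidate position and moved to the placement with the highest score. Because the matrix is built in a single pass over the alignment, the cost grows linearly with the number of sequences and with their length.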
Highlights
High-quality multiple sequence alignment (MSA) is crucially important in sequence comparison and molecular phylogenetics because a poor alignment typically leads to bias and inaccuracy in phylogenetic estimation [1,2,3].
This is especially true in the present day, where the availability of an increasing number of sequences of increasing sequence lengths is often associated with the application of quick-and-dirty options in sequence alignment programs.
Such post-alignment improvements gradually faded away because of three contributing factors. They became less important with the emergence of more accurate MSA programs such as MUSCLE [6] and MAFFT [7] with multiple iterations of MSA refinement [7,8,9].
Summary
High-quality multiple sequence alignment (MSA) is crucially important in sequence comparison and molecular phylogenetics because a poor alignment typically leads to bias and inaccuracy in phylogenetic estimation [1,2,3]. This is especially true in the present day, where the availability of an increasing number of sequences of increasing sequence lengths is often associated with the application of quick-and-dirty options in sequence alignment programs. MSA was traditionally followed by post-alignment visual inspection and manual adjustment. Such post-alignment improvements gradually faded away because of three contributing factors. Post-alignment adjustment can be error-prone and inconsistent because there is no explicit and consistent criterion that is universally used by researchers.