Abstract

Bioinformatics and molecular evolutionary analyses most often start by comparing DNA or amino acid sequences through alignment. Pairwise alignment, for example, is used in BLAST similarity search, the most widely used bioinformatics tool, to measure the similarity between a query sequence and each sequence in a database (Altschul et al., 1990; Camacho et al., 2009). Evolutionary history among sequences is reflected better when more than two sequences are aligned in a multiple sequence alignment (MSA). When building an MSA, we assume that the sequences compared are derived from a common ancestral sequence. Building an MSA then consists of inferring homologous positions among the input sequences and placing gaps in the sequences so that these homologous positions line up. These gaps represent evolutionary events of their own: gaps (also called indels) are caused by insertions or deletions of characters (nucleotides or amino acids) on a particular lineage during evolution. Building an MSA therefore amounts to reconstructing the evolutionary history of the sequences involved. While it is easy to understand that MSA quality affects the quality of phylogenetic tree reconstruction, its effect reaches far beyond this. Bioinformatics methods that use information extracted from MSAs include profile building in similarity search (e.g., PSI-BLAST: Altschul et al., 1997), motif/profile recognition (e.g., PROSITE: Hulo et al., 2008), profile hidden Markov models for protein families/domains (e.g., Pfam: Finn et al., 2010), and protein secondary-structure prediction (for review, see Pirovano & Heringa, 2010). Numerous bioinformatics and molecular evolutionary analyses are thus affected by MSA quality and benefit from reliable MSAs. Despite the importance of good MSAs, assessing MSA quality is far from straightforward.
Measuring the quality of MSAs requires two components: a benchmark dataset and a scoring method. A benchmark dataset includes reference alignments, which are considered to represent the evolutionary history of the sequences truthfully. The same set of sequences included in a reference alignment is then aligned using the MSA methods to be tested. Each reconstructed MSA is compared with the reference MSA using a scoring method, and its quality is assessed relative to the reference.
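
As an illustration of what such a scoring method can look like, the sketch below computes a sum-of-pairs style score: the fraction of residue pairs aligned in the reference MSA that are also aligned in the reconstructed MSA. The text above does not name a specific scoring method, so the choice of this measure, the function names `aligned_pairs` and `sum_of_pairs_score`, and the toy sequences are all assumptions made for illustration only.

```python
from itertools import combinations


def aligned_pairs(alignment, gap="-"):
    """Return the set of residue pairs that share a column.

    Each residue is identified as (sequence index, ungapped position),
    so the same residue can be recognized in two different alignments.
    """
    positions = [0] * len(alignment)  # next ungapped position per sequence
    pairs = set()
    for col in range(len(alignment[0])):
        residues = []
        for i, row in enumerate(alignment):
            if row[col] != gap:
                residues.append((i, positions[i]))
                positions[i] += 1
        # Every pair of non-gap residues in this column counts as aligned.
        pairs.update(combinations(residues, 2))
    return pairs


def sum_of_pairs_score(reference, test):
    """Fraction of reference residue pairs reproduced in the test MSA."""
    ref_pairs = aligned_pairs(reference)
    if not ref_pairs:
        raise ValueError("reference alignment contains no aligned pairs")
    return len(ref_pairs & aligned_pairs(test)) / len(ref_pairs)


# Hypothetical toy data: the same two sequences aligned in two ways.
reference = ["AC-GT", "ACAGT"]
reconstructed = ["ACG-T", "ACAGT"]
print(sum_of_pairs_score(reference, reconstructed))  # 0.75
```

In this toy case, three of the four residue pairs aligned in the reference are recovered, because the gap in the first sequence is placed one column too late. A score of 1.0 would mean that every reference pair is reproduced.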
