Abstract
Background
Studying site-specific amino acid frequencies by eye can reveal biologically significant variability and lineage-specific adaptation. This so-called ‘sequence gazing’ often informs bioinformatics and experimental research. However, it is also important to account for the underlying phylogeny, since similarities may be due to common descent rather than selection pressure, and to distinguish between founder effects and convergent evolution. We set out to combine phylogenetic and sequence data to produce evolutionarily insightful visualisations.
Results
We present ChromaClade, a convenient tool with a graphical user interface that works in concert with popular tree viewers to produce colour-annotated phylogenies highlighting the residues found in each taxon and at each site in a sequence alignment. Colouring branches according to the residues found at descendant tips also quickly identifies lineage-specific residues and the internal branches where key substitutions have occurred. We demonstrate applications of ChromaClade to human immunodeficiency virus and influenza A virus datasets, illustrating cases of conservative, adaptive and convergent evolution.
Conclusions
We find this to be a powerful approach for visualising site-wise residue distributions and detecting evolutionary patterns, especially in large datasets. ChromaClade is available for Windows, macOS and Unix or Linux; program executables and source code are available at github.com/chrismonit/chroma_clade.
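The branch-colouring idea in the abstract can be sketched as a post-order tree traversal. This sketch assumes that a branch takes a residue's colour only when every descendant tip carries that residue at the chosen site, and is drawn in a neutral grey otherwise. The nested-tuple tree representation, the `colour_branches` function, the palette and the toy data are all hypothetical illustrations, not ChromaClade's actual code or colour scheme.

```python
from typing import Dict, FrozenSet, Tuple, Union

# A tree is either a taxon name (a tip) or a tuple of child subtrees.
Tree = Union[str, Tuple["Tree", ...]]

# Hypothetical palette (residue -> hex RGB); not ChromaClade's actual colour scheme.
PALETTE = {"E": "#cc0000", "K": "#0000cc", "R": "#00aa00"}
UNKNOWN = "#000000"  # all tips share a residue that is missing from the palette
MIXED = "#999999"    # descendant tips carry more than one residue


def colour_branches(tree: Tree, alignment: Dict[str, str], site: int,
                    colours: Dict[Tree, str]) -> FrozenSet[str]:
    """Post-order traversal: record in `colours` a colour for the branch above
    every node and return the set of residues found at its descendant tips."""
    if isinstance(tree, str):
        residues = frozenset(alignment[tree][site])  # one-letter residue of this tip
    else:
        residues = frozenset()
        for child in tree:
            residues |= colour_branches(child, alignment, site, colours)
    if len(residues) == 1:
        colours[tree] = PALETTE.get(next(iter(residues)), UNKNOWN)
    else:
        colours[tree] = MIXED
    return residues


# Toy example: a five-taxon tree and a three-column alignment (invented data).
tree = ((("A", "B"), "C"), ("D", "E"))
alignment = {"A": "MKE", "B": "MKE", "C": "MKK", "D": "MRK", "E": "MRK"}
colours: Dict[Tree, str] = {}
colour_branches(tree, alignment, site=2, colours=colours)
for node, colour in colours.items():
    print(node, colour)
```

In the toy data, the branch above ('A', 'B') is coloured for E, while tip C and the clade ('D', 'E') are coloured for K. The branch above (('A', 'B'), 'C') is grey: a grey branch means at least one substitution must have occurred somewhere within the subtree below it.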
Highlights
Studying site-specific amino acid frequencies by eye can reveal biologically significant variability and lineage-specific adaptation
ChromaClade annotates and colours taxon names in phylogenetic trees according to the residues found in the corresponding sequence alignment
For each site in an alignment, ChromaClade annotates taxon names with residue letter codes and a residue-specific hexadecimal red/green/blue colour code that can be recognised by popular tree viewers, such as FigTree [1] or Archaeopteryx [2]
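To make the per-site annotation step concrete, here is a minimal sketch that builds one NEXUS file for a single alignment column, appending each taxon's residue to its label and attaching a colour comment. It assumes FigTree's `[&!color=#rrggbb]` taxon-comment convention and simple taxon names without quotes; the `_E`-style label suffix, the palette and the function names are hypothetical, and ChromaClade's real output format may differ.

```python
from typing import Dict, Tuple, Union

# A tree is either a taxon name (a tip) or a tuple of child subtrees.
Tree = Union[str, Tuple["Tree", ...]]

# Hypothetical residue -> hex RGB palette; ChromaClade's real scheme may differ.
PALETTE = {"E": "#cc0000", "K": "#0000cc", "R": "#00aa00"}
DEFAULT = "#000000"  # colour for residues missing from the palette


def to_newick(tree: Tree, labels: Dict[str, str]) -> str:
    """Serialise a nested-tuple tree to Newick, renaming tips via `labels`."""
    if isinstance(tree, str):
        return f"'{labels[tree]}'"
    return "(" + ",".join(to_newick(child, labels) for child in tree) + ")"


def annotate_site(tree: Tree, alignment: Dict[str, str], site: int) -> str:
    """Return a NEXUS file for one alignment column (0-based `site`) in which
    each taxon label carries its residue and an assumed FigTree colour comment."""
    labels = {name: f"{name}_{seq[site]}" for name, seq in alignment.items()}
    lines = ["#NEXUS", "begin taxa;",
             f"\tdimensions ntax={len(alignment)};", "\ttaxlabels"]
    for name, seq in alignment.items():
        colour = PALETTE.get(seq[site], DEFAULT)
        lines.append(f"\t\t'{labels[name]}'[&!color={colour}]")
    # The tree block reuses the annotated labels so they match the taxa block.
    lines += ["\t;", "end;", "", "begin trees;",
              f"\ttree site_{site + 1} = {to_newick(tree, labels)};", "end;"]
    return "\n".join(lines)


tree = ((("A", "B"), "C"), ("D", "E"))
alignment = {"A": "MKE", "B": "MKE", "C": "MKK", "D": "MRK", "E": "MRK"}
print(annotate_site(tree, alignment, site=2))
```

Calling `annotate_site` for every column and saving each result to its own file would give one colour-annotated tree per site, which could then be browsed in a tree viewer.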
Summary
We present example applications of ChromaClade using human immunodeficiency virus type 1 (HIV-1) and influenza A virus (IAV) datasets, in which colour-annotating the trees highlights differences already known to be biologically significant, illustrating how the approach can be used prospectively. Isolates from the 2009 H1N1 IAV pandemic retained the avian-type residue E627 in PB2 but, interestingly, underwent compensatory substitutions at nearby sites that conferred a fitness increase in human cells similar to that of E627K [9]; these substitutions are clearly visible in the annotated trees (Fig. 1e). This illustrates that comparing annotated trees for multiple sites allows the user to spot compensatory substitutions or other potential evolutionary dependence between sites. Studying the annotated trees for this site revealed striking convergent evolution, as these substitutions have arisen independently in separate human IAV PB2 lineages (Fig. 1f). This is only apparent if phylogenetic relationships and sequence data are visualised together.
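Spotting convergence of this kind amounts to noticing that the same residue occupies several separate clades. As a rough, hypothetical heuristic (not ChromaClade's method, which relies on visual inspection of the annotated trees), the sketch below lists the maximal clades whose tips all share a given residue. More than one such clade is consistent with, but does not prove, independent origins, because a single origin followed by reversions can produce the same pattern; a rigorous analysis would reconstruct ancestral states instead. The tree representation, function names and data are illustrative assumptions.

```python
from typing import Dict, List, Tuple, Union

# A tree is either a taxon name (a tip) or a tuple of child subtrees.
Tree = Union[str, Tuple["Tree", ...]]


def tips(tree: Tree) -> List[str]:
    """Return the taxon names at the tips of `tree`, left to right."""
    if isinstance(tree, str):
        return [tree]
    return [t for child in tree for t in tips(child)]


def maximal_clades(tree: Tree, alignment: Dict[str, str], site: int,
                   residue: str) -> List[Tree]:
    """Return the largest subtrees whose tips all carry `residue` at `site`."""
    if all(alignment[t][site] == residue for t in tips(tree)):
        return [tree]  # whole subtree is uniform, so do not descend further
    if isinstance(tree, str):
        return []      # a single tip with a different residue
    clades: List[Tree] = []
    for child in tree:
        clades.extend(maximal_clades(child, alignment, site, residue))
    return clades


# Invented single-column example: K occurs in two separate clades.
tree = ((("A", "B"), "C"), (("D", "E"), "F"))
alignment = {"A": "K", "B": "K", "C": "E", "D": "K", "E": "K", "F": "E"}
print(maximal_clades(tree, alignment, 0, "K"))  # [('A', 'B'), ('D', 'E')]
```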