Abstract
BackgroundTo effectively apply evolutionary concepts in genome-scale studies, large numbers of phylogenetic trees have to be automatically analysed, at a level approaching human expertise. Complex architectures must be recognized within the trees, so that associated information can be extracted.ResultsHere, we present a new software library, PhyloPattern, for automating tree manipulations and analysis. PhyloPattern includes three main modules, which address essential tasks in high-throughput phylogenetic tree analysis: node annotation, pattern matching, and tree comparison. PhyloPattern thus allows the programmer to focus on: i) the use of predefined or user defined annotation functions to perform immediate or deferred evaluation of node properties, ii) the search for user-defined patterns in large phylogenetic trees, iii) the pairwise comparison of trees by dynamically generating patterns from one tree and applying them to the other.ConclusionPhyloPattern greatly simplifies and accelerates the work of the computer scientist in the evolutionary biology field. The library has been used to automatically identify phylogenetic evidence for domain shuffling or gene loss events in the evolutionary histories of protein sequences. However any workflow that relies on phylogenetic tree analysis, could be automated with PhyloPattern.
Highlights
To effectively apply evolutionary concepts in genome-scale studies, large numbers of phylogenetic trees have to be automatically analysed, at a level approaching human expertise
Any workflow that relies on phylogenetic tree analysis, could be automated with PhyloPattern
The strategy we adopted in the development of PhyloPattern, was first to understand how a biologist reads and uses a phylogenetic tree and, from this, we deduced three main functionalities that are essential for most tree analyses: tree annotation, pattern matching and trees comparison
Summary
The strategy we adopted in the development of PhyloPattern, was first to understand how a biologist reads and uses a phylogenetic tree and, from this, we deduced three main functionalities that are essential for most tree analyses: tree annotation, pattern matching and trees comparison. An evolutionary strategy might be to construct phylogenetic trees for the two protein domains (based on a multiple sequence alignment) http://www.biomedcentral.com/1471-2105/10/298 and to perform the following steps with the PhyloPattern API: Step 1 annotate the leaves of the trees with the domain architectures of associated proteins (using a protein domain database) and annotate internal nodes of the trees by inferring their domain architectures from the leaf domain architectures (using for example the Dollo parsimony algorithm [18] or Maximum Likelihood methods [19]), Step 2 define a pattern with constraints mainly based on the domain architecture tag to try to find a parent node of a shuffling event and apply it to each tree (see pattern schema at the top of Figure 3), Step 3 if such a node is found, annotate each tree, by adding event tags to derived nodes found under the event's parent node, Step 4 apply two patterns to each tree; first to extract a common leaf (same name) from each "event marked" subtree and second to extract an "ancestral" leaf (with the "parent" domain architecture). The subtrees with sequences: [ENSP00000312158, ENSPTR00000056995, ENSMMUP00000028928] match and the other subtrees do not
Talk to us
Join us for a 30 min session where you can share your feedback and ask us any queries you have
Disclaimer: All third-party content on this website/platform is and will remain the property of their respective owners and is provided on "as is" basis without any warranties, express or implied. Use of third-party content does not indicate any affiliation, sponsorship with or endorsement by them. Any references to third-party content is to identify the corresponding services and shall be considered fair use under The CopyrightLaw.