Testing Empirical Support for Evolutionary Models that Root the Tree of Life

Derek Caetano-Anollés, Kyung Mo Kim, Arshan Nasir, Gustavo Caetano-Anollés

doi:10.1007/s00239-019-09891-7

Abstract

Trees of life (ToLs) can only be rooted with direct methods that seek optimization of character state information in ingroup taxa. This involves optimizing phylogenetic tree, model and data in an exercise of reciprocal illumination. Rooted ToLs have been built from a census of protein structural domains in proteomes using two kinds of models. Fully-reversible models use standard-ordered (additive) characters and Wagner parsimony to generate unrooted trees of proteomes that are then rooted with Weston’s generality criterion. Non-reversible models directly build rooted trees with unordered characters and asymmetric stepmatrices of transformation costs that penalize gain over loss of domains. Here, we test the empirical support for the evolutionary models with character state reconstruction methods using two published proteomic datasets. We show that the reversible models match reconstructed frequencies of character change and are faithful to the distribution of serial homologies in trees. In contrast, the non-reversible models go counter to trends in the data they must explain, attracting organisms with large proteomes to the base of the rooted trees while violating the triangle inequality of distances. This can lead to serious reconstruction inconsistencies that show model inadequacy. Our study highlights the aprioristic perils of disposing of countering evidence in natural history reconstruction.
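The triangle inequality mentioned above can be checked mechanically for any stepmatrix of transformation costs. The following is a minimal sketch, not the procedure used in the study: the function names are hypothetical, and the asymmetric matrix uses made-up values chosen only to show what a violation looks like. It is not taken from either published dataset.

```python
from itertools import product


def wagner_matrix(n_states):
    """Symmetric additive (ordered) costs: moving from state i to j costs |i - j|."""
    return [[abs(i - j) for j in range(n_states)] for i in range(n_states)]


def triangle_violations(cost):
    """Return every (i, j, k) where the direct cost i->k exceeds i->j plus j->k."""
    n = len(cost)
    return [
        (i, j, k)
        for i, j, k in product(range(n), repeat=3)
        if cost[i][k] > cost[i][j] + cost[j][k]
    ]


# Hypothetical asymmetric stepmatrix (illustrative values only).
# Rows are "from" states, columns are "to" states; gains (upper triangle)
# cost more than losses (lower triangle).
asymmetric = [
    [0, 2, 5],
    [1, 0, 2],
    [1, 1, 0],
]

print(triangle_violations(wagner_matrix(4)))  # additive costs: no violations
print(triangle_violations(asymmetric))        # 0->2 costs more than 0->1->2
```

In this toy matrix the direct gain from state 0 to state 2 costs 5, while passing through state 1 costs 4, so the matrix is internally inconsistent. Additive costs of the form |i - j| cannot produce such a case.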

Highlights

Phylogenetic characters are useful biological features. They carry history when they spread in evolution as they transform from one character state to another.
Trees must be reconstructed from useful characters that change at rates appropriate to the evolutionary depth of the recovered trees, capturing vertical phylogenetic signatures.
Additive characters are also known as ordered characters or Wagner characters and are widely used in the analysis of serial homologies, especially those describing morphological features of organisms.
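For an ordered (Wagner) character, a change between two states costs the absolute difference between them, so the minimum number of steps on a given tree can be computed with an interval-based downpass in the style of Farris's Wagner optimization. The sketch below assumes a fixed rooted binary tree and integer-coded states; the function name, taxon labels, and state values are hypothetical and are not drawn from the proteomic datasets. Because the costs are symmetric, the resulting length does not depend on where the tree is rooted.

```python
def wagner_length(tree, states):
    """Minimum steps for one ordered character on a rooted binary tree.

    tree   -- nested 2-tuples whose leaves are taxon names (strings)
    states -- dict mapping each taxon name to an integer character state
    """
    def downpass(node):
        # A leaf contributes a degenerate interval [s, s] and no steps.
        if isinstance(node, str):
            s = states[node]
            return (s, s), 0
        (lo1, hi1), steps1 = downpass(node[0])
        (lo2, hi2), steps2 = downpass(node[1])
        lo, hi = max(lo1, lo2), min(hi1, hi2)
        if lo <= hi:
            # Child intervals overlap: keep the intersection at no extra cost.
            return (lo, hi), steps1 + steps2
        # Disjoint intervals: the node takes the gap and pays its width.
        return (hi, lo), steps1 + steps2 + (lo - hi)

    return downpass(tree)[1]


# Hypothetical example: four taxa with made-up abundance states.
tree = (("A", "B"), ("C", "D"))
states = {"A": 0, "B": 2, "C": 5, "D": 5}
print(wagner_length(tree, states))  # prints 5
```

Here the ancestor of A and B can take any state from 0 to 2 at a cost of 2 steps, and joining that interval to the state 5 shared by C and D adds 3 more, giving 5 steps in total.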

Summary

Introduction

Phylogenetic characters carry history when they spread in evolution as they transform from one character state to another. Structural domains show distinct three-dimensional compact fold structures that are both highly conserved and recurrent in proteomes (Murzin et al. 1995). They can be efficiently identified using hidden Markov models of structural recognition (Gough et al. 2001). We have shown that domains are excellent phylogenetic characters because they are highly conserved (Nasir and Caetano-Anollés 2012).

Methods

Results

Conclusion