On universal common ancestry, sequence similarity, and phylogenetic structure: the sins of P-values and the virtues of Bayesian evidence

Douglas L Theobald

doi:10.1186/1745-6150-6-60

Abstract

Background: The universal common ancestry (UCA) of all known life is a fundamental component of modern evolutionary theory, supported by a wide range of qualitative molecular evidence. Nevertheless, both the status and the nature of UCA have recently been questioned. In earlier work I presented a formal, quantitative test of UCA in which model selection criteria overwhelmingly choose common ancestry over independent ancestry, based on a dataset of universally conserved proteins. These model-based tests are founded in likelihoodist and Bayesian probability theory, in opposition to classical frequentist null hypothesis tests such as Karlin-Altschul E-values for sequence similarity. In a recent comment, Koonin and Wolf (K&W) claim that the model preference for UCA is "a trivial consequence of significant sequence similarity". They support this claim with a computational simulation, derived from universally conserved proteins, which produces similar sequences lacking phylogenetic structure. The model selection tests prefer common ancestry for this artificial data set.

Results: For the real universal protein sequences, hierarchical phylogenetic structure (induced by genealogical history) is the overriding reason why the tests choose UCA; sequence similarity is a relatively minor factor. First, for cases of conflicting phylogenetic structure, the tests choose independent ancestry even with highly similar sequences. Second, certain models, like star trees and K&W's profile model (corresponding to their simulation), readily explain sequence similarity yet lack phylogenetic structure. However, these are extremely poor models for the real proteins, even worse than independent ancestry models, though they explain K&W's artificial data well. Finally, K&W's simulation is an implementation of a well-known phylogenetic model, and it produces sequences that mimic homologous proteins. Therefore the model selection tests work appropriately with the artificial data.

Conclusions: For K&W's artificial protein data, sequence similarity is the predominant factor influencing the preference for common ancestry. In contrast, for the real proteins, model selection tests show that phylogenetic structure is much more important than sequence similarity. Hence, the model selection tests demonstrate that real universally conserved proteins are homologous, a conclusion based primarily on the specific nested patterns of correlations induced in genetically related protein sequences.

Reviewers: This article was reviewed by Rob Knight, Robert Beiko (nominated by Peter Gogarten), and Michael Gilchrist.

Highlights

The universal common ancestry (UCA) of all known life is a fundamental component of modern evolutionary theory, supported by a wide range of qualitative molecular evidence
K&W claim that the purported demonstration of universal common ancestry is a trivial consequence of significant sequence similarity between the analyzed proteins
K&W further claim that the nature and origin of this similarity are irrelevant for the prediction of “common ancestry” by the model comparison approach, and that the model selection results in favour of UCA are “a restatement of the fact that these proteins display a highly statistically significant sequence similarity”

Summary

Introduction

The universal common ancestry (UCA) of all known life is a fundamental component of modern evolutionary theory, supported by a wide range of qualitative molecular evidence. In earlier work I presented a formal, quantitative test of UCA in which model selection criteria overwhelmingly choose common ancestry over independent ancestry, based on a dataset of universally conserved proteins. These model-based tests are founded in likelihoodist and Bayesian probability theory, in opposition to classical frequentist null hypothesis tests such as Karlin-Altschul E-values for sequence similarity. One of the notable strengths of this study is that it provides evidence for common ancestry without recourse to the common assumption that a high degree of sequence similarity necessarily implies homology. This UCA study was subsequently criticized in a paper by Koonin and Wolf (hereafter referred to as K&W), in which they argue that the results in favour of UCA are “a trivial consequence of significant sequence similarity between the analyzed proteins” and that my tests “yield results ‘in support of common ancestry’ for any sufficiently similar sequences” [2].
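
To make the idea of a model selection test concrete, the following is a minimal sketch of how two competing hypotheses can be ranked by an information criterion. It uses the Akaike information criterion (AIC) as one standard example of such a criterion; the text above does not specify which criteria are applied, so that choice is an assumption. The log-likelihoods and parameter counts are hypothetical placeholders, not results from the study, and the model names are illustrative labels only.

```python
def aic(log_likelihood: float, n_params: int) -> float:
    """Akaike information criterion: 2k - 2 ln L. Lower scores are preferred."""
    return 2 * n_params - 2 * log_likelihood


# Hypothetical fitted models (values invented purely for illustration).
# A common ancestry model fits one tree to all sequences; an independent
# ancestry model fits separate trees and so typically needs more parameters.
models = {
    "common_ancestry": {"log_likelihood": -1000.0, "n_params": 20},
    "independent_ancestry": {"log_likelihood": -1100.0, "n_params": 30},
}

# Score every model, then pick the one with the lowest AIC.
scores = {
    name: aic(m["log_likelihood"], m["n_params"]) for name, m in models.items()
}
best = min(scores, key=scores.get)

# AIC differences relative to the best model; larger means less support.
delta = {name: score - scores[best] for name, score in scores.items()}

for name in sorted(scores, key=scores.get):
    print(f"{name}: AIC = {scores[name]:.1f}, delta = {delta[name]:.1f}")
```

Unlike a null hypothesis test, a comparison of this kind returns no P-value: it ranks explicitly stated models by how well each explains the data, penalized for the number of free parameters.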
