A tensorial approach to the inversion of group-based phylogenetic models.

Jeremy G Sumner, Barbara R Holland, Peter D Jarvis

doi:10.1186/s12862-014-0236-6

Abstract

Background: Hadamard conjugation is part of the standard mathematical armoury in the analysis of molecular phylogenetic methods. For group-based models, the approach provides a one-to-one correspondence between the so-called “edge length” and “sequence” spectra on a phylogenetic tree. Hadamard conjugation has been used in diverse phylogenetic applications, not only for inference but also as an important conceptual tool for thinking about molecular data, leading to generalizations beyond strictly tree-like evolutionary modelling.

Results: For general group-based models of phylogenetic branching processes, we reformulate the problem of constructing a one-to-one correspondence between pattern probabilities and edge parameters. This takes a classic result previously shown through the use of Fourier analysis and presents it in the language of tensors and group representation theory. This derivation makes it clear why the inversion is possible: under their usual definition, group-based models are defined for abelian groups only.

Conclusion: We provide an inversion of group-based phylogenetic models that can be implemented using matrix multiplication between rectangular matrices indexed by ordered partitions of varying sizes. Our approach provides additional context for the construction of phylogenetic probability distributions on network structures, and highlights the potential limitations of restricting to group-based models in this setting.

Highlights

Hadamard conjugation is part of the standard mathematical armoury in the analysis of molecular phylogenetic methods
Considering Kimura’s neutral theory of molecular evolution, it is logical to apply a stochastic model at the level of DNA substitutions to construct a probabilistic description of which molecular alignments are expected to be observed, given a proposed evolutionary history. This is commonly implemented assuming an IID and Markov process for DNA substitution, leading to a model that has a continuous-time Markov chain at its core.
In a series of papers, Hendy and colleagues introduced the Hadamard conjugation as a novel tool for phylogenetic analyses [3,4,5]. They found an invertible relationship between a phylogenetic tree, as characterized by its edge length spectrum, and the probability distribution of site patterns

Summary

Introduction

Hadamard conjugation is part of the standard mathematical armoury in the analysis of molecular phylogenetic methods. In a series of papers, Hendy and colleagues introduced the Hadamard conjugation as a novel tool for phylogenetic analyses [3,4,5]. They found an invertible relationship between a phylogenetic tree, as characterized by its edge length spectrum, and the probability distribution of site patterns (referred to as the sequence spectrum). Hadamard conjugation has been used both as a tool for simulation [10] and to study statistical properties of methods, such as the inconsistency of parsimony under a molecular clock [5,11]. For these sorts of applications, following the notation in Felsenstein [1], we can use the Hadamard transform H to start with an edge length spectrum γ and calculate the sequence spectrum s = H^{-1} exp(Hγ). Because the γ spectrum is not expected to match a tree precisely, Hendy [12] proposed using an optimisation criterion to map from γ to the “closest tree”.
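
As an illustration of the map between the two spectra, the following is a minimal sketch for the simplest group-based case, the symmetric two-state model. It assumes the standard formulation of Hendy and Penny, in which the forward map uses the exponential, s = H^{-1} exp(Hγ), and the logarithm appears in the inverse, γ = H^{-1} log(Hs). The function names, the bitmask indexing of splits by subsets of the first n−1 taxa, and the convention that the first entry of γ equals minus the sum of the edge lengths are assumptions made for this example, not notation taken from the paper.

```python
import numpy as np


def hadamard(k):
    """Sylvester Hadamard matrix of order 2**k (entries +1 and -1)."""
    H = np.array([[1.0]])
    H2 = np.array([[1.0, 1.0], [1.0, -1.0]])
    for _ in range(k):
        H = np.kron(H, H2)
    return H


def _order(vec):
    """Return k such that len(vec) == 2**k, or raise if no such k exists."""
    m = vec.size
    k = m.bit_length() - 1
    if m < 1 or 2 ** k != m:
        raise ValueError("spectrum length must be a power of two")
    return k


def edge_to_sequence_spectrum(gamma):
    """Forward map s = H^{-1} exp(H gamma), using H^{-1} = H / 2**k."""
    gamma = np.asarray(gamma, dtype=float)
    H = hadamard(_order(gamma))
    return H @ np.exp(H @ gamma) / gamma.size


def sequence_to_edge_spectrum(s):
    """Inverse map gamma = H^{-1} log(H s); requires H s > 0 entrywise."""
    s = np.asarray(s, dtype=float)
    H = hadamard(_order(s))
    return H @ np.log(H @ s) / s.size


# Three taxa: splits are indexed by subsets of {1, 2} as bitmasks 0..3.
# Hypothetical edge lengths a, b, c for the three pendant edges.
a, b, c = 0.1, 0.2, 0.3
gamma = np.array([-(a + b + c), a, b, c])

s = edge_to_sequence_spectrum(gamma)          # site-pattern probabilities
gamma_back = sequence_to_edge_spectrum(s)     # recovers the edge lengths
```

For two taxa separated by an edge of length q, the same functions reduce to the familiar closed form: the probability that the two sequences differ at a site is (1 − exp(−2q))/2. When s is estimated from data rather than computed from a tree, the recovered γ will generally have non-zero entries on splits that are not edges of any single tree, which is the situation the “closest tree” criterion addresses.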

Methods

Results

Conclusion