Abstract
Phylogenetic networks are rooted, directed, acyclic graphs that model reticulate evolutionary histories. Recently, statistical methods were devised for inferring such networks from either gene tree estimates or the sequence alignments of multiple unlinked loci. Bi-allelic markers, most notably single nucleotide polymorphisms (SNPs) and amplified fragment length polymorphisms (AFLPs), provide a powerful source of genome-wide data. In a recent paper, a method called SNAPP was introduced for statistical inference of species trees from unlinked bi-allelic markers. The generative process assumed by the method combined a model of evolution for the bi-allelic markers with the multispecies coalescent. A novel component of the method was a polynomial-time algorithm for exact computation of the likelihood of a fixed species tree via integration over all possible gene trees for a given marker. Here we report on a method for Bayesian inference of phylogenetic networks from bi-allelic markers. Our method significantly extends the algorithm for exact likelihood computation via integration over all possible gene trees from species trees to phylogenetic networks. Unlike the case of species trees, the algorithm is no longer polynomial-time on all instances of phylogenetic networks. Furthermore, the method uses a reversible-jump MCMC technique to sample the posterior of phylogenetic networks given bi-allelic marker data. Our method performs well in terms of accuracy and robustness, as we demonstrate on simulated data and on a data set of multiple New Zealand species of the plant genus Ourisia (Plantaginaceae). We implemented the method in the publicly available, open-source PhyloNet software package.
Highlights
The availability of genome-wide data from many species and, in some cases, many individuals per species, has transformed the study of evolutionary histories, and given rise to phylogenomics: the inference of gene and species evolutionary histories from genome-wide data. Consider a data set S = {S1, ..., Sm} consisting of the molecular sequences of m loci under the assumptions of free recombination between loci and no recombination within a locus
We introduce a novel algorithm for computing the likelihood of phylogenetic networks from bi-allelic genetic markers and use it in a Bayesian inference method
These networks and parameters were inspired by the phylogenetic networks inferred from an empirical genomic data set in [21, 22]
Summary
Consider a data set S = {S1, ..., Sm} consisting of the molecular sequences of m loci under the assumptions of free recombination between loci and no recombination within a locus. The likelihood of a species phylogeny Ψ (topology and parameters) is given by

$$L(\Psi \mid S) = \prod_{i=1}^{m} p(S_i \mid \Psi) = \prod_{i=1}^{m} \int_{G} p(S_i \mid g)\, p(g \mid \Psi)\, dg,$$

where the integration is taken over all possible gene trees g. The term p(Si|g) is the likelihood of gene tree g given the sequence data of locus i [1]. The term p(g|Ψ) is the probability density function (pdf) of gene trees given the species phylogeny and its parameters. Rannala and Yang [2] derived this pdf under the multispecies coalescent (MSC). This formulation underlies the Bayesian inference methods of [2,3,4]
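To make the structure of this likelihood concrete, the following is a minimal Python sketch, not the method implemented in PhyloNet. It assumes a toy setting in which the continuous space of gene trees is replaced by a small finite set, so the integral becomes a weighted sum. The function name `log_likelihood`, its arguments, and all numerical values are hypothetical and chosen only for illustration.

```python
import math

def log_likelihood(locus_likelihoods, gene_tree_probs):
    """Log-likelihood of a species phylogeny over unlinked loci (toy version).

    locus_likelihoods[i][k]: assumed value of p(S_i | g_k) for locus i and gene tree g_k.
    gene_tree_probs[k]: assumed value of p(g_k | phylogeny); a discrete stand-in
    for the gene tree density under the multispecies coalescent.
    """
    total = 0.0
    for per_tree in locus_likelihoods:
        # Marginalize over gene trees for this locus (a sum replaces the integral).
        marginal = sum(lik * prob for lik, prob in zip(per_tree, gene_tree_probs))
        # Loci are independent under free recombination, so log-terms add up.
        total += math.log(marginal)
    return total

# Hypothetical numbers: three candidate gene trees, two loci.
gene_tree_probs = [0.5, 0.3, 0.2]
locus_likelihoods = [
    [0.1, 0.2, 0.4],  # p(S_1 | g_k)
    [0.3, 0.1, 0.1],  # p(S_2 | g_k)
]
print(math.exp(log_likelihood(locus_likelihoods, gene_tree_probs)))
```

Working in log space keeps the product over many loci from underflowing. In the real setting the gene tree space is continuous (topologies and branch lengths), which is why an exact integration algorithm is needed.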