Abstract
A great deal of research builds on the evolutionary histories of species and genes. These histories are used in genomics to infer synteny, in ecology to describe and predict biodiversity, and in molecular biology to transfer knowledge acquired in model organisms to humans and crops. Beyond downstream applications, expanding our knowledge of life on Earth is important in its own right. From Naturalis Historia to On the Origin of Species, the acquisition of this knowledge has been a part of human development. Evolutionary histories are commonly represented as trees, where a common ancestor progressively splits into descendant species or alleles. Time trees add more information by using height to represent genetic distance or elapsed time. Species and gene trees can be inferred from molecular sequences using methods that are explicitly model-based, that implicitly assume a particular model of evolution, or that are statistically consistent with one. One such model, the multispecies coalescent (MSC), is the topic of my thesis. Under this model, separate trees are inferred for the species history and for each gene’s history. Gene trees are embedded within the species tree according to a coalescent process. Researchers often avoid the MSC when reconstructing time trees because of claims that available implementations are too computationally demanding. Instead, they infer the species history as a single tree by concatenating the sequences from each gene. I began my thesis research by evaluating the effect of this approximation. In a realistic simulation based on parameters inferred from empirical data, concatenation was grossly inaccurate, especially when estimating recent species divergence times. In a later simulation study I demonstrated that when using concatenation, credible intervals often excluded the true values. To address reluctance towards using the MSC, I developed a faster implementation of the model.
This implementation, StarBEAST2, is a Markov chain Monte Carlo (MCMC) method, meaning it characterizes the probability distribution over trees by a random walk through parameter space. I improved computational performance by developing more efficient proposals for traversing that space, and by reducing the number of parameters in the model through analytical integration of population sizes. Despite its sophistication, the MSC has theoretical limitations. One is that the substitution rate is assumed to stay constant, or to be uncorrelated between lineages of different genes. However, substitution rates do vary and are associated with species traits like body size. I addressed this assumption in StarBEAST2 by extending the MSC to estimate substitution rates for each species. Another assumption is that genetic material cannot be transferred horizontally, but a more general model called the multispecies network coalescent…
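As a rough illustration of the random-walk idea behind MCMC only, the sketch below runs a Metropolis sampler on a one-dimensional toy density. It is not StarBEAST2's code and does not operate on trees. The names `log_target` and `metropolis`, the standard normal target, and the Gaussian proposal are all assumptions chosen for brevity.

```python
import math
import random


def log_target(x):
    # Toy unnormalized log density (standard normal). In a real analysis
    # this would be the log posterior over trees and model parameters.
    return -0.5 * x * x


def metropolis(log_density, start, n_steps, step_size, rng):
    """Random-walk Metropolis sampler (hypothetical helper for illustration)."""
    samples = []
    current = start
    current_lp = log_density(current)
    for _ in range(n_steps):
        # Propose a nearby state by perturbing the current one.
        proposal = current + rng.gauss(0.0, step_size)
        proposal_lp = log_density(proposal)
        # Accept with probability min(1, target(proposal) / target(current)).
        accept_prob = math.exp(min(0.0, proposal_lp - current_lp))
        if rng.random() < accept_prob:
            current, current_lp = proposal, proposal_lp
        # The current state is recorded whether or not the move was accepted.
        samples.append(current)
    return samples


if __name__ == "__main__":
    rng = random.Random(1)
    draws = metropolis(log_target, start=0.0, n_steps=20000, step_size=1.0, rng=rng)
    mean = sum(draws) / len(draws)
    print("sample mean:", mean)
```

In this toy setting the proposal is a simple Gaussian step. The abstract's point about "more efficient proposals" concerns how such moves are designed for tree space, which this sketch does not attempt to reproduce.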