Abstract

Though large multilocus genomic data sets have led to overall improvements in phylogenetic inference, they have posed the new challenge of addressing conflicting signals across the genome. In particular, ancestral population structure, which has been uncovered in a number of diverse species, can skew gene tree frequencies, thereby hindering the performance of species tree estimators. Here we develop a novel maximum likelihood method, termed TASTI (Taxa with Ancestral structure Species Tree Inference), that can infer phylogenies under such scenarios. We find that its accuracy increases with the number of input gene trees, in contrast to the relatively poor performance of methods not tailored for ancestral structure. Moreover, we propose a supertree approach that allows TASTI to scale computationally with increasing numbers of input taxa. We use genetic simulations to assess TASTI’s performance in the three- and four-taxon settings and demonstrate the application of TASTI on a six-species Afrotropical mosquito data set. Finally, we have implemented TASTI in an open-source software package for ease of use by the scientific community.

Highlights

  • Large multilocus data sets are becoming ever more common in systematic biology (Cranston et al 2009; Song et al 2012; Salichos and Rokas 2013; DeGiorgio et al 2014; Fontaine et al 2015; Peters et al 2017).

  • We show via simulations in the three- and four-taxon settings that TASTI outperforms the competing methods MP-EST (Liu, Yu, and Edwards 2010), STELLS2 (Pei and Wu 2017), and STEM2.0 (Kubatko et al 2009, hereafter referred to as STEM) when population structure is present, and remains competitive under the unstructured multispecies coalescent.

  • MP-EST is a pseudo-likelihood approach based on triples of taxa, whereas STELLS2 and STEM are likelihood approaches based on gene tree topologies and on gene tree topologies with branch lengths, respectively. We selected these methods for comparison because they are state-of-the-art methods for estimating species trees from gene trees that (1) operate within the maximum likelihood paradigm but (2) do not assume any structure or other inter- or intraspecies gene flow.


Introduction

Large multilocus data sets are becoming ever more common in systematic biology (Cranston et al 2009; Song et al 2012; Salichos and Rokas 2013; DeGiorgio et al 2014; Fontaine et al 2015; Peters et al 2017). The multispecies coalescent assumes that each modern and ancestral species is unstructured and has a constant population size, and that each pair of lineages within a given ancestral species has an equal probability of coalescing (Nakhleh 2013). Under these assumptions, incomplete lineage sorting leads to symmetries in gene tree distributions for any species tree, regardless of the number of taxa. Asymmetries in gene tree distributions are often attributed to gene flow between species (McGuire et al 2007; Escobar et al 2012; Marcussen et al 2014).
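
To make the symmetry concrete, consider the simplest case of three taxa with species tree ((A,B),C) under the standard, unstructured multispecies coalescent. A well-known result gives the concordant gene tree topology a probability of 1 - (2/3)e^(-T) and each of the two discordant topologies a probability of (1/3)e^(-T), where T is the internal branch length in coalescent units. The Python sketch below is illustrative only: the function name and taxon labels are hypothetical and are not part of TASTI.

```python
import math


def three_taxon_gene_tree_probs(t):
    """Gene tree topology probabilities for the species tree ((A,B),C).

    t is the internal branch length in coalescent units (assumed >= 0).
    This is the unstructured multispecies coalescent, not the TASTI model.
    """
    if t < 0:
        raise ValueError("branch length must be non-negative")
    # Probability that the A and B lineages fail to coalesce
    # along the internal branch of the species tree.
    p_no_coal = math.exp(-t)
    return {
        "((A,B),C)": 1.0 - (2.0 / 3.0) * p_no_coal,  # concordant topology
        "((A,C),B)": p_no_coal / 3.0,                # discordant topology
        "((B,C),A)": p_no_coal / 3.0,                # discordant topology
    }


# Illustrative call with a hypothetical branch length of one coalescent unit.
for topology, prob in three_taxon_gene_tree_probs(1.0).items():
    print(topology, round(prob, 4))
```

The two discordant topologies always receive equal probability in this model. An observed excess of one discordant topology over the other therefore violates the symmetry, which is the kind of signal that gene flow or ancestral structure can produce.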
