Abstract

We present SimPhy, a fast and flexible software package for simulating multiple gene families evolving under incomplete lineage sorting, gene duplication and loss, and horizontal gene transfer (all three of which can lead to species tree/gene tree discordance), as well as gene conversion. SimPhy implements a hierarchical phylogenetic model in which the evolution of species, locus, and gene trees is governed by global and local parameters (e.g., genome-wide, species-specific, locus-specific) that can be fixed or sampled from a priori statistical distributions. SimPhy also incorporates comprehensive models of substitution rate variation among lineages (uncorrelated relaxed clocks) and can simulate partitioned nucleotide, codon, and protein multilocus sequence alignments under a wide range of substitution models using the program INDELible. We validate SimPhy's output against theoretical expectations and other programs, and show that it scales extremely well with complex models and/or large trees, being an order of magnitude faster than the most similar program (DLCoal-Sim). In addition, we demonstrate how SimPhy can help to understand interactions among different evolutionary processes by conducting a simulation study that characterizes the systematic overestimation of the duplication time when using standard reconciliation methods. SimPhy is available at https://github.com/adamallo/SimPhy, where users can find the source code, precompiled executables, a detailed manual, and example cases.

Highlights

  • SimPhy simulates the evolution of multiple gene families under a hierarchical phylogenomic model in which gene trees evolve inside locus trees, which in turn evolve along a single species tree (Fig. 1)

  • We have previously shown that the most recent common ancestor (MRCA) of a new gene originating by duplication and its paralog does not necessarily coincide with the individual in which the duplication first occurred, which leads locus-tree-unaware reconciliation methods to systematically overestimate the duplication time (Mallo, De Oliveira Martins, et al. 2014b) (Fig. S10)

Summary

Introduction

Recent advances in sequencing technologies have fueled the expansion of genome-wide phylogenetic studies, unveiling extensive phylogenomic incongruence (Jeffroy, Brinkmann, et al. 2006; Salichos and Rokas 2013) and bringing renewed attention to how ancestral polymorphisms sort within populations (Edwards 2009). Very few tools can simulate phylogenies while jointly considering multiple sources of phylogenomic incongruence; examples are PrIME-GenPhyloData (Sjostrand, Arvestad, et al. 2013) and DLCoal_sim (Rasmussen and Kellis 2012). The former combines gene duplication and loss (GDL) with horizontal gene transfer (HGT), while the latter considers GDL and incomplete lineage sorting (ILS). SimPhy implements a flexible hierarchical parameterization scheme that considers genome-wide and gene-family-specific conditions, including different sources of evolutionary rate variation among lineages. These parameters can be fixed or sampled from statistical distributions defined by the user. Species trees might be equivalent to population trees when the organismal units of interest are conspecific populations.
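The idea of a hierarchical parameterization, in which a genome-wide value is modified by locus-specific and lineage-specific factors drawn from distributions, can be illustrated with a minimal Python sketch. This is not SimPhy's code or interface: the function name, the lognormal and gamma distributions, and all numeric hyperparameters are assumptions chosen only for illustration.

```python
import random


def sample_hierarchical_rates(n_loci, n_lineages, seed=None):
    """Hypothetical three-level sampling: genome-wide -> locus -> lineage."""
    rng = random.Random(seed)

    # Genome-wide level: one base rate shared by every locus
    # (assumed lognormal; mu and sigma are arbitrary illustration values).
    genome_rate = rng.lognormvariate(-2.0, 0.5)

    # Gamma multipliers with shape alpha and scale 1/alpha have mean 1,
    # so they rescale the base rate without shifting its expectation.
    alpha = 5.0

    loci = []
    for _ in range(n_loci):
        # Locus-specific level: one multiplier per gene family.
        locus_multiplier = rng.gammavariate(alpha, 1.0 / alpha)

        # Lineage-specific level: independent (uncorrelated) multipliers,
        # one per lineage, in the spirit of an uncorrelated relaxed clock.
        lineage_rates = [
            genome_rate * locus_multiplier * rng.gammavariate(alpha, 1.0 / alpha)
            for _ in range(n_lineages)
        ]
        loci.append(
            {"locus_multiplier": locus_multiplier, "lineage_rates": lineage_rates}
        )
    return genome_rate, loci


if __name__ == "__main__":
    base, loci = sample_hierarchical_rates(n_loci=3, n_lineages=4, seed=42)
    print(f"genome-wide rate: {base:.4f}")
    for i, locus in enumerate(loci):
        print(i, [round(r, 4) for r in locus["lineage_rates"]])
```

Fixing a parameter instead of sampling it corresponds to replacing the relevant draw with a constant; passing a seed makes the draws reproducible.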
