Abstract

Ultraconserved elements (UCEs) are stretches of hundreds of nucleotides with highly conserved cores flanked by variable regions. Although the selective forces responsible for the preservation of UCEs are unknown, they are nonetheless believed to contain phylogenetically meaningful information from deep to shallow divergence events. Phylogenetic applications of UCEs assume the same degree of rate heterogeneity applies across the entire locus, including variable flanking regions. We present a Wright–Fisher model of selection on nucleotides (SelON) which includes the effects of mutation, drift, and spatially varying, stabilizing selection for an optimal nucleotide sequence. The SelON model assumes the strength of stabilizing selection follows a position-dependent Gaussian function whose exact shape can vary between UCEs. We evaluate SelON by comparing its performance to a simpler and spatially invariant GTR+Γ model using an empirical data set of 400 vertebrate UCEs used to determine the phylogenetic position of turtles. We observe much improvement in model fit of SelON over the GTR+Γ model, and support for turtles as sister to lepidosaurs. Overall, the UCE-specific parameters SelON estimates provide a compact way of quantifying the strength and variation in selection within and across UCEs. SelON can also be extended to include more realistic mapping functions between sequence and stabilizing selection as well as allow for greater levels of rate heterogeneity. By more explicitly modeling the nature of selection on UCEs, SelON and similar approaches can be used to better understand the biological mechanisms responsible for their preservation across highly divergent taxa and long evolutionary time scales.
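To make the idea of a position-dependent Gaussian concrete, the following is a minimal sketch of how selection strength might vary along a single UCE. It is an illustration only and not the authors' implementation: the function name `gaussian_selection_strength` and the parameters `peak`, `center`, and `width` are hypothetical, and the parametrization actually used by SelON may differ.

```python
import math


def gaussian_selection_strength(n_sites, peak, center, width):
    """Return one selection-strength value per site along a locus.

    Hypothetical parametrization (an assumption, not taken from SelON):
      peak   -- strength of stabilizing selection at the most conserved site
      center -- 0-based position of that site
      width  -- standard deviation of the Gaussian, in sites
    """
    if width <= 0:
        raise ValueError("width must be positive")
    return [
        peak * math.exp(-((site - center) ** 2) / (2.0 * width ** 2))
        for site in range(n_sites)
    ]


# Example: a 201-site locus whose conserved core sits at the middle site.
strengths = gaussian_selection_strength(n_sites=201, peak=1.0, center=100, width=25.0)
```

Under these assumptions, selection is strongest in the core and decays symmetrically toward the flanks, which is the qualitative pattern of a conserved core flanked by variable regions. Allowing `peak`, `center`, and `width` to differ between loci corresponds loosely to the statement that the shape of the function can vary between UCEs.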

Highlights

  • High-throughput DNA sequencing has transformed phylogenetics from individual targeting of a subset of genes to genome-scale approaches for the simultaneous sequence capture of entire genomes

  • Simulations were performed to assess the difficulties in estimating the model parameters contained within our selection model

  • These simulations were designed to determine the behavior of standard models of nucleotide substitution, like the general time-reversible (GTR) model with Gamma-distributed (+Γ) rate variation, when sequences are generated under selection on nucleotides (SelON)


Summary

Introduction

High-throughput DNA sequencing has transformed phylogenetics from individual targeting of a subset of genes to genome-scale approaches for the simultaneous sequence capture of entire genomes. There are important technical challenges when analyzing genomes across a set of species. This has led to the development of reduced representation approaches such as restriction-site associated DNA marker sequencing (e.g., Miller et al 2007), or targeted capture of entire organellar genomes (e.g., Cronn et al 2008), select protein-coding genes (e.g., exome capture; Hodges et al 2007), and ultraconserved genomic elements (i.e., UCEs; Bejerano et al 2004) that target only the informative portions of the genome. The end result is still an extraordinary wealth of data, which holds great promise for resolving the tree of life. With these large phylogenomic datasets come opportunities for gaining new and important insights about the evolutionary processes occurring within the genome. While there have been advances such as automated pipelines for identifying partitioning schemes (Tagliacollo and Lanfear 2018), our goal is to contribute to developing a more mechanistic understanding of UCEs and their evolution, in addition to their utility in phylogenetic inference, by explicitly modeling the spatial variation in selection hypothesized to be responsible for UCEs.
