Optimizing Phylogenomics with Rapidly Evolving Long Exons: Comparison with Anchored Hybrid Enrichment and Ultraconserved Elements.

Tony Gamble, Todd R Jackman, Benjamin R Karin

doi:10.1093/molbev/msz263

Abstract

Marker selection has emerged as an important component of phylogenomic study design due to rising concerns about the effects of gene tree estimation error, model misspecification, and data-type differences. Researchers must balance various trade-offs associated with locus length and evolutionary rate, among other factors. The most commonly used reduced representation data sets for phylogenomics are ultraconserved elements (UCEs) and Anchored Hybrid Enrichment (AHE). Here, we introduce Rapidly Evolving Long Exon Capture (RELEC), a new set of loci that targets single exons that are both rapidly evolving (evolutionary rate faster than RAG1) and relatively long (>1,500 bp), while at the same time avoiding paralogy issues across amniotes. We compare the RELEC data set to UCEs and AHE in squamate reptiles by aligning and analyzing orthologous sequences from 17 squamate genomes, composed of 10 snakes and 7 lizards. The RELEC data set (179 loci) outperforms AHE and UCEs by maximizing per-locus genetic variation while maintaining presence and orthology across a range of evolutionary scales. RELEC markers show higher phylogenetic informativeness than UCE and AHE loci, and RELEC gene trees show greater similarity to the species tree than AHE or UCE gene trees. Furthermore, with fewer loci, RELEC remains computationally tractable for full Bayesian coalescent species tree analyses. We contrast RELEC with comparable methods, discuss important aspects of those methods, and demonstrate how RELEC may be the most effective set of loci for resolving difficult nodes and rapid radiations. We provide several resources for capturing or extracting RELEC loci from other amniote groups.
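The three selection criteria named in the abstract (length above 1,500 bp, evolutionary rate faster than RAG1, and no paralogy issues) can be illustrated with a minimal Python sketch. This is not the authors' pipeline: the `Exon` record, the `select_relec_like` function, the example loci, and the reference rate value are all hypothetical, and the sketch assumes that per-exon rate estimates and a paralogy screen have already been produced by some upstream step.

```python
from dataclasses import dataclass

# Length threshold stated in the abstract (>1,500 bp).
MIN_LENGTH_BP = 1500


@dataclass
class Exon:
    name: str            # hypothetical locus label
    length_bp: int       # exon length in base pairs
    rate: float          # hypothetical rate estimate, in the same units as the reference
    has_paralogs: bool   # hypothetical result of a separate paralogy screen


def select_relec_like(exons, reference_rate):
    """Keep exons that are long, faster than the reference rate, and paralog-free."""
    return [
        e for e in exons
        if e.length_bp > MIN_LENGTH_BP
        and e.rate > reference_rate
        and not e.has_paralogs
    ]


# Made-up example values, for illustration only.
candidates = [
    Exon("exonA", 2100, 1.4, False),  # passes all three filters
    Exon("exonB", 900, 2.0, False),   # too short
    Exon("exonC", 1800, 0.7, False),  # slower than the reference
    Exon("exonD", 3000, 1.9, True),   # flagged for paralogy
]
RAG1_RATE = 1.0  # placeholder standing in for a RAG1 rate estimate

selected = select_relec_like(candidates, RAG1_RATE)
print([e.name for e in selected])
```

Expressing the rate relative to a single reference value keeps the filter independent of how rates are estimated, as long as every exon and the reference are measured in the same way.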

Highlights

Though large phylogenomic data sets have become relatively easy to obtain in recent years and have led to many highly resolved phylogenetic estimates, it has become clear that the sheer quantity of sequence data that can be gathered will not unambiguously resolve some of the most difficult nodes in the tree of life.
The most commonly used reduced representation data sets for phylogenomics are ultraconserved elements (UCEs; Faircloth et al 2012) and Anchored Hybrid Enrichment (AHE; Lemmon et al 2012), both of which were developed to target the variable flanking regions surrounding highly conserved anchor points, and we describe these in more detail below.
Since its inception (Lemmon et al 2012), AHE has shifted from this “anchor” method toward tiling probes across a substantially longer target region for a reduced number of loci (Prum et al 2015; Ruane et al 2015), highlighting advances in sequence capture technology that allow hybridization to highly diverged sequences (e.g., Li et al 2013). Both the UCE and AHE data sets have often been able to resolve previously difficult nodes (e.g., Crawford et al 2012, 2015; Prum et al 2015; Bryson et al 2016; Streicher and Wiens 2017), though short length and/or slow evolutionary rate may make both methods susceptible to gene tree estimation error (GTEE), as we show in this study.

Summary

Introduction

Though large phylogenomic data sets have become relatively easy to obtain in recent years and have led to many highly resolved phylogenetic estimates, it has become clear that the sheer quantity of sequence data that can be gathered will not unambiguously resolve some of the most difficult nodes in the tree of life. These difficulties may be caused by a number of factors, including systematic error from nonphylogenetic signal or model inadequacy (Hahn and Nakhleh 2016; Reddy et al 2017), gene tree estimation error from insufficient phylogenetic signal (Blom et al 2017), or natural processes such as incomplete lineage sorting and introgression (Maddison 1997; Edwards 2009) and positive selection (Castoe et al 2009).

Methods

Results

Conclusion