Abstract

The rise of high-throughput sequencing techniques provides an unprecedented opportunity to analyse controversial phylogenetic relationships in great depth, but also introduces a risk of being misled by high node support values influenced by unevenly distributed missing data or unrealistic model assumptions. Here, we use three largely independent phylogenomic data sets to reconstruct the controversial phylogeny of true salamanders of the genus Salamandra, a group of amphibians providing an intriguing model to study the evolution of aposematism and viviparity. For all six species of the genus Salamandra, and two outgroup species from its sister genus Lyciasalamandra, we used RNA sequencing (RNAseq) and restriction site associated DNA sequencing (RADseq) to obtain data for: (1) 3070 nuclear protein-coding genes from RNAseq; (2) 7440 loci obtained by RADseq; and (3) full mitochondrial genomes. The RNAseq and RADseq data sets retrieved fully congruent topologies when each of them was analysed in a concatenation approach, with high support for: (1) S. infraimmaculata being sister group to all other Salamandra species; (2) S. algira being sister to S. salamandra; (3) these two species being the sister group to a clade containing S. atra, S. corsica and S. lanzai; and (4) the alpine species S. atra and S. lanzai being sister taxa. The phylogeny inferred from the mitochondrial genome sequences differed from these results, most notably by strongly supporting a clade containing S. atra and S. corsica as sister taxa. A different placement of S. corsica was also retrieved when analysing the RNAseq and RADseq data under species tree approaches. Closer examination of gene trees derived from RNAseq revealed that only a low number of them supported each of the alternative placements of S. atra. Furthermore, gene jackknife support for the S. atra - S. lanzai node stabilized only with very large concatenated data sets.
The phylogeny of true salamanders thus provides a compelling example of how classical node support metrics such as bootstrap and Bayesian posterior probability can provide high confidence values in a phylogenomic topology even if the phylogenetic signal for some nodes is spurious, highlighting the importance of complementary approaches such as gene jackknifing. Yet, the general congruence among the topologies recovered from the RNAseq and RADseq data sets increases our confidence in the results, and validates the use of phylotranscriptomic approaches for reconstructing shallow relationships among closely related taxa. We hypothesize that the evolution of Salamandra has been characterized by episodes of introgressive hybridization, which would explain the difficulties of fully reconstructing their evolutionary relationships.

Highlights

  • The rise of high-throughput sequencing techniques has provided molecular systematists with unprecedented opportunity to analyse controversial phylogenetic relationships in great depth

  • Phylogenomic approaches based on single nucleotide polymorphisms (SNPs) have been applied to inferences of population-level differentiation, phylogeography, and phylogenetic relationships among closely related species (Davey and Blaxter, 2011; Rubin et al, 2012; Peterson et al, 2012; Darwell et al, 2016), whereas those based on sequences of protein-coding genes derived from RNAseq or full genomes have been used for inferring deep nodes in the tree of life, often analyzed at the amino acid level (Bapteste et al, 2002; Chiari et al, 2012; Wickett et al, 2014; Jarvis et al, 2014; Chen et al, 2015; Irisarri and Meyer, 2016)

  • As the performance of de novo assembled restriction site associated DNA sequencing (RADseq) data matrices in phylogenetic reconstructions depends on the sample coverage and potential intra-locus paralogy (Huang and Knowles, 2014; Takahashi et al, 2014), we explored a range of thresholds for loci coverage between samples (4, 6, 8, 10 and 11 individuals; equivalent to 31–100% of the in-group) and maximum number of SNPs per RAD locus (2, 4, 6, 8 and 10)
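
The threshold exploration described in the last highlight can be pictured as a grid over two filters. The following is only a minimal sketch, not the pipeline used in the study: the function names, the toy loci, and the choice to treat both thresholds as inclusive (a locus is kept if it is present in at least the minimum number of individuals and carries at most the maximum number of SNPs) are assumptions made for illustration.

```python
from itertools import product

# Threshold values taken from the text; everything else below is hypothetical.
MIN_INDIVIDUALS = (4, 6, 8, 10, 11)
MAX_SNPS = (2, 4, 6, 8, 10)


def filter_loci(loci, min_individuals, max_snps):
    """Return the IDs of loci passing both filters.

    `loci` maps a locus ID to a tuple (individuals_with_data, snp_count).
    Both comparisons are inclusive, which is an assumption of this sketch.
    """
    return [
        locus_id
        for locus_id, (n_individuals, n_snps) in loci.items()
        if n_individuals >= min_individuals and n_snps <= max_snps
    ]


def explore_grid(loci):
    """Count retained loci for every combination of the two thresholds."""
    return {
        (min_ind, max_snp): len(filter_loci(loci, min_ind, max_snp))
        for min_ind, max_snp in product(MIN_INDIVIDUALS, MAX_SNPS)
    }


# Toy input: four made-up loci, not real data.
toy_loci = {
    "locus_a": (11, 1),
    "locus_b": (6, 5),
    "locus_c": (4, 12),
    "locus_d": (9, 3),
}
counts = explore_grid(toy_loci)
```

Each of the 25 resulting matrices could then be passed to a phylogenetic analysis, so that the sensitivity of the topology to missing data and to potentially paralogous, SNP-rich loci can be compared across settings.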


Summary

Introduction

The rise of high-throughput sequencing techniques has provided molecular systematists with unprecedented opportunity to analyse controversial phylogenetic relationships in great depth (da Fonseca et al, 2016). Concatenating across genes in an RNAseq analysis precludes gene-specific estimation of transition matrices or accurate estimation of rate heterogeneity, because current software to estimate partition schemes (e.g., Lanfear et al, 2012) is still in its infancy when it comes to efficiently handling thousands of genes. It is insufficiently known how sensitive phylogenomic resolution is to combining different types of data in the same analyses. Direct empirical comparisons between gene-based and SNP-based data obtained from the same set of samples, necessary to determine their relative sensitivity to phylogenetic error and to assess their performance in resolving shallow phylogenetic relationships, are scarce.

