Abstract

The qualification of orthology is a significant challenge when developing large, multi-locus phylogenetic data sets from assembled transcripts. Transcriptome assemblies have various attributes, such as fragmentation, frameshifts and mis-indexing, which pose problems for automated methods of orthology assessment. Here, we identify a set of orthologous single-copy genes from transcriptome assemblies for the land snails and slugs (Eupulmonata) using a thorough approach to orthology determination involving manual alignment curation, gene tree assessment and sequencing from genomic DNA. We qualified the orthology of 500 nuclear, protein-coding genes from the transcriptome assemblies of 21 eupulmonate species to produce the most complete phylogenetic data matrix for a major molluscan lineage to date, both in terms of taxon and character completeness. Exon capture targeting 490 of the 500 genes (those with at least one exon >120 bp) from 22 species of Australian Camaenidae successfully captured sequences of 2825 exons (representing all targeted genes), with only a 3.7% reduction in the data matrix due to the presence of putative paralogs or pseudogenes. The automated pipeline Agalma retrieved the majority of the manually qualified 500 single-copy gene set and identified a further 375 putative single-copy genes, although it failed to account for fragmented transcripts, resulting in lower data matrix completeness when considering the original 500 genes. This may explain the minor inconsistencies we observed in the supported topologies for the 21 eupulmonate species between the manually curated and 'Agalma-equivalent' data sets (sharing 458 genes). Overall, our study confirms the utility of the 500 gene set to resolve phylogenetic relationships at a range of evolutionary depths and highlights the importance of addressing fragmentation at the homolog alignment stage for probe design.

Highlights

  • Robust and well resolved phylogenies document the evolutionary history of organisms and are essential for understanding spatio-temporal patterns of phylogenetic diversification and phenotypic evolution

  • We considered L. gigantea to be sufficiently divergent from the eupulmonates (> 400 million years, Zapata et al 2014) that single-copy status could differ

  • Of the 288 genes used in a previous molluscan phylogenomic study



Introduction

Robust and well resolved phylogenies document the evolutionary history of organisms and are essential for understanding spatio-temporal patterns of phylogenetic diversification and phenotypic evolution. Despite the central role of phylogenies in evolutionary biology, most phylogenetic studies in non-model systems have relied on a limited number of readily sequenced genes due to cost restrictions and availability of phylogenetic markers. Both theoretical and empirical studies have shown that a greater number of independently evolving loci are needed to resolve difficult phylogenetic questions (Gontcharov et al 2004; Wortley et al 2005; Leaché & Rannala 2011). This need has been addressed by rapid advances in phylogenomics, which capitalise on high-throughput sequencing to acquire large multi-loci datasets. Obtaining such universal sets of orthologous genes allows for consistency and comparison across studies, and contributes towards a more comprehensive Tree of
