Abstract

Phylogenetic research is often stymied by the selection of a marker that yields poor phylogenetic resolution despite considerable cost and effort. Profiles of phylogenetic informativeness provide a quantitative measure for prioritizing gene sampling to resolve branching order in a particular epoch. To evaluate the utility of these profiles, we analyzed phylogenomic data sets from metazoans, fungi, and mammals, thus encompassing diverse time scales and taxonomic groups. We also evaluated the utility of profiles derived from simulated data sets. We found that genes selected via their informativeness dramatically outperformed haphazard sampling of markers. Furthermore, our analyses demonstrate that the original phylogenetic informativeness method can be extended to trees with more than four taxa. Although the method currently predicts phylogenetic signal without specifically accounting for the misleading effects of stochastic noise, it is robust to the effects of homoplasy. The phylogenetic informativeness rankings obtained will allow other researchers to select advantageous genes for future studies within these clades, maximizing return on effort and investment. The genes identified might also yield efficient experimental designs for phylogenetic inference in many sister clades and outgroup taxa that are closely related to the diverse groups of organisms analyzed.
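The abstract does not give the profile calculation itself. As a minimal sketch, assuming the per-site expression from Townsend's four-taxon derivation [5], a site evolving at rate λ contributes 16λ²T e^(−4λT) of informativeness at time T before the present (peaking at T = 1/(4λ)), and a gene's profile is the sum over its sites. Integrating the profile across an epoch is one illustrative way to turn it into a score for ranking genes. It is not necessarily the ranking statistic used in this study, and the function names, gene names, and per-site rates below are hypothetical. In practice the rates would be estimated from an alignment and tree, which this sketch does not do.

```python
import math
from typing import Dict, List, Sequence, Tuple


def site_informativeness(rate: float, t: float) -> float:
    """Informativeness of one site with substitution rate `rate` at time t before present.

    Assumes the per-site expression 16 * rate**2 * t * exp(-4 * rate * t),
    which peaks at t = 1 / (4 * rate).
    """
    return 16.0 * rate ** 2 * t * math.exp(-4.0 * rate * t)


def profile(rates: Sequence[float], t: float) -> float:
    """Gene-level profile at time t: the sum of per-site informativeness."""
    return sum(site_informativeness(r, t) for r in rates)


def epoch_informativeness(rates: Sequence[float], t1: float, t2: float) -> float:
    """Area under the gene's profile between times t1 and t2 (0 <= t1 <= t2)."""
    if not 0.0 <= t1 <= t2:
        raise ValueError("expected 0 <= t1 <= t2")
    total = 0.0
    for r in rates:
        a = 4.0 * r
        # With a = 4r, 16 r^2 t e^{-4rt} = a^2 t e^{-at}, whose antiderivative
        # is -(a t + 1) e^{-a t}; evaluate it between t1 and t2.
        total += (a * t1 + 1.0) * math.exp(-a * t1) - (a * t2 + 1.0) * math.exp(-a * t2)
    return total


def rank_genes(genes: Dict[str, Sequence[float]], t1: float, t2: float) -> List[Tuple[str, float]]:
    """Sort genes from most to least informative over the epoch [t1, t2]."""
    scores = {name: epoch_informativeness(rates, t1, t2) for name, rates in genes.items()}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    # Hypothetical per-site rates (substitutions per site per unit time).
    genes = {
        "geneA": [0.01] * 300 + [0.5] * 50,  # mostly slow sites
        "geneB": [0.2] * 300,                # moderately fast sites
    }
    for name, score in rank_genes(genes, t1=1.0, t2=2.0):
        print(f"{name}: {score:.2f}")
```

With these toy rates, geneB, whose per-site peak at T = 1/(4 × 0.2) = 1.25 lies inside the epoch [1, 2], should rank above geneA, most of whose sites peak much deeper in time (T = 25). The closed-form integral avoids numerical quadrature; any other time window can be scored by changing `t1` and `t2`.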

Highlights

  • The genomes of nearly 400 eukaryotes and nearly 3000 prokaryotes are or are in the process of being sequenced

  • Graphical profiles of the phylogenetic informativeness for four loci, scaled to match the ultrametric trees (Figures 1–3), illustrated the great diversity in levels of informativeness among genes in all data sets

  • Townsend's [5] phylogenetic informativeness was based on analysis of the canonical four-taxon problem, and it does not account for the misleading effects of stochastic noise



Introduction

The genomes of nearly 400 eukaryotes and nearly 3000 prokaryotes are or are in the process of being sequenced. Most of these organisms have thousands of genes, yet only a few of those have been commonly used as markers for phylogenetic studies [1]. In cases where choice has been exercised, genes have been selected for sequencing based on rough impressions of the genes' utility in previous studies of taxa that are to varying degrees divergent from the taxa of interest. Although a few rules of thumb for selecting genes for phylogenetic studies have been advocated (e.g., percent sequence divergence [4,5] or proportion of parsimony-informative sites [6]), their successful use is highly context dependent. Complex distributions of rates across characters can yield information regarding some periods of the history encompassed by the phylogeny but not others [4,5,6].
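The two rules of thumb mentioned above can be computed directly from an alignment. The following is a minimal sketch, assuming an aligned DNA matrix supplied as equal-length strings, treating `-`, `?`, and `N` as missing data, and using the uncorrected p-distance for percent divergence. The function names and the toy alignment are hypothetical. Neither number says which period of the phylogeny a gene resolves, which is the limitation the paragraph points out.

```python
from collections import Counter
from itertools import combinations
from typing import List

# Assumption: aligned DNA sequences; these symbols are treated as missing data.
MISSING = {"-", "?", "N"}


def is_parsimony_informative(column: str) -> bool:
    """True if at least two states each occur in at least two sequences."""
    counts = Counter(c for c in column.upper() if c not in MISSING)
    return sum(1 for n in counts.values() if n >= 2) >= 2


def proportion_parsimony_informative(alignment: List[str]) -> float:
    """Fraction of alignment columns that are parsimony-informative."""
    if not alignment or not alignment[0] or len({len(s) for s in alignment}) != 1:
        raise ValueError("expected non-empty sequences of equal length")
    columns = ["".join(chars) for chars in zip(*alignment)]
    return sum(is_parsimony_informative(col) for col in columns) / len(columns)


def mean_percent_divergence(alignment: List[str]) -> float:
    """Mean uncorrected pairwise p-distance, as a percentage, skipping missing sites."""
    distances = []
    for a, b in combinations(alignment, 2):
        compared = differences = 0
        for x, y in zip(a.upper(), b.upper()):
            if x in MISSING or y in MISSING:
                continue  # only compare sites observed in both sequences
            compared += 1
            differences += x != y
        if compared:
            distances.append(differences / compared)
    return 100.0 * sum(distances) / len(distances) if distances else 0.0


if __name__ == "__main__":
    toy = ["AAGT", "AAGC", "ATCT", "ATCC"]  # hypothetical four-taxon alignment
    print(proportion_parsimony_informative(toy))  # 0.75
    print(mean_percent_divergence(toy))  # 50.0
```

In the toy alignment the first column is invariant and the other three each carry two states shared by two taxa, so three of four sites are parsimony-informative.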
