Abstract

A critical step in studying biological features (e.g., genetic variants, gene families, metabolic capabilities, or taxa) is assessing their diversity and distribution among a sample of individuals. Accurate assessments of these patterns are essential for linking features to traits or outcomes of interest and for understanding their functional impact. Consequently, it is crucial that the measures used to quantify feature diversity perform robustly under any evolutionary scenario. However, the standard measures used for quantifying and comparing the distribution of features, such as prevalence, phylogenetic diversity, and related approaches, either do not take evolutionary history into consideration or assume strictly vertical patterns of inheritance. As a result, these approaches cannot accurately assess diversity for features that have undergone recombination or horizontal transfer. To address this issue, we have devised RecPD, a novel recombination-aware phylogenetic-diversity statistic for measuring the distribution and diversity of features under all evolutionary scenarios. RecPD uses ancestral-state reconstruction to map the presence/absence of features onto ancestral nodes in a species tree, and then identifies potential recombination events in the evolutionary history of the feature. We also derive several related measures from RecPD that can be used to assess and quantify evolutionary dynamics and the correlation of feature evolutionary histories. We used simulation studies to show that RecPD reliably reconstructs feature evolutionary histories under diverse recombination and loss scenarios. We then applied RecPD in two diverse real-world scenarios, a preliminary study of type III effector protein families secreted by the plant pathogenic bacterium Pseudomonas syringae and growth phenotypes of the Pseudomonas genus, and demonstrated that prevalence is an inadequate measure that obscures the potential impact of recombination.
We believe RecPD will have broad utility for revealing and quantifying complex evolutionary processes for features at any biological level.

Highlights

  • Phylogenetic diversity is an important concept utilized in evolutionary ecology which has extensive applications in population genetics to help us understand how evolutionary processes have distributed genetic variation among individuals of a species, and how this

  • Diversity can be measured for any type of data that varies across an environment, population, or community, including species, operational taxonomic units (OTUs), nucleotides or amino acids, gene families, metabolic capacities, phenotypic traits, or even gene expression levels varying across tissues or cellular environments

Introduction

The modern genomics era has provided unprecedented opportunities for identifying and quantifying the impact of genetic variants underlying traits of interest, while furthering our understanding of the fundamental evolutionary processes driving the emergence, distribution, and fate of these variants. A critical step in studying these genetic variants is assessing their overall abundance and the distribution of individuals carrying the variants both within and between populations and/or communities. Accurate assessment of these patterns of genetic diversity is essential for linking genotypes to phenotypes and understanding the functional impact of genetic variation. Perhaps the simplest and most common diversity index is abundance (also known as frequency or prevalence), which measures the proportion of individuals in a population or community that are of a particular kind, are in a particular state, or carry a trait or feature of interest. We will use the term feature to encompass this wide range of data types and define it as any measurable difference between samples.
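To make the definition of prevalence concrete, and to illustrate why it ignores evolutionary history, the following minimal Python sketch compares prevalence with a rooted, Faith-style phylogenetic diversity on a toy tree. The tree, the tip names, and the functions `prevalence` and `rooted_pd` are hypothetical illustrations and are not taken from the paper; the sketch does not implement RecPD.

```python
from typing import Dict, Set, Tuple

# Hypothetical toy species tree, stored as child -> (parent, branch length).
# The root has no entry. All names and lengths are illustrative assumptions.
TREE: Dict[str, Tuple[str, float]] = {
    "n1": ("root", 1.0),
    "n2": ("root", 1.0),
    "A": ("n1", 1.0),
    "B": ("n1", 1.0),
    "C": ("n2", 1.0),
    "D": ("n2", 1.0),
}
TIPS: Set[str] = {"A", "B", "C", "D"}


def prevalence(carriers: Set[str], tips: Set[str]) -> float:
    """Proportion of sampled individuals (tips) that carry the feature."""
    return len(carriers & tips) / len(tips)


def rooted_pd(carriers: Set[str], tree: Dict[str, Tuple[str, float]]) -> float:
    """Sum of branch lengths linking the carrier tips to the root.

    Each branch is counted once. This assumes strictly vertical inheritance,
    which is the assumption the abstract identifies as a limitation.
    """
    visited: Set[str] = set()
    total = 0.0
    for tip in carriers:
        node = tip
        # Walk towards the root, stopping at the first branch already counted.
        while node in tree and node not in visited:
            visited.add(node)
            parent, length = tree[node]
            total += length
            node = parent
    return total


clustered = {"A", "B"}   # carriers are sister lineages
scattered = {"A", "C"}   # carriers sit in different clades

print(prevalence(clustered, TIPS), rooted_pd(clustered, TREE))  # 0.5 3.0
print(prevalence(scattered, TIPS), rooted_pd(scattered, TREE))  # 0.5 4.0
```

Both features have the same prevalence (0.5), yet they span different amounts of the tree, which is the kind of information a prevalence-only summary discards.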
