Abstract

Classic concepts of genetic (gene) diversity (heterozygosity), such as Nei & Li’s nucleotide diversity, were defined within a population context. Although variation is often measured in a population context, the basic carriers of variation are individuals. Hence, measuring the variation of an individual, such as its SNPs against a reference genome, has previously been ignored but is worthwhile in its own right. Indeed, a similar practice has long been a tradition in community ecology, where the basic unit of diversity measurement is the individual community sample. We propose to use Hill numbers, which are based on Rényi’s entropy, to define individual-level genetic diversity and similarity, and we demonstrate the definitions with the SNP (single nucleotide polymorphism) datasets from the 1000-Genomes Project. Hill numbers, derived from Rényi’s entropy (of which Shannon’s entropy is a special case), have found wide application, including in measuring quantum entanglement and ecological diversity. The demonstrated individual-level SNP diversity not only complements the existing population-level genetic diversity concepts, but also offers building blocks for comparative genetic analysis at higher levels. The concept of an individual covers, but is not limited to, an individual chromosome, a region of a chromosome, one or more gene clusters, or a whole genome. Similarly, SNPs can be replaced by other structural variants or mutation types such as indels.

Highlights

  • We demonstrate the implementation of our SNP diversity and similarity measures with the SNP datasets obtained from the 1000-Genomes Project, consisting of 2504 individuals belonging to 5 populations (The 1000 Genomes Project Consortium 2015; Sudmant et al.) [3,22]

  • The SNP diversity defined in this article can be applied separately to the three types of SNP occurrence regions

  • We did not distinguish the three types in this article, but all the definitions and computational procedures presented in previous sections can be applied directly to measure the SNP diversity of each type separately

Summary

Diversity and Similarity Profiles

Classic concepts of genetic (gene) diversity (heterozygosity), such as Nei & Li’s nucleotide diversity, were defined within a population context. We propose to use Hill numbers, which are based on Rényi’s entropy, to define individual-level genetic diversity and similarity, and we demonstrate the definitions with the SNP (single nucleotide polymorphism) datasets from the 1000-Genomes Project. As reiterated in Sherwin et al. [21], information theory has been playing a broadening role in molecular ecology and evolution. Just as they are critical in measuring ecological diversity, Hill numbers can capture essential properties of the SNP distribution on a genetic entity such as a chromosome or a genome, and they offer effective metrics for measuring SNP diversity. The SNP alpha-diversity we will define, in effect, measures the unevenness or heterogeneity of SNPs in a genetic entity such as a chromosome or a genome at the individual level. This complements the current population-level genetic (gene) diversity and provides building blocks for further comparative SNP analyses.
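
To make the idea of a Hill-number diversity profile concrete, the sketch below computes the standard Hill number of order q, (sum_i p_i^q)^(1/(1-q)), which is the exponential of Rényi's entropy of order q and reduces to the exponential of Shannon's entropy as q approaches 1. It is a minimal illustration, not the article's implementation: it assumes the abundance vector holds the number of SNPs observed at each locus (for example, each gene) of one individual's chromosome, and the function name `hill_number` and the counts in `snp_counts` are hypothetical.

```python
import math


def hill_number(counts, q):
    """Hill number of order q for a vector of non-negative abundances.

    Assumption for illustration: counts[i] is the number of SNPs observed
    at locus i of one individual's chromosome or genome.
    """
    total = sum(counts)
    if total <= 0:
        raise ValueError("counts must contain at least one positive value")
    # Relative abundances; zero entries contribute nothing and are dropped.
    p = [c / total for c in counts if c > 0]
    if math.isclose(q, 1.0):
        # Limit q -> 1: exponential of Shannon's entropy.
        return math.exp(-sum(pi * math.log(pi) for pi in p))
    return sum(pi ** q for pi in p) ** (1.0 / (1.0 - q))


# Hypothetical SNP counts at five loci of a single individual.
snp_counts = [12, 7, 7, 3, 1]

# Diversity profile: Hill numbers at increasing diversity order q.
profile = {q: hill_number(snp_counts, q) for q in (0, 1, 2, 3)}
for q, value in profile.items():
    print(f"q={q}: {value:.3f}")
```

At q = 0 the Hill number is simply the number of loci carrying at least one SNP. Higher orders give progressively more weight to loci with many SNPs, so the profile falls off faster the more unevenly the SNPs are distributed, which is the unevenness the alpha-diversity is meant to capture.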

Concepts and Definitions
The Definitions for SNP Similarities
Four Similarity Measures
Author contributions
Findings
Additional information