Abstract

The Site Frequency Spectrum (SFS) and the heterozygosity of allelic variants are among the most important summary statistics for population genetic analysis of diploid organisms. We discuss the generalization of these statistics to populations of autopolyploid organisms in terms of the joint Site Frequency/Dosage Spectrum and its expected value for autopolyploid populations that follow the standard neutral model. Based on these results, we present estimators of nucleotide variability from High-Throughput Sequencing (HTS) data of autopolyploids and discuss potential issues related to sequencing errors and variant calling. We use these estimators to generalize Tajima's D and other SFS-based neutrality tests to HTS data from autopolyploid organisms. Finally, we discuss how these approaches fail when the number of individuals is small. In fact, in autopolyploids there are many possible deviations from the Hardy–Weinberg equilibrium, each reflected in a different shape of the individual dosage distribution. The SFS from small samples is often dominated by the shape of these deviations of the dosage distribution from its Hardy–Weinberg expectations.

Highlights

  • The study of nucleotide variability in polyploid species is a convoluted task that requires solving a number of methodological and analytical difficulties related to the specific nature of the species

  • In order to advance our understanding of the evolutionary processes affecting the genome of polyploid species, an important step is to gain a deeper knowledge of the way these processes modulate the fate of genetic variants, and the levels and patterns of genetic variability

  • Two of the main descriptive statistics used in population genetics to summarize genetic variability are the Site Frequency Spectrum (SFS) and the heterozygosity (h), which contain information on the global and internal allelic spectra, respectively


Summary

INTRODUCTION

The study of nucleotide variability in polyploid species is a convoluted task that requires solving a number of methodological and analytical difficulties related to the specific nature of the species (detailed in the reviews of Dufresne et al., 2014; Meirmans et al., 2018). Compared to the diploid case, the genotypes of variants in polyploid organisms present a more complex structure resulting from a combination of internal spectra for each individual. We discuss this genotype structure and its decomposition into different statistics, including the SFS and a generalization of the distribution of heterozygosity that we call the Site Dosage Spectrum (SDS). We derive the expected value of the most general spectrum for autopolyploids, i.e., the joint Site Frequency-Dosage Spectrum (SFDS), which represents a combination of the SFS and the SDS. We use these results as a null model to build estimators of nucleotide diversity and neutrality tests for HTS data, and we discuss the robustness of estimators of genetic variability. More details about their derivations can be found in the Appendix.
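To make the objects named above concrete, the following Python sketch tabulates an SFS and a dosage spectrum from a matrix of per-individual derived-allele dosages, and then computes two classical SFS-based estimators (Tajima's nucleotide diversity and Watterson's estimator). It is an illustration under stated assumptions, not the paper's implementation: the function names, the input layout, and the reading of the SDS as a simple count of (site, individual) pairs by dosage are all hypothetical, and the estimators treat the n·p gene copies of n individuals of ploidy p as a single exchangeable sample. The paper's formal definitions are in the full text and may differ.

```python
from math import comb

def site_frequency_spectrum(dosages, ploidy):
    """Count sites by total derived-allele count across all gene copies.

    dosages: list of sites; each site is a list with one derived-allele
             dosage (0..ploidy) per individual. Layout is an assumption.
    Returns xi, where xi[k] is the number of sites with k derived copies
    out of n * ploidy sampled gene copies.
    """
    n_copies = len(dosages[0]) * ploidy
    xi = [0] * (n_copies + 1)
    for site in dosages:
        xi[sum(site)] += 1
    return xi

def site_dosage_spectrum(dosages, ploidy):
    """Count (site, individual) pairs by dosage.

    This is one plausible reading of a dosage spectrum, assumed here for
    illustration only.
    """
    sds = [0] * (ploidy + 1)
    for site in dosages:
        for d in site:
            sds[d] += 1
    return sds

def tajima_pi(xi):
    """Mean pairwise differences: sum_k k (N - k) xi_k / C(N, 2)."""
    n_copies = len(xi) - 1
    pairs = comb(n_copies, 2)
    return sum(k * (n_copies - k) * x for k, x in enumerate(xi)) / pairs

def watterson_theta(xi):
    """Segregating sites divided by the harmonic number a_N."""
    n_copies = len(xi) - 1
    segregating = sum(xi[1:n_copies])  # exclude counts 0 and N
    a_n = sum(1 / i for i in range(1, n_copies))
    return segregating / a_n

# Toy data: 4 sites, 3 tetraploid individuals (12 gene copies in total).
toy = [
    [1, 0, 0],
    [2, 1, 0],
    [4, 4, 4],  # derived allele fixed in the sample, not segregating
    [0, 3, 0],
]
xi = site_frequency_spectrum(toy, ploidy=4)
sds = site_dosage_spectrum(toy, ploidy=4)
print(xi, sds, tajima_pi(xi), watterson_theta(xi))
```

Both estimators are unnormalized sums over sites; dividing by the number of sequenced positions would give per-site values. In the toy data, the second and fourth sites have the same total count (3 of 12 copies) and so contribute identically to the SFS, yet differ in how the copies are distributed among individuals, which is the information a dosage spectrum retains.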

SFS and Heterozygosity in Diploids
SFDS in Autopolyploids
The SFDS of the Standard Neutral Model
SFS ESTIMATORS AND NEUTRALITY TESTS FOR LARGE SAMPLES
Estimators of Variability
Tajima’s Estimator of Nucleotide Diversity
Neutrality Tests
SMALL SAMPLES AND HARDY–WEINBERG VIOLATIONS IN THE SDS
Inbreeding
Heterozygote Advantage
Recessive Deleterious Mutations
DISCUSSION