The Amount of Genetic Variation at the DNA Level Maintained in a Finite Population

Fumio Tajima

doi:10.1111/j.1442-1984.1996.tb00108.x

Abstract

The amount of DNA polymorphism can be estimated either from the average number of pairwise nucleotide differences per site (π) or from the proportion of segregating (polymorphic) sites (s) among a sample of DNA sequences. Under a number of assumptions, including a panmictic population and neutrality, the expectations of π and s are known to be E(π) = θ and E(s) = a1(n)θ, where θ = 4Nv, a1(n) = 1 + 1/2 + … + 1/(n‐1), N is the effective population size, v is the mutation rate per site per generation, and n is the sample size. Therefore, θ can be estimated by π or by s/a1(n). These assumptions, however, do not always hold. In this paper, using a simple non‐random sampling model, I have examined the effect of non‐random sampling on the estimates of the amount of DNA polymorphism. The results indicate that non‐random sampling has a substantial effect on the proportion of segregating sites, whereas its effect on the average number of nucleotide differences per site is negligible unless the non‐randomness is extreme. Using a finite site model with and without rate variation, I have also examined the effect of rate variation among sites on the estimates of the amount of DNA polymorphism. The results indicate that if the neutral mutation rate varies substantially among sites, estimates of θ based on the infinite site model are substantial underestimates. New methods for estimating θ are also presented.
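
As an illustration of the two standard estimators named in the abstract, the following minimal Python sketch computes π and s/a1(n) from a small set of aligned sequences. It is not the paper's new method: the function names (`a1`, `theta_estimates`) and the toy sequences are hypothetical, and the sketch assumes equal-length, gap-free sequences and the conditions (panmixia, neutrality) under which E(π) = θ and E(s) = a1(n)θ.

```python
from itertools import combinations


def a1(n):
    """Return a1(n) = 1 + 1/2 + ... + 1/(n-1) for sample size n."""
    return sum(1.0 / i for i in range(1, n))


def theta_estimates(seqs):
    """Return (pi, s / a1(n)), two per-site estimates of theta.

    seqs: list of aligned, equal-length, gap-free sequences (an assumption
    of this sketch, not a requirement stated in the abstract).
    """
    n = len(seqs)
    if n < 2:
        raise ValueError("need at least two sequences")
    length = len(seqs[0])
    if length == 0 or any(len(seq) != length for seq in seqs):
        raise ValueError("sequences must be non-empty and of equal length")

    # pi: average number of pairwise nucleotide differences per site.
    total_diffs = sum(
        sum(x != y for x, y in zip(a, b)) for a, b in combinations(seqs, 2)
    )
    n_pairs = n * (n - 1) // 2
    pi = total_diffs / n_pairs / length

    # s: proportion of sites at which the sample is polymorphic.
    n_segregating = sum(len(set(column)) > 1 for column in zip(*seqs))
    s = n_segregating / length

    return pi, s / a1(n)


# Hypothetical toy data: four sequences of ten sites, two of them polymorphic.
sample = ["ACGTACGTAC", "ACGTACGTAT", "ACGAACGTAC", "ACGTACGTAC"]
pi_hat, theta_s_hat = theta_estimates(sample)
print(pi_hat, theta_s_hat)
```

When sampling is random and the infinite site model holds, the two values should agree in expectation; a marked gap between them is the kind of discrepancy the abstract attributes to non‐random sampling or to rate variation among sites.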

Full Text