Abstract

Fisher (1943) claimed that the expected value of the sample variance of the number of species found in large samples, each of n specimens taken from the same population, is asymptotically θ log 2. This is at odds with the value θ log n obtained directly from the Ewens Sampling Formula (ESF), where θ specifies the rate at which new species are found. To resolve this apparent contradiction, we assume that the species frequency spectrum in the population is determined by the ESF and that the samples are disjoint subsets drawn sequentially from this single population. We find an explicit formula for the required expected value for p samples of arbitrary size; in the limit of large, equally sized samples, it does indeed have the value θ log 2. We obtain limit theorems for the sample variance of p samples of size n under various limiting regimes as p, n, or both tend to ∞. We further discuss the behavior of the number of species present in all samples, and revisit Fisher’s log-series distribution as the limiting distribution of the number of specimens observed in typical species in a future, large sample.
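As a rough numerical illustration of the θ log n value quoted above (and only that value, not the θ log 2 result for disjoint samples), the sketch below uses the standard representation of the number of species in an ESF sample of size n as a sum of independent Bernoulli indicators, the i-th with success probability θ/(θ + i − 1). The function name `species_count_moments` is hypothetical and not taken from the paper; convergence of the variance to θ log n is slow, since the two differ by a term that stays bounded as n grows.

```python
import math


def species_count_moments(n, theta):
    """Exact mean and variance of the number of species in an ESF sample.

    Assumes the standard representation of the species count as a sum of
    independent Bernoulli indicators: the i-th specimen belongs to a new
    species with probability theta / (theta + i - 1).
    """
    mean = 0.0
    var = 0.0
    for i in range(1, n + 1):
        p = theta / (theta + i - 1)
        mean += p               # each indicator contributes p to the mean
        var += p * (1.0 - p)    # and p(1 - p) to the variance
    return mean, var


if __name__ == "__main__":
    theta = 1.0  # illustrative value, not taken from the paper
    for n in (10**2, 10**4, 10**6):
        mean, var = species_count_moments(n, theta)
        # Compare the exact variance with the leading-order term theta * log(n).
        print(n, round(var, 3), round(theta * math.log(n), 3))
```

The ratio of the exact variance to θ log n approaches 1 only logarithmically, so even large n shows a visible gap.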
