We examine the discrete distributional form that arises from the “classical occupancy problem,” which looks at the behavior of the number of occupied bins when we allocate a given number of balls uniformly at random to a given number of bins. We review the mass function and moments of the classical occupancy distribution and derive exact and asymptotic results for the mean, variance, skewness and kurtosis. We develop an algorithm to compute a cubic array of log-probabilities from the classical occupancy distribution. This algorithm allows the computation of large blocks of values while avoiding underflow problems in computation. Using this algorithm, we compute the classical occupancy distribution for a large block of values of balls and bins, and we measure the accuracy of its asymptotic approximation using the normal distribution. We analyze the accuracy of the normal approximation with respect to the variance, skewness and kurtosis of the distribution. Based on this analysis, we give some practical guidance on the feasibility of computing large blocks of values from the occupancy distribution, and when approximation is required.
Read full abstract