Abstract

Good-Turing frequency estimation (Good, ) is a simple, effective method for predicting the detection probabilities of objects of both observed and unobserved classes from the observed frequencies of classes in a sample. The method has been used widely in several disciplines, such as information retrieval, computational linguistics, text recognition, and ecological diversity estimation. Nevertheless, existing studies assume sampling with replacement or sampling from an infinite population, an assumption that may be inappropriate for many practical applications. In light of this limitation, this article presents a modification of the Good-Turing estimation method to account for finite population sampling. We provide three practical extensions of the modified method, and we examine the performance of the modified method and its extensions in simulation experiments.
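
For orientation, the classical estimator that the article takes as its starting point can be sketched as follows. This is a minimal illustration of the standard, unsmoothed Good-Turing estimator under the usual infinite-population assumption; it is not the finite-population modification proposed in the article, whose formulas are not given in this abstract. The function name `good_turing`, the fallback to the raw count when no class is seen r + 1 times, and the final renormalization step are illustrative assumptions.

```python
from collections import Counter


def good_turing(sample):
    """Classical (unsmoothed) Good-Turing estimates for a list of class labels.

    Returns (p_unseen, probs): the estimated total probability of all
    unobserved classes, and a dict of estimated probabilities for each
    observed class.
    """
    n = len(sample)
    if n == 0:
        raise ValueError("sample must not be empty")

    class_counts = Counter(sample)                 # r: times each class was seen
    freq_of_freq = Counter(class_counts.values())  # N_r: classes seen exactly r times

    # Total probability mass assigned to unobserved classes: N_1 / n.
    p_unseen = freq_of_freq.get(1, 0) / n

    raw = {}
    for label, r in class_counts.items():
        n_r = freq_of_freq[r]
        n_r1 = freq_of_freq.get(r + 1, 0)
        # Adjusted count r* = (r + 1) * N_{r+1} / N_r.
        # Assumption: fall back to the raw count r when N_{r+1} is zero.
        r_star = (r + 1) * n_r1 / n_r if n_r1 > 0 else r
        raw[label] = r_star / n

    # Assumption: rescale so observed classes share exactly 1 - p_unseen.
    scale = (1.0 - p_unseen) / sum(raw.values())
    probs = {label: p * scale for label, p in raw.items()}
    return p_unseen, probs


if __name__ == "__main__":
    p0, probs = good_turing(["a", "a", "a", "b", "b", "c", "d"])
    print(p0, probs)
```

In practice the frequency-of-frequency counts N_r are usually smoothed before use, because they become sparse and noisy for large r; the sketch omits that step to keep the core idea visible.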
