Abstract

The objective of this article is twofold. First, several recently established mathematical results regarding diversity indices are introduced: a re-parameterization of the word type probability distribution, a generalization of Simpson's diversity index, a family of optimal estimators of the generalized Simpson's indices, and two large-sample distributional properties of these estimators. Second, several characteristics of word type diversity are measured for a collection of 14 Shakespearean sonnets and for the Shakespearean canon. A statistically significant difference in word type diversity, and in turn in the relative frequency distribution of word types, is observed between the sonnets and the canon. This finding suggests that, in general, the same author may have distinct word usage patterns in different forms of writing, as measured by word type diversity, and that, in particular, caution must be taken when the Shakespearean canon is used as a basis for authorship attribution to Shakespeare where sonnets are concerned.
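The abstract does not state any formulas, so the following is only a minimal sketch of the classical Simpson's index (the sum of squared word type probabilities), which is the quantity the article's generalization extends. It is not the article's generalized family or its estimators. The function names and the toy token list are hypothetical, chosen for illustration.

```python
from collections import Counter


def simpson_index_unbiased(tokens):
    """Unbiased estimate of sum_k p_k^2 from a sample of word tokens.

    Uses the classical form sum_k n_k (n_k - 1) / (n (n - 1)), where n_k is
    the count of word type k and n is the total number of tokens.
    """
    counts = Counter(tokens)
    n = sum(counts.values())
    if n < 2:
        raise ValueError("need at least two tokens")
    return sum(c * (c - 1) for c in counts.values()) / (n * (n - 1))


def simpson_index_plugin(tokens):
    """Plug-in estimate of sum_k p_k^2 using relative frequencies n_k / n."""
    counts = Counter(tokens)
    n = sum(counts.values())
    if n == 0:
        raise ValueError("need at least one token")
    return sum((c / n) ** 2 for c in counts.values())


# Toy sample (illustrative only, not data from the article).
sample = "to be or not to be".split()
unbiased = simpson_index_unbiased(sample)
plugin = simpson_index_plugin(sample)
```

Lower values of the index correspond to higher diversity, so `1 - unbiased` is often reported as a diversity score. The plug-in version is biased upward in small samples, which is one reason to prefer the unbiased form when comparing a short text, such as a handful of sonnets, against a much larger corpus.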

