ABSTRACT Trend scoring of constructed-response items (i.e., rescoring Time A responses at Time B) gives rise to two-way data that follow a product-multinomial distribution rather than the multinomial distribution that is usually assumed. Recent work has shown that this difference in sampling model can have profound negative effects on the statistics usually used to monitor rater drift. In this paper, three statistics, termed E-statistics, are introduced that account for product-multinomial sampling by comparing conditional distributions. A simulation compares their performance with that of the paired t-test and Stuart’s Q in detecting rater drift. Both the paired t-test and Q suffered extreme Type I error inflation for certain rescore study designs. The new E-statistics maintained good Type I error control and had good power to detect rater drift across occasions.