The linear logistic test model (LLTM) has been widely applied to investigate the effects of item covariates on item difficulty. The LLTM has been extended with random item residuals to account for item differences not explained by the item covariates; this extended model is called the LLTM-R. In this article, statistical inference methods are investigated for these two models, and Type I error rates and power are compared via Monte Carlo studies. Based on the simulation results, the likelihood ratio test (LRT) is recommended over the paired-sample t test based on sum scores, the Wald z test, and information criteria; it is also recommended over the profile likelihood confidence interval because the LRT is simpler. In addition, it is concluded that the LLTM-R is the better general modeling approach. Inferences based on the LLTM when the LLTM-R is the true model appear to be substantially biased in the liberal direction, whereas inferences based on the LLTM-R when the LLTM is the true model are biased only slightly, and in the conservative direction. Furthermore, in the absence of residual variance, Type I error rates and power were acceptable, except for power when both the number of items (10) and the number of persons (200) were small. In the presence of residual variance, however, the number of items needs to be large (80 items) to avoid an inflated Type I error rate and to reach a power of .90 for a moderate effect.