Abstract

Mixture models of item response theory (IRT) can be used to detect inappropriate category use. Data collected in panel surveys, where attitudes and traits are typically assessed with short scales offering many response categories, are prone to response styles indicating inappropriate category use. However, applying mixed IRT models to this data type can be challenging because of the many threshold parameters within items. To date, very little is known about the sample size required for adequate performance of estimation methods and goodness-of-fit criteria of mixed IRT models in this case. The present Monte Carlo simulation study examined these issues for two mixed IRT models: the restricted mixed generalized partial credit model (rmGPCM) and the mixed partial credit model (mPCM). The population parameters of the simulation study were taken from a challenging real application to survey data (a 5-item scale with an 11-point rating scale and three latent classes). Additional data conditions (e.g., long tests, a reduced number of response categories, and a simple latent mixture) were included to improve the generalizability of the results. Under the challenging data condition, data were generated for each model at sample sizes ranging from 500 to 5,000 observations in steps of 500. For the additional conditions, only three sample sizes (1,000, 2,500, and 4,500 observations) were examined. The effect of sample size on estimation problems and on the accuracy of parameter and standard error estimates was evaluated. Results show that under the challenging data condition, both mixed IRT models require at least 2,500 observations to provide accurate parameter and standard error estimates. The rmGPCM produces more estimation problems than the more parsimonious mPCM, mostly because of the sparse tables arising from the many response categories. The two models exhibit similar trends in estimation accuracy across sample sizes. Under the additional conditions, no estimation problems are observed; both models perform well with smaller samples when long tests are used or the true latent mixture comprises two classes. For model selection, the AIC3 and the SABIC are the most reliable information criteria.
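To make the data-generating design above concrete, here is a minimal Python sketch (not the authors' code; the parameter values, the standard-normal trait distribution, and the gpcm_probs helper are assumptions for illustration) of simulating responses under an rmGPCM-type mixture with 5 items, an 11-point rating scale, and three latent classes:

import numpy as np

rng = np.random.default_rng(1)

n_items, n_cats, n_classes = 5, 11, 3      # 5-item scale, 11-point rating scale
n_obs = 2500                               # one of the studied sample sizes
class_probs = np.array([0.5, 0.3, 0.2])    # hypothetical mixing weights

# Hypothetical item parameters: class-specific thresholds, class-fixed discriminations
alpha = rng.uniform(0.8, 1.5, size=n_items)
beta = rng.normal(0.0, 1.0, size=(n_classes, n_items, n_cats - 1))
beta.sort(axis=2)  # ordered thresholds; a simplification, the GPCM does not require it

def gpcm_probs(theta, a, b):
    """Category probabilities of one item: a = discrimination, b = thresholds."""
    # cumulative logits sum_{s<=x} a*(theta - b_s); category 0 has logit 0
    logits = np.concatenate(([0.0], np.cumsum(a * (theta - b))))
    logits -= logits.max()                 # for numerical stability
    p = np.exp(logits)
    return p / p.sum()

# Draw class memberships and trait values, then generate item responses
g = rng.choice(n_classes, size=n_obs, p=class_probs)
theta = rng.normal(size=n_obs)
data = np.empty((n_obs, n_items), dtype=int)
for v in range(n_obs):
    for i in range(n_items):
        data[v, i] = rng.choice(n_cats, p=gpcm_probs(theta[v], alpha[i], beta[g[v], i]))

Repeating such draws across replications and sample sizes, and refitting the models to each data set, is the general Monte Carlo scheme summarized above.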

Highlights

  • Mixture models of item response theory (IRT) are a combination of IRT models and latent class analysis

  • As a parsimonious variant of the mixed generalized partial credit model (mGPCM; von Davier and Yamamoto, 2004), which extends the generalized partial credit model (GPCM; Muraki, 1997), the restricted mixed generalized partial credit model (rmGPCM) defines, for each latent class, the conditional probability of endorsing a response category x of an item i as a function of the latent trait variable via two types of item parameters: (i) class-specific threshold parameters that locate the transition between two adjacent categories of item i (x − 1 and x) on the latent continuum, and (ii) a class-fixed discrimination parameter of item i that indicates how well the item differentiates between individuals with different values on the measured trait (see the formula sketch after this list)

  • Because only very few simulation studies have examined mixed polytomous IRT models in general, and none were found that consider the performance of these models under the challenging data condition typically observed in survey studies, this application-oriented simulation study focused on the sample size requirements of two models, the rmGPCM and the mixed partial credit model (mPCM), that are useful for exploring category use when applied to such data

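The class-conditional response function behind the rmGPCM can be sketched as follows (the notation here is assumed, not quoted from the article): with latent trait \theta, class-fixed discrimination \alpha_i of item i, class-specific thresholds \beta_{isg} of item i in class g, and highest category m,

P(X_i = x \mid \theta, g) = \frac{\exp\left(\sum_{s=1}^{x} \alpha_i (\theta - \beta_{isg})\right)}{\sum_{r=0}^{m} \exp\left(\sum_{s=1}^{r} \alpha_i (\theta - \beta_{isg})\right)}, \qquad x = 0, \ldots, m,

where the empty sum for x = 0 is set to 0. The mPCM follows as the special case \alpha_i = 1 for all items, and the unconditional response probability mixes the class-conditional probabilities with the class weights \pi_g.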

Summary

Introduction

Mixture models of item response theory (IRT) are a combination of IRT models and latent class analysis (for an overview, see von Davier and Carstensen, 2007). Mixture polytomous IRT models are useful for detecting latent classes that differ qualitatively in a measured personality trait or attitude (e.g., Egberink et al., 2010; Finch and Pierson, 2011; Baghaei and Carstensen, 2013; Gnaldi et al., 2016; Jensuttiwetchakul et al., 2016) or that are characterized by response styles (e.g., Eid and Rauber, 2000; Austin et al., 2006; Wagner-Menghin, 2006; Eid and Zickar, 2007; Maij-de Meij et al., 2008; Meiser and Machunsky, 2008; Wu and Huang, 2010; Wetzel et al., 2013). They can be applied to examine construct validity (e.g., von Davier and Yamamoto, 2007; Tietjens et al., 2012), to detect differential item functioning (e.g., Frick et al., 2015; Cho et al., 2016), and to check the quality of a rating scale in general (e.g., Smith et al., 2011; Kutscher et al., 2017). However, it is unclear whether the application of a complex mixed IRT model requires a larger sample size or causes more estimation problems than a more parsimonious model.
