Item Response Analysis of a Structured Mixture Item Response Model with the mirt Package in R
Structured mixture item response models (StrMixIRMs) are a special type of constrained confirmatory mixture item response theory (IRT) model for detecting latent performance differences on a measurement instrument across characteristic item groups and for classifying respondents according to these differences. In light of the limited software options for estimating StrMixIRMs under existing frameworks, this paper proposes reparameterizing the StrMixIRM as a confirmatory mixture IRT model using interaction effects between latent classes and item groups. The reparameterization allows StrMixIRMs to be implemented more easily in multiple software programs with mixture modeling capabilities, including open-source ones. This widens access to these models for a broad range of users and can thus facilitate research on and applications of StrMixIRMs. This paper serves two main goals: First, we introduce StrMixIRMs, focusing on the proposed reparameterization based on interaction effects and its various extensions. Second, we illustrate use cases of this novel reparameterization within the mirt 1.41 package in R using two empirical datasets. Detailed R code with notes is provided for the applications, along with an interpretation of the outputs.
- Research Article
- 10.1177/0146621615605080
- Sep 22, 2015
- Applied Psychological Measurement
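The mixture estimation facility the article builds on can be sketched as follows. This is a minimal illustrative example, not the authors' code: it assumes mirt >= 1.41 and uses the LSAT7 data bundled with mirt in place of the paper's empirical datasets, fitting a plain two-class mixture Rasch model via the `dentype = "mixture-2"` argument of `multipleGroup()`. The paper's interaction-effect reparameterization of the StrMixIRM would add class-by-item-group constraints on top of such a baseline model.

```r
# Illustrative sketch only (assumes mirt >= 1.41; LSAT7 ships with mirt).
library(mirt)

dat <- expand.table(LSAT7)   # expand frequency table to person-by-item 0/1 data

# Two-class mixture Rasch model: mirt estimates finite mixtures through
# multipleGroup() with a "mixture-#" density type (no group variable needed).
mod <- multipleGroup(dat, model = 1, itemtype = "Rasch",
                     dentype = "mixture-2")

coef(mod, simplify = TRUE)   # class-specific item parameters
mod                          # printing the model also reports mixing proportions
```

A StrMixIRM-style analysis would constrain the class-specific item parameters so that classes differ only by item-group effects, which is the step the article's reparameterization makes expressible in standard mixture IRT syntax.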
Unidimensional item response theory (IRT) models assume a single homogeneous population. Mixture IRT (MixIRT) models can be useful when subpopulations are suspected. The usual MixIRT model is typically estimated assuming a normally distributed latent ability. Research on normal finite mixture models suggests that latent classes potentially can be extracted, even in the absence of population heterogeneity, if the distribution of the data is non-normal. In this study, the authors examined the sensitivity of MixIRT models to latent non-normality. Single-class IRT data sets were generated using different ability distributions and then analyzed with MixIRT models to determine the impact of these distributions on the extraction of latent classes. Results suggest that estimation of mixed Rasch models resulted in the extraction of spurious latent classes when distributions were bimodal and uniform. Mixture two-parameter logistic (2PL) and mixture three-parameter logistic (3PL) IRT models were found to be more robust to latent non-normality.
- Research Article
- 10.1177/0013164416640327
- Jul 11, 2016
- Educational and Psychological Measurement
Mixture item response theory (IRT) models have been suggested as an efficient method of detecting the different response patterns derived from latent classes when developing a test. In testing situations, multiple latent traits measured by a battery of tests can exhibit a higher-order structure, and mixtures of latent classes may occur on different orders and influence the item responses of examinees from different classes. This study aims to develop a new class of higher-order mixture IRT models by integrating mixture IRT models and higher-order IRT models to address these practical concerns. The proposed higher-order mixture IRT models can accommodate both linear and nonlinear models for latent traits and incorporate diverse item response functions. The Rasch model was selected as the item response function, metric invariance was assumed in the first simulation study, and multiparameter IRT models without an assumption of metric invariance were used in the second simulation study. The results show that the parameters can be recovered fairly well using WinBUGS with Bayesian estimation. A larger sample size resulted in a better estimate of the model parameters, and a longer test length yielded better individual ability recovery and latent class membership recovery. The linear approach outperformed the nonlinear approach in the estimation of first-order latent traits, whereas the opposite was true for the estimation of the second-order latent trait. Additionally, imposing identical factor loadings between the second- and first-order latent traits by fitting the mixture bifactor model resulted in biased estimates of the first-order latent traits and item parameters. Finally, two empirical analyses are provided as an example to illustrate the applications and implications of the new models.
- Research Article
- 10.3102/1076998609353111
- Jun 1, 2010
- Journal of Educational and Behavioral Statistics
Mixture item response theory models have been suggested as a potentially useful methodology for identifying latent groups formed along secondary, possibly nuisance dimensions. In this article, we describe a multilevel mixture item response theory (IRT) model (MMixIRTM) that allows for the possibility that this nuisance dimensionality may function differently at different levels. An MMixIRT model is described that enables simultaneous detection of differences in latent class composition at both examinee and school levels. The MMixIRTM can be viewed as a combination of an IRT model, an unrestricted latent class model, and a multilevel model. A Bayesian estimation of the MMixIRTM is described, including analysis of label switching, use of priors, and model selection strategies. Results of a simulation study indicated that the generating parameters were recovered very well for the conditions considered. Use of the MMixIRTM also was illustrated with a standardized mathematics test.
- Research Article
- 10.21449/ijate.1164590
- Dec 22, 2022
- International Journal of Assessment Tools in Education
This study aims to examine the effects of mixture item response theory (IRT) models on item parameter estimation and classification accuracy under different conditions. The manipulated variables of the simulation study are set as mixture IRT models (Rasch, 2PL, 3PL); sample size (600, 1000); the number of items (10, 30); the number of latent classes (2, 3); missing data type (complete, missing at random (MAR), and missing not at random (MNAR)); and the percentage of missing data (10%, 20%). Data were generated for each of the three mixture IRT models using code written in R. The MplusAutomation package, which automates Mplus analyses from R, was used to analyze the data. The mean RMSE values for item difficulty, item discrimination, and guessing parameter estimation were determined. The mean RMSE values for the Mixture Rasch model were found to be lower than those of the Mixture 2PL and Mixture 3PL models. Percentages of classification accuracy were also computed. It was noted that the Mixture Rasch model with 30 items, 2 classes, a sample size of 1000, and complete data had the highest classification accuracy percentage. Additionally, a factorial ANOVA was used to evaluate each factor's main effects and interaction effects.
- Research Article
- 10.3389/fpsyg.2016.00255
- Feb 24, 2016
- Frontiers in Psychology
This study investigates the impact of item parameter drift (IPD) on parameter and ability estimation when the underlying measurement model fits a mixture distribution, thereby violating the item invariance property of unidimensional item response theory (IRT) models. An empirical study was conducted to demonstrate the occurrence of both IPD and an underlying mixture distribution using real-world data. Twenty-one trended anchor items from the 1999, 2003, and 2007 administrations of Trends in International Mathematics and Science Study (TIMSS) were analyzed using unidimensional and mixture IRT models. TIMSS treats trended anchor items as invariant over testing administrations and uses pre-calibrated item parameters based on unidimensional IRT. However, empirical results showed evidence of two latent subgroups with IPD. Results also showed changes in the distribution of examinee ability between latent classes over the three administrations. A simulation study was conducted to examine the impact of IPD on the estimation of ability and item parameters, when data have underlying mixture distributions. Simulations used data generated from a mixture IRT model and estimated using unidimensional IRT. Results showed that data reflecting IPD using mixture IRT model led to IPD in the unidimensional IRT model. Changes in the distribution of examinee ability also affected item parameters. Moreover, drift with respect to item discrimination and distribution of examinee ability affected estimates of examinee ability. These findings demonstrate the need to caution and evaluate IPD using a mixture IRT framework to understand its effects on item parameters and examinee ability.
- Research Article
- 10.1177/0146621607312613
- Apr 16, 2008
- Applied Psychological Measurement
Mixture item response theory (IRT) models aid the interpretation of response behavior on personality tests and may provide possibilities for improving prediction. Heterogeneity in the population is modeled by identifying homogeneous subgroups that conform to different measurement models. In this study, mixture IRT models were applied to the Extroversion and Neuroticism scales of the Amsterdam Biographical Questionnaire, and a three-class mixture version of the nominal response model was identified as the best fitting model. The latent classes differed with respect to social desirability and ethnic background. Within latent classes, response tendencies demonstrated a differential use of the "?" category. An important issue is whether applying mixture IRT models results in a better prediction of relevant external criteria compared to a one-class model. For the Neuroticism scale the prediction improved, but not for the Extroversion scale. The results demonstrate the possible advantage of applying mixture IRT models to personality questionnaires.
- Research Article
- 10.1186/s40536-024-00226-7
- Nov 2, 2024
- Large-scale Assessments in Education
The focus of this study is to use the mixture item response theory (MixIRT) model, implementing the no-U-turn sampler, as a technique for investigating the presence of latent classes (i.e., subpopulations) among eighth-grade students from the Gulf Cooperation Council (GCC) countries who were administered the TIMSS 2019 mathematics subtest in paper format. One-, two-, and constrained three-parameter logistic MixIRT models with one to four classes were fit to the data, and model-data fit was assessed using Bayesian fit indices. The results indicate that multiple latent classes or subpopulations can better reflect the mathematical proficiency of eighth graders from the four GCC countries; specifically, the two-class constrained three-parameter MixIRT model provides a relatively better fit to the data. The results also indicate that when a mixture of several latent classes is present, the conventional unidimensional IRT model is limited in providing information for multiple latent classes and should be avoided. In addition to adding to the existing literature on MixIRT models for heterogeneous subpopulations in international large-scale assessments such as TIMSS from a fully Bayesian approach, this study sheds light on the limitations of conventional unidimensional IRT models and subsequently directs attention to the use of the more complex MixIRT model for such assessments.
- Research Article
- 10.21031/epod.1457880
- Jun 30, 2024
- Eğitimde ve Psikolojide Ölçme ve Değerlendirme Dergisi
Studies on differential item functioning (DIF) are usually conducted in the context of manifest groups. Recently, with the increase in the number of analyses conducted with mixture models, investigating the sources of differences between groups has come to the forefront. It is also considered important to examine DIF with mixture models in which levels are taken into account. This study aims to compare the results of the multilevel mixture item response theory (MMIRT) model, the mixture item response theory (MIRT) model, and DIF analyses based on manifest groups. The research sample consists of students who answered the second booklet of the electronic Trends in International Mathematics and Science Study (eTIMSS) 2019 and reported their gender. The answers given to 15 items were analyzed with the Mantel-Haenszel (MH) method for the gender variable according to the manifest groups, and with the MIRT and MMIRT models, selecting the most appropriate models by varying the number of groups and the number of levels. DIF analyses of the resulting latent groups were also performed with the MH method. In light of the findings, the number of items displaying DIF under both the MIRT and MMIRT models is higher than for the manifest groups: while only one item displayed DIF in the analysis according to gender, 14 items displayed DIF according to the MIRT model and seven items according to the MMIRT model. There is not a complete overlap in the number of DIF items and DIF effect sizes found with the MIRT and MMIRT models. For this reason, a level analysis should be conducted beforehand, and if a multilevel structure is present, the analyses should take it into account.
- Book Chapter
- 10.1007/978-3-319-07503-7_3
- Jan 1, 2015
Unidimensional item response theory (IRT) models assume that a single model applies to all people in the population. Mixture IRT models can be useful when subpopulations are suspected. The usual mixture IRT model is typically estimated assuming normally distributed latent ability. Research on normal finite mixture models suggests that latent classes potentially can be extracted even in the absence of population heterogeneity if the distribution of the data is nonnormal. Empirical evidence suggests, in fact, that test data may not always be normal. In this study, we examined the sensitivity of mixture IRT models to latent nonnormality. Single-class IRT data sets were generated using different ability distributions and then analyzed with mixture IRT models to determine the impact of these distributions on the extraction of latent classes. Preliminary results suggest that estimation of mixed Rasch models resulted in the extraction of spurious latent classes when distributions were bimodal and uniform. Mixture 2PL and mixture 3PL IRT models were found to be more robust to nonnormal latent ability distributions. Two popular information criterion indices, Akaike's information criterion (AIC) and the Bayesian information criterion (BIC), were used to inform model selection. For most conditions, the BIC index performed better than the AIC for selecting the correct model.
- Research Article
- 10.3389/fpsyg.2020.00197
- Feb 14, 2020
- Frontiers in Psychology
The standard item response theory (IRT) model assumption of a single homogenous population may be violated in real data. Mixture extensions of IRT models have been proposed to account for latent heterogeneous populations, but these models are not designed to handle multilevel data structures. Ignoring the multilevel structure is problematic as it results in lower-level units aggregated with higher-level units and yields less accurate results, because of dependencies in the data. Multilevel data structures cause such dependencies between levels but can be modeled in a straightforward way in multilevel mixture IRT models. An important step in the use of multilevel mixture IRT models is the fit of the model to the data. This fit is often determined based on relative fit indices. Previous research on mixture IRT models has shown that performances of these indices and classification accuracy of these models can be affected by several factors including percentage of class-variant items, number of items, magnitude and size of clusters, and mixing proportions of latent classes. As yet, no studies appear to have been reported examining these issues for multilevel extensions of mixture IRT models. The current study aims to investigate the effects of several features of the data on the accuracy of model selection and parameter recovery. Results are reported on a simulation study designed to examine the following features of the data: percentages of class-variant items (30, 60, and 90%), numbers of latent classes in the data (with from 1 to 3 latent classes at level 1 and 1 and 2 latent classes at level 2), numbers of items (10, 30, and 50), numbers of clusters (50 and 100), cluster size (10 and 50), and mixing proportions [equal (0.5 and 0.5) vs. non-equal (0.25 and 0.75)]. Simulation results indicated that multilevel mixture IRT models resulted in less accurate estimates when the number of clusters and the cluster size were small. 
In addition, mean root mean square error (RMSE) values increased as the percentage of class-variant items increased, and parameters were recovered more accurately under the 30% class-variant item conditions. Mixing proportion type (i.e., equal vs. unequal latent class sizes) and number of items (10, 30, and 50), however, did not show any clear pattern. The sample-size-dependent fit indices BIC, CAIC, and SABIC performed poorly for the smaller level-1 sample size. For the remaining conditions, the SABIC index performed better than the other fit indices.
- Research Article
- 10.1007/s11136-022-03169-0
- Jun 18, 2022
- Quality of Life Research: An International Journal of Quality of Life Aspects of Treatment, Care and Rehabilitation
Mixture item response theory (MixIRT) models can be used to uncover heterogeneity in responses to items that comprise patient-reported outcome measures (PROMs). This is accomplished by identifying relatively homogeneous latent subgroups in heterogeneous populations. Misspecification of the number of latent subgroups may affect model accuracy. This study evaluated the impact of specifying too many latent subgroups on the accuracy of MixIRT models. Monte Carlo methods were used to assess MixIRT accuracy. Simulation conditions included number of items and latent classes, class size ratio, sample size, number of non-invariant items, and magnitude of between-class difference in item parameters. Bias and mean square error (MSE) in item parameters and accuracy of latent class recovery were assessed. When the number of latent classes was correctly specified, the average bias and MSE in model parameters decreased as the number of items and latent classes increased, but specification of too many latent classes resulted in a modest decrease (i.e., < 10%) in the accuracy of latent class recovery. The accuracy of the MixIRT model is thus influenced by overspecification of the number of latent classes. Appropriate choice of goodness-of-fit measures, study design considerations, and a priori contextual understanding of the degree of sample heterogeneity can guide model selection.
- Research Article
- 10.1080/15305058.2025.2596321
- Dec 26, 2025
- International Journal of Testing
Mixture item response theory (MixIRT) models, integrating latent class models with traditional IRT models, aim to uncover latent subgroups in the data, allowing an IRT model to hold in each identified latent class. These models' potential for unraveling individuals' cognitive processing has garnered scholars' attention, providing deeper insights into respondents' capabilities. The current study replicates Sen and Cohen's (2019) review article to offer an updated exploration of MixIRT model applications by examining additional articles from the past five years and considering some extra features. Various applications of MixIRT models were highlighted, such as test speededness, heterogeneity, and differential item functioning. In addition, applications of these models were extended to examine psychometric quality, compare different MixIRT models for model-data fit, and develop more complex MixIRT models. Trends in the implementation of MixIRT models and study characteristics, along with relevant points and information, were reported.
- Preprint Article
- 10.31219/osf.io/tgys3_v1
- Jan 30, 2025
Careless and insufficient effort responding (C/IER) in self-report questionnaires occurs when responses are given without attention being paid to the item content. In this study, we provide a mixture item response theory (IRT) model that efficiently accommodates various C/IER types and can be easily implemented in standard IRT software. In an extensive simulation study, we evaluated the conditions that might facilitate the separation of respondents who exhibit and respondents who do not exhibit C/IER. The results indicate that, for all investigated patterns of C/IER, the suggested model performs well when scales comprise 10 or more items, include items that clearly differ from each other in their category thresholds (high item heterogeneity), and combine positively and negatively worded items. Empirical support for interpreting the latent class variable as assessing C/IER was obtained by reanalyzing a publicly available Big Five inventory data set. The model-based identification of C/IER was closely aligned with results from attention check items; the mixture IRT model was more sensitive to differences in the prevalence of C/IER across assessment platforms than the attention check items were.
- Research Article
- 10.1017/s1930297500003211
- Jan 1, 2015
- Judgment and Decision Making
Whether it pertains to the foods to buy when one is on a diet, the items to take along to the beach on one's day off, or (perish the thought) the belongings to save from one's burning house, choice is ubiquitous. We aim to determine from choices the criteria individuals use when they select objects from among a set of candidates. To do so, we employ a mixture IRT (item response theory) model that capitalizes on the insights that objects are chosen more often the better they meet the choice criteria and that the use of different criteria is reflected in inter-individual selection differences. The model is found to account for the inter-individual selection differences for 10 ad hoc and goal-derived categories. Its parameters can be related to selection criteria that are frequently thought of in the context of these categories. These results suggest that mixture IRT models allow one to infer from mere choice behavior the criteria individuals used to select or discard objects. Potential applications of mixture IRT models in other judgment and decision making contexts are discussed.
- Research Article
- 10.3758/s13428-013-0413-3
- Nov 21, 2013
- Behavior Research Methods
This article describes a generalized longitudinal mixture item response theory (IRT) model that allows for detecting latent group differences in item response data obtained from electronic learning (e-learning) environments or other learning environments that result in large numbers of items. The described model can be viewed as a combination of a longitudinal Rasch model, a mixture Rasch model, and a random-item IRT model, and it includes some features of the explanatory IRT modeling framework. The model assumes the possible presence of latent classes in item response patterns, due to initial person-level differences before learning takes place, to latent class-specific learning trajectories, or to a combination of both. Moreover, it allows for differential item functioning over the classes. A Bayesian model estimation procedure is described, and the results of a simulation study are presented that indicate that the parameters are recovered well, particularly for conditions with large item sample sizes. The model is also illustrated with an empirical sample data set from a Web-based e-learning environment.