Comparing MIRT Model and MGM with Two Popular Estimation Methods for Measuring Student Ability Growth

TL;DR

This study compares the MIRT model and the MGM under two estimation methods, the Metropolis-Hastings Robbins-Monro (MH-RM) algorithm and the second-order Laplace (Lap2) approximation, on longitudinal data. It finds that Lap2 outperforms both MH-RM and the expectation maximization (EM) algorithm, and that the MIRT model and the MGM produce similar results when the number of time points is two or three.

Abstract

The multidimensional item response theory (MIRT) model and the multiple group item response theory model (MGM) are widely applied for measuring students’ learning and change. Several estimation methods have recently become available for both models, but previous studies have not applied these methods to longitudinal data. This study investigated the performance of the MIRT model and the MGM on longitudinal data with two popular estimation methods: the Metropolis-Hastings Robbins-Monro (MH-RM) algorithm and the second-order Laplace (Lap2) approximation. The results suggest that Lap2 outperforms both MH-RM and the expectation maximization (EM) algorithm under the MIRT model and the MGM in longitudinal data. The MIRT model with Lap2 produces more accurate ability estimates than the MGM with either EM or Lap2, especially when the correlation between abilities is 0.8. When the number of time points is 2 or 3, the MIRT model and the MGM yield highly similar results in terms of the RMSE of ability estimates, ability means, and item parameters, regardless of the estimation method.
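The abstract reports accuracy as the RMSE of ability estimates. As a minimal sketch of that criterion, assuming the usual definition (the square root of the mean squared difference between true and estimated values), the following computes it for one time point. The function name `rmse` and the example numbers are hypothetical and are not taken from the study.

```python
import math


def rmse(true_values, estimates):
    """Root mean square error between true and estimated values."""
    if len(true_values) != len(estimates) or not true_values:
        raise ValueError("inputs must be non-empty and of equal length")
    squared_errors = [(est - true) ** 2 for true, est in zip(true_values, estimates)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


# Hypothetical true abilities and estimates for four examinees (illustration only).
true_theta = [-1.0, 0.0, 0.5, 1.5]
est_theta = [-0.8, 0.1, 0.3, 1.9]
print(round(rmse(true_theta, est_theta), 4))
```

In a simulation study this quantity would typically be computed per replication and per time point, then averaged across replications; how the authors aggregated it is not stated here.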

Similar Papers
  • Research Article
  • Citations: 2
  • 10.1080/03610918.2021.1977951
Bayesian estimation of multidimensional polytomous item response theory models with Q-matrices using Stan
  • Sep 6, 2021
  • Communications in Statistics - Simulation and Computation
  • Marcelo A Da Silva + 3 more

The Q-matrix is commonly used in diagnostic classification models and has recently been incorporated into multidimensional item response theory (MIRT) models to add information about the relationship between items and the dimensions of the latent trait. The reformulation of MIRT models with a Q-matrix (MIRT-Q) has been presented as improving the precision of the parameter estimates of these models and as providing a simple and intuitive way for users to define the item-trait relationship. This paper explores the incorporation of the Q-matrix into the formulation of MIRT models for polytomous item responses. Specifically, we incorporate the Q-matrix into two of the best-known and most widely used polytomous MIRT models: the multidimensional graded response (MGR) model, hereinafter called MGR-Q, and the multidimensional generalized partial credit (MGPC) model, hereinafter called MGPC-Q. We provide readers with the code for the MGR-Q and MGPC-Q models in Stan, a Bayesian estimation software, and we conduct a simulation study to evaluate the parameter recovery of the estimation method. To illustrate the use of both models in practice, we fit them to an operational dataset from 2400 individuals on 13 items and demonstrate the estimation of MGR-Q and MGPC-Q using the Stan program.

  • Research Article
  • Citations: 13
  • 10.1177/0013164419891205
A General Bayesian Multidimensional Item Response Theory Model for Small and Large Samples.
  • Jan 10, 2020
  • Educational and Psychological Measurement
  • Ken A Fujimoto + 1 more

Although item response theory (IRT) models such as the bifactor, two-tier, and between-item-dimensionality IRT models have been devised to confirm complex dimensional structures in educational and psychological data, they can be challenging to use in practice. The reason is that these models are multidimensional IRT (MIRT) models and thus are highly parameterized, making them only suitable for data provided by large samples. Unfortunately, many educational and psychological studies are conducted on a small scale, leaving the researchers without the necessary MIRT models to confirm the hypothesized structures in their data. To address the lack of modeling options for these researchers, we present a general Bayesian MIRT model based on adaptive informative priors. Simulations demonstrated that our MIRT model could be used to confirm a two-tier structure (with two general and six specific dimensions), a bifactor structure (with one general and six specific dimensions), and a between-item six-dimensional structure in rating scale data representing sample sizes as small as 100. Although our goal was to provide a general MIRT model suitable for smaller samples, the simulations further revealed that our model was applicable to larger samples. We also analyzed real data from 121 individuals to illustrate that the findings of our simulations are relevant to real situations.

  • Book Chapter
  • 10.1007/978-3-030-74772-5_36
Psychometric Models for a New State Science Assessment Aligned to the Next Generation Science Standards
  • Jan 1, 2021
  • Jing Chen + 3 more

The complexity of the Next Generation Science Standards (NGSS) poses significant task design, psychometric, and practical challenges for assessments. This study focuses on the psychometric challenges and explores an appropriate measurement model to interpret scores for an NGSS-aligned state science assessment. Multiple item response theory (IRT) models based on content specifications were applied to the data collected from a pilot test of the newly developed science assessment to identify the most appropriate model. Results suggest that although the three-dimensional IRT model that aligns with the NGSS dimensions provides slightly better overall model fit than the unidimensional IRT model and the testlet model, the item-level fit of the three-dimensional model is poor. Implementing multidimensional IRT (MIRT) models requires large sample sizes and a much longer estimation time, which poses challenges in an operational setting. Future studies can be conducted to further evaluate the need for using MIRT models and the robustness of a unidimensional model under various test conditions.

Keywords: Multidimensional science assessment design; NGSS-aligned assessment; Multidimensional IRT models

  • Dissertation
  • 10.11606/t.104.2019.tde-06082019-161037
Modelos alternativos da TRI para dados politômicos
  • Jan 1, 2019
  • Marcelo Andrade Da Silva

Item response theory (IRT) models for polytomous data are frequently used in the analysis of data from the behavioral and social sciences. From a practical point of view, polytomous data are more informative than dichotomous data, since they involve more than two response categories per test item, which makes the models for this type of data attractive. The purpose of this research is to explore alternative polytomous IRT models and their multidimensional extensions, filling some gaps in the literature. Specifically, the chapters of this work follow a construction sequence of IRT modeling. First, we conducted a study to assist readers in choosing between two of the major polytomous IRT models in the one-dimensional context: the graded response (GR) model and the generalized partial credit (GPC) model. We conducted a sensitivity analysis of priors to choose a suitable prior scenario for each model, and we verified the performance of several model comparison criteria for these models through a simulation study. Then, we extended the one-dimensional GPC model to the bifactor context, proposing the GPC-bifactor model, in which a global latent trait and specific latent traits are combined through an additive structure. In addition, we made the structure of the GPC-bifactor model more flexible, allowing its use with link functions other than the usual logit, such as the probit and the complementary log-log. Next, we incorporated the relation between the items and the latent trait dimensions into the formulation of multidimensional item response theory (MIRT) models through the Q-matrix, a component present in the vast majority of cognitive diagnostic models (CDM), making it easy for users to express the item-trait relationship in MIRT models. Finally, we proposed a validation method using the Q-matrix in MIRT models. In particular, the study used the multidimensional GPC model with the Q-matrix embedded in its formulation. The simulation studies and applications performed in this research showed that these models are viable alternatives for the analysis of polytomous data and can be used in practice.

  • Research Article
  • Citations: 3
  • 10.3389/feduc.2022.801372
Investigating Subscores of VERA 3 German Test Based on Item Response Theory/Multidimensional Item Response Theory Models
  • Apr 8, 2022
  • Frontiers in Education
  • Güler Yavuz Temel + 3 more

In this study, the psychometric properties of the listening and reading subtests of the German VERA 3 test were examined using Item Response Theory (IRT) and Multidimensional Item Response Theory (MIRT) models. Listening and reading subscores were estimated using unidimensional Rasch, 1PL, and 2PL models, and total scores on the German test (listening + reading) were estimated using unidimensional and multidimensional IRT models. Various MIRT models were used, and model fit was compared in a cross-validation study. The results showed that unidimensional models of the reading and listening subtests and of the German test provided a good overall model-data fit; however, multidimensional models of the subtests provided a better fit. The results demonstrated that, although the subtest scores also fit adequately on their own, estimating the scores of the overall test with a model (e.g., bifactor) that includes a general factor (construct) in addition to the subfactors significantly improved the psychometric properties of the test. A general factor was identified that had the highest reliability values; however, the reliabilities of the specific factors were very low. In addition to model-data fit, person fit under the IRT/MIRT models was also examined. The results showed that the proportion of person misfit was higher for the subtests than for the overall tests, but the proportion of overfit was lower. NA-German students, who did not speak German all day, had the highest proportion of misfit under all models.

  • Research Article
  • Citations: 1
  • 10.29329/ijpe.2020.332.12
Modelling of the Attitude-Achievement Paradox in TIMSS 2015 with respect to the Extreme Response Style Using Multidimensional Item Response Theory
  • Apr 7, 2021
  • The International Journal of Progressive Education
  • Munevver Ilgun Dibek, Rahime Nukhet Cıkrıkcı

This study aims first to investigate the effect of the extreme response style (ERS), which could lead to an attitude-achievement paradox among the countries participating in the Trends in International Mathematics and Science Study (TIMSS 2015), and then to determine the individual- and country-level relationships between attitude and achievement after adjusting for the effect of ERS. For the sample of this correlational study, 500 students were randomly selected from each of the 15 countries that participated in TIMSS 2015. The differences in the ERS tendency of the countries were determined by performing MANOVA. To determine the effect of ERS, two different multidimensional item response theory (MIRT) models were used: one did not include the ERS trait as a dimension, while the other did. The results were analyzed with Latent GOLD 5.1 and WinBUGS software. To determine the relationship between attitudinal variables and achievement, the correlation values based on the observed scores and on the MIRT models were obtained. Whether these correlation values differed significantly was determined by Fisher's r-to-z transformation. The findings of this study were as follows: (a) the model in which the ERS trait was included as a dimension fit the data best, and (b) the correlation values based on the observed scores were negative and those based on the MIRT models were positive, with the two differing statistically from each other. ERS is one of the factors causing the achievement-attitude paradox; however, it is not sufficient to explain this paradox.
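The abstract above compares correlation values using Fisher's r-to-z transformation. As a minimal sketch of that idea, the following applies the transformation and the textbook z-test for two correlations from independent samples. The independence assumption, the function names, and the inputs are illustrative assumptions; the abstract does not state which variant of the test the authors used, and correlations computed on the same students would call for a dependent-correlations test instead.

```python
import math


def fisher_z(r):
    """Fisher r-to-z transformation (inverse hyperbolic tangent of r)."""
    if not -1.0 < r < 1.0:
        raise ValueError("r must lie strictly between -1 and 1")
    return math.atanh(r)


def compare_independent_correlations(r1, n1, r2, n2):
    """Return (z statistic, two-sided p-value) for H0: rho1 == rho2.

    Assumes the two correlations come from independent samples of
    sizes n1 and n2 (both greater than 3).
    """
    if n1 <= 3 or n2 <= 3:
        raise ValueError("each sample size must exceed 3")
    standard_error = math.sqrt(1.0 / (n1 - 3) + 1.0 / (n2 - 3))
    z_stat = (fisher_z(r1) - fisher_z(r2)) / standard_error
    # Two-sided p-value under the standard normal: 2 * (1 - Phi(|z|)).
    p_value = math.erfc(abs(z_stat) / math.sqrt(2.0))
    return z_stat, p_value


# Hypothetical correlations (illustration only, not values from the study).
z_stat, p_value = compare_independent_correlations(-0.2, 500, 0.2, 500)
print(z_stat < 0, p_value < 0.05)
```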

  • Research Article
  • 10.1093/ndt/gfac083.049
MO867: Confirmatory Factor Analysis and Computer Adaptive Testing System Prototype Based on Multidimensional Item Response Theory of KDQOL-36 Among a Large Sample of Spanish Dialysis Patients
  • May 3, 2022
  • Nephrology Dialysis Transplantation
  • Luca Neri + 11 more

BACKGROUND AND AIMS The Kidney Disease Quality of Life (KDQOL™-36) is widely used to assess the quality of life of dialysis patients worldwide. It combines disease-specific and generic scales capturing unique facets of patients’ adaptation to the physical and mental burden of renal disease and dialysis treatment. The psychometric properties and cross-national validation of the KDQOL™-36 have been subject to intense review in recent years, suggesting mixed replicability in different populations. Furthermore, its length may hamper its wider use in clinical practice, as respondent burden may be substantial if it is used longitudinally within continuous quality improvement programs. Therefore, we sought to evaluate its measurement properties and develop an item response theory model enabling personalized survey administration while maximizing measurement efficiency.

METHOD This is a retrospective, observational analysis of Patient-Reported Outcomes Measures (PROM) collected in Spanish NephroCare clinics during 2019 as part of a continuous quality improvement program launched by the country medical director of Spain. We used Multidimensional Item Response Theory (MIRT) models to confirm the theoretical factor structure underlying the measurement model of the KDQOL™-36. MIRT extends classical IRT in that it allows complex factor structures in the multifactorial space. Critical advantages of scoring systems based on MIRT models over classical test theory models include the ability to personalize survey administration based on a patient's characteristics and previous responses, reduction of measurement error, objective calibration, evaluation of test and item bias, greater accuracy in the assessment of change due to therapeutic intervention, and evaluation of model and person fit. We specified 4 competing theoretical models for the SF-12 and the disease-specific KDQOL items (Figure 1). First, we fitted a unidimensional model, to rule out that a single factor could explain the response pattern of the questionnaire. Second, we tested a two-factor (or three-factor) orthogonal model for the SF-12 (and disease-specific KDQOL) to account for the standard scoring structure of the KDQOL™-36 questionnaire. Third, we tested the hypothesis of a correlated two-factor (or three-factor) structure. Fourth, we tested a bifactor model structure, which accounts for a general HRQOL latent construct as well as lower-order dimensions tapping specific dimensions of health. We expected that the bifactor model would capture the pleiotropic effect of ESKD on patients’ life adjustment. We compared model fit with RMSEA, CFI and TLI statistics. For illustrative purposes, we simulated MIRT-based adaptive testing using different Delta Thetas as stopping rules to assess measurement efficiency for the SF-12 component of the questionnaire.

RESULTS Among patients completing the survey in the 2019 ePROM Spanish wave, 1838 (80.6%) prevalent dialysis patients met the inclusion criteria for the present analysis. Mean age was 68.8 ± 14.4, 60% were men, 65% had an arteriovenous fistula, and 66% were on HDF. For both generic and disease-specific KDQOL-36 components, the bifactor model significantly improved empirical fit (Figure 1). Although slightly inferior to the bifactor model solutions, a simpler correlated second-order factor solution showed acceptable model fit. CAT simulation of the SF-12 showed a reduction in the number of administered items while preserving measurement reliability (Figure 2).

CONCLUSION As previously reported, a correlated two-factor structure and three-factor structure for the SF-12 and disease-specific KDQOL, respectively, showed acceptable fit to the observed response pattern. However, a bifactor model structure improved model fit. CAT simulation based on delta theta rules for the SF-12 questionnaire demonstrated promising results as an item selection strategy.

  • Research Article
  • Citations: 16
  • 10.1177/0013164418814898
Incorporating the Q-Matrix Into Multidimensional Item Response Theory Models.
  • Nov 30, 2018
  • Educational and Psychological Measurement
  • Marcelo A Da Silva + 3 more

Multidimensional item response theory (MIRT) models use data from individual item responses to estimate multiple latent traits of interest, making them useful in educational and psychological measurement, among other areas. When MIRT models are applied in practice, it is not uncommon to see that some items are designed to measure all latent traits while other items may only measure one or two traits. In order to facilitate a clear expression of which items measure which traits and formulate such relationships as a math function in MIRT models, we applied the concept of the Q-matrix commonly used in diagnostic classification models to MIRT models. In this study, we introduced how to incorporate a Q-matrix into an existing MIRT model, and demonstrated benefits of the proposed hybrid model through two simulation studies and an applied study. In addition, we showed the relative ease in modeling educational and psychological data through a Bayesian approach via the NUTS algorithm.

  • Book Chapter
  • Citations: 2
  • 10.1007/978-3-319-56294-0_2
Properties of Second-Order Exponential Models as Multidimensional Response Models
  • Jan 1, 2017
  • Carolyn J Anderson + 1 more

Second-order exponential (SOE) models have been proposed as item response models (e.g., Anderson et al., J. Educ. Behav. Stat. 35:422–452, 2010; Anderson, J. Classif. 30:276–303, 2013. doi:10.1007/s00357-013-9131-x; Hessen, Psychometrika 77:693–709, 2012. doi:10.1007/s11336-012-9277-1; Holland, Psychometrika 55:5–18, 1990); however, the philosophical and theoretical underpinnings of the SOE models differ from those of standard item response theory models. Although presented as reexpressions of item response theory models (Holland, Psychometrika 55:5–18, 1990), which are reflective models, the SOE models are formative measurement models. We extend Anderson and Yu (Psychometrika 72:5–23, 2007), who studied unidimensional models for dichotomous items, to multidimensional models for dichotomous and polytomous items. The properties of the models for multiple latent variables are studied theoretically and empirically. Even though there are mathematical differences between the second-order exponential models and multidimensional item response theory (MIRT) models, the SOE models behave very much like standard MIRT models and in some cases better than MIRT models.

  • Research Article
  • 10.1177/00131644231210509
Effects of the Quantity and Magnitude of Cross-Loading and Model Specification on MIRT Item Parameter Recovery.
  • Dec 21, 2023
  • Educational and Psychological Measurement
  • Mostafa Hosseinzadeh + 1 more

In real-world situations, multidimensional data may appear on large-scale tests or psychological surveys. The purpose of this study was to investigate the effects of the quantity and magnitude of cross-loadings and of model specification on item parameter recovery in multidimensional item response theory (MIRT) models, especially when the model was misspecified as a simple structure, ignoring the quantity and magnitude of cross-loadings. A simulation study that replicated this scenario was designed to manipulate the variables that could potentially influence the precision of item parameter estimation in MIRT models. Item parameters were estimated using marginal maximum likelihood with the expectation-maximization algorithm. A compensatory two-parameter logistic MIRT model with two dimensions and dichotomous item responses was used to simulate and calibrate the data for each combination of conditions across 500 replications. The results indicated that ignoring the quantity and magnitude of cross-loadings and misspecifying the model resulted in inaccurate and biased item discrimination parameter estimates. As the quantity and magnitude of cross-loadings increased, the root mean square error and bias of the item discrimination estimates worsened.
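The abstract above simulates data from a compensatory two-parameter logistic MIRT model with two dimensions. As a minimal sketch of that model class, assuming the common slope-intercept form P(correct) = logistic(a · θ + d), the following computes response probabilities and draws one dichotomous response. The function names, the item parameters, and the abilities are hypothetical; the study's actual generating values are not given here.

```python
import math
import random


def m2pl_probability(theta, a, d):
    """P(correct) under a compensatory M2PL: logistic(sum_k a_k * theta_k + d)."""
    if len(theta) != len(a):
        raise ValueError("theta and a must have the same number of dimensions")
    logit = sum(a_k * theta_k for a_k, theta_k in zip(a, theta)) + d
    return 1.0 / (1.0 + math.exp(-logit))


def simulate_response(theta, a, d, rng):
    """Draw one dichotomous (0/1) response from the model."""
    return 1 if rng.random() < m2pl_probability(theta, a, d) else 0


rng = random.Random(0)   # fixed seed so the illustration is reproducible
theta = [0.5, -0.5]      # hypothetical two-dimensional ability

# Simple-structure item: loads on dimension 1 only.
p_simple = m2pl_probability(theta, a=[1.2, 0.0], d=0.0)
# Cross-loading item: also loads on dimension 2 (the kind of loading
# that a simple-structure specification would ignore).
p_cross = m2pl_probability(theta, a=[1.2, 0.6], d=0.0)

print(round(p_simple, 3), round(p_cross, 3))
print(simulate_response(theta, [1.2, 0.6], 0.0, rng))
```

The model is called compensatory because the dimensions enter through a weighted sum, so a low value on one dimension can be offset by a high value on another.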

  • Research Article
  • Citations: 37
  • 10.1177/0146621614545983
Comparing Two Algorithms for Calibrating the Restricted Non-Compensatory Multidimensional IRT Model.
  • Aug 19, 2014
  • Applied Psychological Measurement
  • Chun Wang + 1 more

The non-compensatory class of multidimensional item response theory (MIRT) models frequently represents the cognitive processes underlying a series of test items better than the compensatory class of MIRT models. Nevertheless, few researchers have used non-compensatory MIRT in modeling psychological data. One reason for this lack of use is that non-compensatory MIRT item parameters are notoriously difficult to estimate accurately. In this article, we propose methods to improve the estimability of a specific non-compensatory model. To initiate the discussion, we address the non-identifiability of the explored non-compensatory MIRT model by suggesting that practitioners use an item-dimension constraint matrix (namely, a Q-matrix) that results in model identifiability. We then compare two promising algorithms for high-dimensional model calibration, Markov chain Monte Carlo (MCMC) and Metropolis-Hastings Robbins-Monro (MH-RM), and discuss, via analytical demonstrations, the challenges in estimating model parameters. Based on simulation studies, we show that when the dimensions are not highly correlated, and when the Q-matrix displays appropriate structure, the non-compensatory MIRT model can be accurately calibrated (using the aforementioned methods) with as few as 1,000 people. Based on the simulations, we conclude that the MCMC algorithm is better able to estimate model parameters across a variety of conditions, whereas the MH-RM algorithm should be used with caution when a test displays complex structure and when the latent dimensions are highly correlated.

  • Research Article
  • Citations: 1
  • 10.1109/access.2024.3492188
Multidimensional Hybrid Computerized Adaptive Testing Based on Multidimensional Item Response Theory
  • Jan 1, 2024
  • IEEE Access
  • Mingyu Shao + 4 more

Computerized adaptive testing (CAT) and multistage adaptive testing (MST) are widely used to deliver assessment questions in the fields of psychometrics, educational measurement, and medical assessment. Hybrid computerized adaptive testing (HCAT), a novel and flexible approach that incorporates both modular and adaptively selected items, effectively integrates CAT and MST and inherits their respective strengths. Current HCAT focuses on unidimensional assessments, yet practical applications often require multidimensional assessments. Multidimensional item response theory (MIRT) models can provide accurate measurement of examinees’ multidimensional latent traits. Based on MIRT models, this study proposes an innovative approach for constructing multidimensional hybrid computerized adaptive testing (MHCAT), aimed at better accommodating complex testing demands. Simulation studies were conducted to evaluate MHCAT using both dichotomous and polytomous items. Results indicated that the fixed-length MHCAT achieved estimation accuracy similar to that of the fixed-length multidimensional CAT (MCAT), and that the variable-length MHCAT had slightly higher estimation accuracy than the variable-length MCAT. Regarding item exposure control, both the fixed-length and variable-length MHCAT performed better than the MCAT. Empirical studies further validated the feasibility of MHCAT with several MIRT models. In summary, the proposed MHCAT presents promising performance in assessing examinees’ abilities while maintaining satisfactory item exposure control, providing a valuable approach for multidimensional assessments.

  • Research Article
  • Citations: 5
  • 10.3389/fpsyg.2021.644764
Model Selection for Cognitive Diagnostic Analysis of the Reading Comprehension Test.
  • Aug 13, 2021
  • Frontiers in Psychology
  • Hui Liu + 1 more

Reading subskills are generally regarded as continuous variables, while most models used in previous reading diagnoses assume that the latent variables are dichotomous. Considering that the multidimensional item response theory (MIRT) model has continuous latent variables and can be used for diagnostic purposes, this study compared the performance of MIRT with two representatives of the models traditionally used in reading diagnoses [the reduced reparametrized unified model (R-RUM) and the generalized deterministic inputs, noisy "and" gate (G-DINA) model]. The comparison was carried out with both empirical and simulated data. First, model-data fit indices were used to evaluate whether MIRT was more appropriate than R-RUM and G-DINA with real data. Then, with the simulated data, three comparisons were made: relations between the estimated scores from MIRT, R-RUM, and G-DINA and the true scores were compared to examine whether the true abilities were well represented; correct classification rates under different research conditions were calculated for the three models to examine person parameter recovery; and the frequency distributions of subskill mastery probabilities were compared to show how far the estimated probabilities deviated from the true values. MIRT obtained better model-data fit, produced estimated scores that represented the true abilities more reasonably, had an advantage in correct classification rates, and showed less deviation from the true values in the frequency distributions of subskill mastery probabilities, which means it can produce more accurate diagnostic information about the reading abilities of test-takers. Considering that more accurate diagnostic information has greater guiding value for remedial teaching and learning, and that score interpretation in reading diagnoses will be more reasonable with the MIRT model, this study recommended MIRT as a new methodology for future reading diagnostic analyses.

  • Research Article
  • Citations: 32
  • 10.3389/fpsyg.2019.00145
Modeling Test-Taking Non-effort in MIRT Models.
  • Feb 4, 2019
  • Frontiers in Psychology
  • Yue Liu + 3 more

The validity of inferences based on test scores is threatened when examinees' test-taking non-effort is ignored. A possible solution is to add test-taking effort indicators to the measurement model after the non-effortful responses are flagged. As a new application of the multidimensional item response theory (MIRT) model for non-ignorable missing responses, this article proposed a MIRT method to account for non-effortful responses. Two simulation studies were conducted to examine the impact of non-effortful responses on item and latent ability parameter estimates, and to evaluate the performance of the MIRT method compared with the three-parameter logistic (3PL) model and the effort-moderated model. Results showed that: (a) as the percentage of non-effortful responses increased, the unidimensional 3PL model yielded poorer parameter estimates; (b) the MIRT model could obtain item parameter estimates as accurate as those of the effort-moderated model; (c) the MIRT model provided the most accurate ability parameter estimates when the correlation between test-taking effort and ability was high. A real data analysis was also conducted for illustration. Limitations and directions for future research were also discussed.

  • Book Chapter
  • 10.36253/978-88-5518-461-8.09
Clustering students according to their proficiency: a comparison between different approaches based on item response theory models
  • Jan 1, 2021
  • Rosa Fabbricatore + 1 more

Evaluating learners' competencies is a crucial concern in education, and structured tests taken at home or in the classroom are an effective assessment tool. Structured tests consist of sets of items that can refer to several abilities or to more than one topic. Several statistical approaches make it possible to evaluate students while treating the items in a multidimensional way that accounts for their structure. Depending on the final aim of the evaluation, the assessment process either assigns a final grade to each student or clusters students into homogeneous groups according to their level of mastery and ability. The latter is a helpful tool for developing tailored recommendations and remediation for each group. For this purpose, latent class models are a reference approach. In the item response theory (IRT) paradigm, multidimensional latent class IRT models relax the traditional constraints of unidimensionality and of a continuous latent trait, and so make it possible to detect sub-populations of homogeneous students according to their proficiency level while also accounting for the multidimensional nature of their ability. Moreover, the semi-parametric formulation has several practical advantages: it avoids normality assumptions that may not hold and reduces the computational demand. This study compares the results of the multidimensional latent class IRT models with those obtained by a two-step procedure, which consists of first fitting a multidimensional IRT model to estimate students' ability and then applying a clustering algorithm to classify students accordingly. For the latter, parametric and non-parametric approaches were considered. Data refer to the admission test for the degree course in psychology administered in 2014 at the University of Naples Federico II. The students involved numbered N=944, and their ability dimensions were defined according to the domains assessed by the entrance exam, namely Humanities, Reading and Comprehension, Mathematics, Science, and English. In particular, a multidimensional two-parameter logistic IRT model for dichotomously scored items was considered for estimating students' ability.
