Comparison of item characteristic analysis models of reading literacy test with polytomous Item Response Theory

Abstract

This study compares models for analyzing the characteristics of reading literacy items under polytomous item response theory, namely the Graded Response Model (GRM), Partial Credit Model (PCM), Generalized Partial Credit Model (GPCM), and Nominal Response Model (NRM). The research is quantitative and uses secondary data: responses from about 1,000 test takers to the reading literacy items of the 2018 reading literacy study, analyzed with the R program. The models were compared so that the analysis results represent the level of reading literacy skills in Indonesia more accurately. The results show that the GPCM is the best-fitting model, with an AIC of 23753.89 and a BIC of 24042.45, and with all 7 of the 7 testlets fitting the model. Based on the relationship between the information function and the standard error of measurement (SEM), the reading literacy items provide the most information when participants' abilities range between -2.3 and +2.
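
The workflow described in this abstract can be sketched in R. The snippet below is a minimal, assumed reconstruction, not the authors' code (the paper does not name its R packages): it uses the mirt package, a hypothetical response data frame `reading`, and takes the conditional SEM as the reciprocal square root of the test information.

```r
# Minimal sketch (assumed workflow, not the authors' code): fit four polytomous
# IRT models with the mirt package and compare them by AIC/BIC.
library(mirt)

# 'reading' is a hypothetical data frame of polytomously scored item responses
fits <- list(
  GRM  = mirt(reading, model = 1, itemtype = "graded"),
  PCM  = mirt(reading, model = 1, itemtype = "Rasch"),   # Rasch on polytomous data = PCM
  GPCM = mirt(reading, model = 1, itemtype = "gpcm"),
  NRM  = mirt(reading, model = 1, itemtype = "nominal")
)

# Information criteria for model selection (smaller values indicate better relative fit)
data.frame(AIC = sapply(fits, extract.mirt, what = "AIC"),
           BIC = sapply(fits, extract.mirt, what = "BIC"))

# Test information and conditional SEM for the retained model
theta <- matrix(seq(-4, 4, by = 0.1))
info  <- testinfo(fits$GPCM, Theta = theta)
sem   <- 1 / sqrt(info)   # SEM(theta) = 1 / sqrt(I(theta))
```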

Similar Papers
  • Research Article
  • Cited by: 1
  • 10.21831/reid.v8i2.54429
The effect of scoring correction and model fit on the estimation of ability parameter and person fit on polytomous item response theory
  • Dec 31, 2022
  • REID (Research and Evaluation in Education)
  • Agus Santoso + 6 more

Scoring quality has been recognized as one of the important aspects that should concern both test developers and users. This study aimed to investigate the effect of scoring correction and model fit on the estimation of ability parameters and person fit in polytomous item response theory. Test results from 165 students in the Statistics course (SATS4410) at a university in Indonesia were used to address the research problems. The polytomous data obtained from scoring the test results were analyzed using the Item Response Theory (IRT) approach with the Partial Credit Model (PCM), Graded Response Model (GRM), and Generalized Partial Credit Model (GPCM). The effect of scoring correction and model fit on ability estimation and person fit was tested using multivariate analysis. Among the three models, the GRM showed the best fit based on the p-value and RMSEA. The analysis also showed no significant effect of scoring correction or model fit on the estimation of test takers' ability or on person fit. Based on these results, we recommend evaluating the levels or categories used in scoring student work on a test.

  • Research Article
  • Cited by: 4
  • 10.3390/sym13020223
Item Response Theory Models for the Fuzzy TOPSIS in the Analysis of Survey Data
  • Jan 29, 2021
  • Symmetry
  • Bartłomiej Jefmański + 1 more

The fuzzy TOPSIS (The Technique for Order of Preference by Similarity to Ideal Solution) is an attractive tool for measuring complex phenomena based on uncertain data. The original version of the method assumes that the object assessments in terms of the adopted criteria are expressed as triangular fuzzy numbers. One of the crucial stages of the fuzzy TOPSIS is selecting the fuzzy conversion scale, which is used to evaluate objects in terms of the adopted criteria. The choice of a fuzzy conversion scale may influence the results of the fuzzy TOPSIS. There is no uniform approach to constructing and selecting the fuzzy conversion scale for the fuzzy TOPSIS; the choice is subjective and made by researchers. Therefore, the aim of the article is to present a new, objective approach to the construction of fuzzy conversion scales based on Item Response Theory (IRT) models. The following models were used in the construction of fuzzy conversion scales: Polychoric Correlation Model (PM), Polytomous Rasch Model (PRM), Rating Scale Model (RSM), Partial Credit Model (PCM), Generalized Partial Credit Model (GPCM), Graded Response Model (GRM), and Nominal Response Model (NRM). The usefulness of the proposed approach is illustrated with the analysis of survey results on the quality of professional life of inhabitants of selected communes in Poland. The obtained results indicate that the choice of the fuzzy conversion scale has a large impact on the closeness coefficient values. A large difference was also observed in the spreads of triangular fuzzy numbers between scales based on IRT models and those used in the literature on the subject. The use of the fuzzy TOPSIS with fuzzy conversion scales built on the PRM, RSM, PCM, GPCM, and GRM models gives results with a greater range of variability than in the case of fuzzy conversion scales used in empirical research.

  • Research Article
  • Cited by: 56
  • 10.3389/feduc.2021.721963
Performance of Polytomous IRT Models With Rating Scale Data: An Investigation Over Sample Size, Instrument Length, and Missing Data
  • Sep 17, 2021
  • Frontiers in Education
  • Shenghai Dai + 6 more

The implementation of polytomous item response theory (IRT) models such as the graded response model (GRM) and the generalized partial credit model (GPCM) to inform instrument design and validation has been increasing across social and educational contexts where rating scales are usually used. The performance of such models has not been fully investigated and compared across conditions with common survey-specific characteristics such as short test length, small sample size, and data missingness. The purpose of the current simulation study is to inform the literature and guide the implementation of GRM and GPCM under these conditions. For item parameter estimations, results suggest a sample size of at least 300 and/or an instrument length of at least five items for both models. The performance of GPCM is stable across instrument lengths while that of GRM improves notably as the instrument length increases. For person parameters, GRM reveals more accurate estimates when the proportion of missing data is small, whereas GPCM is favored in the presence of a large amount of missingness. Further, it is not recommended to compare GRM and GPCM based on test information. Relative model fit indices (AIC, BIC, LL) might not be powerful when the sample size is less than 300 and the length is less than 5. Synthesis of the patterns of the results, as well as recommendations for the implementation of polytomous IRT models, are presented and discussed.
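
For context, the relative fit indices mentioned here (AIC, BIC, LL) are simple functions of the maximized log-likelihood; the definitions below are the standard ones, added for reference rather than taken from the study.

```latex
\mathrm{AIC} = -2\ln L + 2k, \qquad \mathrm{BIC} = -2\ln L + k\ln n
```

Here ln L is the maximized log-likelihood, k the number of estimated parameters, and n the sample size; smaller values indicate better relative fit, which is also why these indices lose discriminating power when the sample size and instrument length are small.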

  • Research Article
  • Cited by: 29
  • 10.1037/a0036430
Psychometric properties for the Balanced Inventory of Desirable Responding: Dichotomous versus polytomous conventional and IRT scoring.
  • Jan 1, 2014
  • Psychological Assessment
  • Walter P Vispoel + 1 more

[Correction Notice: An Erratum for this article was reported in Vol 26(3) of Psychological Assessment (see record 2014-16017-001). The mean, standard deviation and alpha coefficient originally reported in Table 1 should be 74.317, 10.214 and .802, respectively. The validity coefficients in the last column of Table 4 are affected as well. Correcting this error did not change the substantive interpretations of the results, but did increase the mean, standard deviation, alpha coefficient, and validity coefficients reported for the Honesty subscale in the text and in Tables 1 and 4. The corrected versions of Tables 1 and 4 are shown in the erratum.] Item response theory (IRT) models were applied to dichotomous and polytomous scoring of the Self-Deceptive Enhancement and Impression Management subscales of the Balanced Inventory of Desirable Responding (Paulhus, 1991, 1999). Two dichotomous scoring methods reflecting exaggerated endorsement and exaggerated denial of socially desirable behaviors were examined. The 1- and 2-parameter logistic models (1PLM and 2PLM, respectively) were applied to dichotomous responses, and the partial credit model (PCM) and graded response model (GRM) were applied to polytomous responses. For both subscales, the 2PLM fit dichotomous responses better than did the 1PLM, and the GRM fit polytomous responses better than did the PCM. Polytomous GRM and raw scores for both subscales yielded higher test-retest and convergent validity coefficients than did PCM, 1PLM, 2PLM, and dichotomous raw scores. Information plots showed that the GRM provided consistently high measurement precision that was superior to that of all other IRT models over the full range of both construct continuums. Dichotomous scores reflecting exaggerated endorsement of socially desirable behaviors provided noticeably weak precision at low levels of the construct continuums, calling into question the use of such scores for detecting instances of "faking bad." Dichotomous models reflecting exaggerated denial of the same behaviors yielded much better precision at low levels of the constructs, but their precision was still lower than that of the GRM. These results support polytomous over dichotomous scoring in general, alternative dichotomous scoring for detecting faking bad, and extension of GRM scoring to situations in which IRT offers additional practical advantages over classical test theory (adaptive testing, equating, linking, scaling, detecting differential item functioning, and so forth).

  • Research Article
  • Cited by: 6
  • 10.21031/epod.43804
The Development of IRT Based Attitude Scale towards Educational Measurement Course
  • Jun 30, 2016
  • Eğitimde ve Psikolojide Ölçme ve Değerlendirme Dergisi
  • R. Nükhet Demirtaşlı + 2 more

In this study, the Scale of Attitude towards Educational Measurement and Evaluation (SAEM) developed by Demirtasli (2002) is reconstructed based on polytomous Item Response Theory (IRT) models, and its psychometric features are identified. In this context, the polytomous IRT model that best fits the SAEM data was investigated; IRT models yield invariant person and item parameters when the data fit the model. A version of the SAEM with 41 four-point Likert-type items was administered to 519 teacher candidates attending teacher education programs at several universities in Turkey. The data were analyzed according to three polytomous IRT models: Samejima's graded response model (S-GRM), the partial credit model (PCM), and the nominal response model (NRM). The results showed that a new version of the SAEM based on the S-GRM, consisting of 33 items, had a lower chi-square value than the other models, and its classical internal consistency reliability was 0.97. The findings of the study indicate that the validity and reliability features of the scale are fairly good.

  • Research Article
  • 10.37680/scaffolding.v7i2.7281
Comparison of GRM and GPCM in the Development of Higher Education Practice Assessment Instruments
  • Jul 19, 2025
  • Scaffolding: Jurnal Pendidikan Islam dan Multikulturalisme
  • Siti Maimunah + 4 more

This study outlines findings of previous research comparing the application of the Graded Response Model (GRM) and the Generalized Partial Credit Model (GPCM) in the development of higher education practice assessment instruments. The study uses a Systematic Literature Review. The data are articles indexed in Scopus quartiles Q1-Q4, selected using the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) technique. After the identification, screening, and eligibility stages, 35 articles were included and analyzed using meta-synthesis techniques. The results show that the GRM and the GPCM differ in how they support the development of practice assessment instruments in higher education: the GRM measures competencies based on students' values, attitudes, and spirituality, especially in assessments that use a graded scale such as a Likert scale, whereas the GPCM provides higher reliability for step-based or procedural practice assessments. These results can contribute positively to the development of practice assessment in higher education.

  • Research Article
  • Cited by: 6
  • 10.3724/sp.j.1041.2008.00618
Item Selection Strategies for Computerized Adaptive Testing with the Generalized Partial Credit Model
  • Oct 28, 2008
  • Acta Psychologica Sinica
  • Zhen Liu

The objective of computerized adaptive testing (CAT) is to construct an optimal test for each examinee. The item selection strategy (ISS) is an important part of CAT research, and its quality directly affects the reliability, efficiency, and security of the test. Much CAT research and many applications are based on dichotomously scored models, yet more information can be obtained from examinees with a polytomously scored model than with a dichotomous one, so it is necessary to further explore CAT based on polytomous models. Both the Generalized Partial Credit Model (GPCM) and the Graded Response Model (GRM) are polytomously scored models, but they differ from each other. In the GRM, the item grade difficulties ascend monotonically as the grades increase, whereas the GPCM describes the performance process of the item, which is separated into a sequence of steps. In the GPCM, each item contains several step parameters with no specific ordering rules: a later step cannot be attempted before the earlier step is completed, yet a later step parameter may be lower than an earlier one. Considerable research has been conducted on CAT using the GRM; however, in our country there are few reports on CAT using the GPCM. This study investigated four types of ISS for CAT under various circumstances, using the GPCM in computer-simulated programs. The strategies were implemented in four item pools, each with a capacity of 1,000 items. Each item has five step parameters, and the discrimination and step parameters in the four pools were distributed as follows: ln a ~ N(0,1) with b ~ N(0,1); a ~ U(0.2, 2.5) with b ~ N(0,1); ln a ~ N(0,1) with b ~ U(-3, 3); and a ~ U(0.2, 2.5) with b ~ U(-3, 3). Item parameters were generated with the Monte Carlo simulation method. Responses to the items were generated according to the GPCM for a sample of 3,000 simulees with θ ~ N(0,1), whose trait levels were also generated with the Monte Carlo simulation method for some types of ISS. During the course of the responses, the simulees' ability was estimated from the responses obtained. In addition, after the four item pools were sorted by the discrimination parameter to complete an a-stratified design, the above process was repeated. Thirty-two simulated CATs were administered, and the output was evaluated on the following measures: precision, ISS stability, evenness of item use, average number of items used per person, χ², efficiency, and item overlap. The data in Tables 1 and 2 include the evaluation index values obtained from the CAT process under the four types of ISS, both without stratification and with the a-stratified design, together with values calculated after weighting and summing every index. The following conclusions can be drawn from the tables: all ability estimates are highly accurate and differ little. Moreover, comparing the weighted sums of the means shows that the distribution of the item step parameters greatly influences the choice of ISS. On the condition that the examinee's trait level follows a normal distribution, the performance of an ISS is closely related to the item step parameter distribution: (1) if the item step parameters follow a normal distribution, the ISS that matches a random step parameter to the trait level is much more efficient than the others; (2) if the item step parameters follow a uniform distribution, the ISS that matches the item's average step parameter to the trait level is much more efficient than the others.
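
The GPCM response mechanics sketched in this abstract can be made concrete with a few lines of base R. This is an illustrative sketch only, not the authors' simulation code; the discrimination and step parameter values are arbitrary examples.

```r
# Illustrative GPCM category probabilities and simulated responses (base R).
# a: item discrimination; b: vector of step parameters; theta: latent trait.
gpcm_probs <- function(theta, a, b) {
  z <- c(0, cumsum(a * (theta - b)))   # cumulative step terms; category 0 contributes 0
  exp(z) / sum(exp(z))                 # probabilities for categories 0..length(b)
}

set.seed(1)
a     <- 1.2
b     <- c(-1.5, -0.2, 0.4, 1.1, 2.0)  # five step parameters, as in the described item pools
theta <- rnorm(3000)                   # simulees drawn from N(0, 1)
resp  <- sapply(theta, function(t) sample(0:length(b), size = 1,
                                          prob = gpcm_probs(t, a, b)))
table(resp)                            # distribution of simulated category responses
```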

  • Research Article
  • 10.21831/pep.v29i1.84659
Psychometric properties of the general conspiracy belief scale using item response theory
  • Apr 26, 2025
  • Jurnal Penelitian dan Evaluasi Pendidikan
  • Sumin Sumin + 2 more

Evaluation of the psychometric properties of conspiracy theory belief instruments has been dominated by classical approaches with limitations, especially dependence on sample size and inaccuracies in item-level analysis. This study fills this gap by applying a polytomous Item Response Theory (IRT) approach to reanalyze the General Conspiracy Belief Scale (GCBS), re-examining its psychometric properties to produce measurements that are more precise and independent of sample characteristics. The research design was a quantitative replication utilizing secondary data from 2,495 students at the college level. The instrument consisted of 15 items on a five-category Likert scale. The analysis was conducted using three polytomous IRT models, namely the Graded Response Model (GRM), Partial Credit Model (PCM), and Generalized Partial Credit Model (GPCM), with the help of R software. The results showed that the GRM best fit the data, with most items showing high discrimination and providing maximum information for respondents with low to moderate levels of conspiratorial belief. Empirical marginal reliability coefficients were high, indicating excellent internal consistency. This study contributes a more robust and nuanced psychometric evaluation of the GCBS through IRT, providing researchers with a validated framework for assessing conspiracy beliefs with greater accuracy and scale precision. A limitation of this study is the use of secondary data from one particular population group, so the generalizability of the findings still needs to be examined in more diverse contexts.

  • Research Article
  • Cited by: 6
  • 10.1097/j.pain.0000000000003078
Numeric rating scale for pain should be used in an ordinal but not interval manner. A retrospective analysis of 346,892 patient reports of the quality improvement in postoperative pain treatment registry.
  • Oct 18, 2023
  • Pain
  • Marko Stijic + 3 more

To assess postoperative pain intensity in adults, the numeric rating scale (NRS) is used. This scale has shown acceptable psychometric features, although its scale properties need further examination. We aimed to evaluate the scale properties of the NRS using an item response theory (IRT) approach. Data from an international postoperative pain registry (QUIPS) were analyzed retrospectively. Overall, 346,892 adult patients (age groups: 18-20 years: 1.6%, 21-30 years: 6.7%, 31-40 years: 8.3%, 41-50 years: 13.2%, 51-60 years: 17.1%, 61-70 years: 17.3%, 71-80 years: 16.4%, 81-90 years: 3.9%, >90 years: 0.2%) were included. Among the patients, 55.7% were female and 38% had preoperative pain. Three pain items (movement pain, worst pain, least pain) were analyzed using four different IRT models: the partial credit model (PCM), generalized partial credit model (GPCM), rating scale model (RSM), and graded response model (GRM). Fit indices were compared to decide on the best-fitting model (lower fit indices indicate better model fit). Subgroup analyses were done for sex and age groups. After collapsing the highest and the second-highest response categories, the GRM outperformed the other models (lowest Bayesian information criterion) in all subgroups. Overlapping categories were found in the category boundary curves for worst and least pain, particularly for higher pain ratings. Response category widths differed depending on pain intensity. Similar results were obtained for female, male, and age subgroups. Response categories on the NRS are ordered but have different widths, so the interval-scale properties of the NRS should be questioned. In dealing with the missing linearity of pain intensity ratings on the NRS, IRT methods may be helpful.
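
The category collapsing described above (merging the highest and second-highest response categories before refitting) is straightforward to sketch. The snippet below is illustrative only; it assumes a hypothetical data frame `nrs` holding the three 0-10 NRS pain items and uses the mirt package, which the abstract does not name.

```r
# Illustrative recoding: merge the two highest observed categories of each item,
# then refit the GRM and read off its BIC (lower = better relative fit).
library(mirt)

collapse_top <- function(x) {
  top <- sort(unique(x), decreasing = TRUE)[1:2]   # two highest observed categories
  ifelse(x %in% top, min(top), x)                  # merge them into a single category
}

nrs_collapsed <- as.data.frame(lapply(nrs, collapse_top))   # 'nrs' is hypothetical
fit_grm <- mirt(nrs_collapsed, model = 1, itemtype = "graded")
extract.mirt(fit_grm, "BIC")
```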

  • Research Article
  • Cited by: 13
  • 10.1177/00131644221116292
Detecting Rating Scale Malfunctioning With the Partial Credit Model and Generalized Partial Credit Model.
  • Aug 12, 2022
  • Educational and psychological measurement
  • Stefanie A Wind

Rating scale analysis techniques provide researchers with practical tools for examining the degree to which ordinal rating scales (e.g., Likert-type scales or performance assessment rating scales) function in psychometrically useful ways. When rating scales function as expected, researchers can interpret ratings in the intended direction (i.e., lower ratings mean "less" of a construct than higher ratings), distinguish between categories in the scale (i.e., each category reflects a unique level of the construct), and compare ratings across elements of the measurement instrument, such as individual items. Although researchers have used these techniques in a variety of contexts, studies are limited that systematically explore their sensitivity to problematic rating scale characteristics (i.e., "rating scale malfunctioning"). I used a real data analysis and a simulation study to systematically explore the sensitivity of rating scale analysis techniques based on two popular polytomous item response theory (IRT) models: the partial credit model (PCM) and the generalized partial credit model (GPCM). Overall, results indicated that both models provide valuable information about rating scale threshold ordering and precision that can help researchers understand how their rating scales are functioning and identify areas for further investigation or revision. However, there were some differences between models in their sensitivity to rating scale malfunctioning in certain conditions. Implications for research and practice are discussed.
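
A common first step in the kind of rating scale diagnosis described here is to check whether the estimated thresholds are ordered within each item. The sketch below assumes the mirt package and a hypothetical response data frame `ratings`; it is not the code used in the study.

```r
# Minimal sketch: flag items whose GPCM step/threshold parameters are disordered.
library(mirt)

fit  <- mirt(ratings, model = 1, itemtype = "gpcm")        # 'ratings' is hypothetical
pars <- coef(fit, IRTpars = TRUE, simplify = TRUE)$items   # columns: a, b1, b2, ...

b <- pars[, grep("^b", colnames(pars)), drop = FALSE]
disordered <- apply(b, 1, function(x) any(diff(x) < 0))    # TRUE if any threshold reversal
which(disordered)                                          # items worth closer inspection
```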

  • Research Article
  • 10.15408/jp3i.v13i1.36745
Modeling of Colorado Learning Attitude Science Survey in Indonesian Version: A Study with Applying Item Response Theory
  • May 30, 2024
  • JP3I (Jurnal Pengukuran Psikologi dan Pendidikan Indonesia)
  • Mutmainna Mutmainna + 4 more

The Colorado Learning Attitudes about Science Survey (CLASS) is an instrument designed to explore students' perceptions of physics and assess how closely their beliefs correspond with those of professional physicists. Before the development of CLASS, several similar instruments were developed in the field of Physics Education, such as the Maryland Physics Expectation (MPEX), Views About Science Survey (VASS), and Epistemological Beliefs Assessment for Physical Science (EBAPS). Adams et al. developed CLASS in 2006 by evaluating these three instruments. Since then, CLASS has been extensively studied for its use in research, especially in Physics Education, and has also been applied in other fields and translated into several languages. As a form of community strengthening, this article reports research findings related to the use of the CLASS instrument translated into Indonesian. A total of 292 undergraduate students from four universities were sampled in this study; the respondents were students who had enrolled in the Fundamentals of Physics course. The data obtained were analysed with Item Response Theory (IRT) models for polytomous scales: the Graded Response Model (GRM), Partial Credit Model (PCM), Rating Scale Model (RSM), and Generalized Partial Credit Model (GPCM). The results show that among the four models, and based on the criteria used, the most suitable model is the GRM. The research also shows that the items declared consistent with the model do not cover all CLASS items but only some of them. This finding indicates that further exploration of the CLASS instrument items is needed, especially in the Indonesian version. The findings also add to the body of knowledge on the quality of the CLASS instrument assessed through the modern test theory (IRT) approach. Thus, the CLASS instrument can be regarded as a standard instrument and can be used globally across various populations.

  • Research Article
  • Cited by: 1499
  • 10.1177/014662169201600206
A Generalized Partial Credit Model: Application of an EM Algorithm
  • Jun 1, 1992
  • Applied Psychological Measurement
  • Eiji Muraki

The partial credit model (PCM) with a varying slope parameter is developed and called the generalized partial credit model (GPCM). The item step parameter of this model is decomposed to a location and a threshold parameter, following Andrich's (1978) rating scale formulation. The EM algorithm for estimating the model parameters is derived. The performance of this generalized model is compared on both simulated and real data to a Rasch family of polytomous item response models. Simulated data were generated and then analyzed by the various polytomous item response models. The results demonstrate that the rating formulation of the GPCM is quite adaptable to the analysis of polytomous item responses. The real data used in this study consisted of the National Assessment of Educational Progress (Johnson & Allen, 1992) mathematics data that used both dichotomous and polytomous items. The PCM was applied to these data using both constant and varying slope parameters. The GPCM, which provides for varying slope parameters, yielded better fit to the data than did the PCM.
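
For reference, the GPCM described here gives the probability of a response in category k of item j as the standard expression below, with the step parameter decomposed as b_{jv} = b_j - d_v (item location minus category threshold). This formula is added for context and follows Muraki's parameterization.

```latex
P_{jk}(\theta) \;=\;
\frac{\exp\!\left(\sum_{v=1}^{k} D a_j \,(\theta - b_j + d_v)\right)}
     {\sum_{c=0}^{m_j} \exp\!\left(\sum_{v=1}^{c} D a_j \,(\theta - b_j + d_v)\right)},
\qquad k = 0, 1, \ldots, m_j
```

Here the empty sum for k = 0 (or c = 0) is defined as zero, a_j is the slope parameter, D is a scaling constant, and m_j + 1 is the number of response categories; constraining the slope to be equal across items (the constant-slope case) recovers the PCM.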

  • Research Article
  • Cited by: 12
  • 10.1002/j.2333-8504.1997.tb01727.x
CONCURRENT CALIBRATION OF DICHOTOMOUSLY AND POLYTOMOUSLY SCORED TOEFL ITEMS USING IRT MODELS
  • Jun 1, 1997
  • ETS Research Report Series
  • K Linda Tang + 1 more

In order to meet the needs of the Test of English as a Foreign Language (TOEFL®) constituencies, the TOEFL program is sponsoring a development project known as TOEFL 2000. Drawing from current linguistic theory and models of communicative competence, it is anticipated that the new test or test battery developed by the TOEFL 2000 project will likely be designed to test all four language skills (reading, writing, listening, and speaking) in an integrated fashion. However, one compromise position on the integration of skills is one in which reading and writing would be tested together, and listening and speaking would also be tested together. It is also assumed that the test will largely be performance-based, meaning a substantial portion of the items on the test will likely be constructed-response items, and an examinee's score on such items will fall in one of multiple ordered categories. Two groups of item response theory (IRT) models have been developed to calibrate items with multiple ordered categories (i.e., polytomously scored items): (a) the partial credit model (Masters, 1982) and the generalized partial credit model (Muraki, 1992); and (b) the graded response model (Samejima, 1969, 1972). These models have been used jointly with the dichotomous three-parameter logistic (3PL) IRT model to concurrently calibrate dichotomously and polytomously scored items for the National Assessment of Educational Progress (NAEP). However, the performance of these polytomous IRT models and the concurrent calibration of dichotomously and polytomously scored items have not been investigated with data from the TOEFL examinee population. The purpose of this study was to obtain a good understanding of the performance of a combination of dichotomous and polytomous IRT models with TOEFL data. TOEFL Vocabulary and Reading Comprehension and Test of Written English (TWE®) items, and TOEFL Listening Comprehension and Test of Spoken English (TSE®) items, were concurrently calibrated using a combination of the generalized partial credit model and the 3PL IRT model. The two sets of combined items were also concurrently calibrated using a combination of the graded response model and the 3PL IRT model. The results of this study indicate that data from a reading/writing combination made up of the TOEFL Vocabulary and Reading Comprehension section and the TWE were reasonably well fit by a combination of the 3PL and generalized partial credit models or the 3PL and graded response models. In a similar fashion, data for a listening/speaking combination made up of the TOEFL Listening Comprehension section and selected tasks from the TSE were also reasonably well fit by the 3PL/generalized partial credit and 3PL/graded response model combinations. A variety of comparisons across the generalized partial credit and graded response models seem to indicate some preference for the generalized partial credit model when PARSCALE is used as the calibration program. The results of this study provide useful information about test construction and item calibration procedures that might later be used for the TOEFL 2000 project.
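
Concurrent calibration of mixed dichotomous and polytomous items of the kind described can be sketched in R by supplying one item type per column. The snippet below is an assumed illustration using the mirt package and a hypothetical response matrix `mixed`; the study itself used PARSCALE, not R.

```r
# Illustrative concurrent calibration of mixed-format items (not the study's PARSCALE run).
library(mirt)

# Suppose the first 40 columns of 'mixed' are dichotomous items and the last 2 are
# polytomously scored constructed-response items.
types_gpcm <- c(rep("3PL", 40), rep("gpcm", 2))
types_grm  <- c(rep("3PL", 40), rep("graded", 2))

fit_3pl_gpcm <- mirt(mixed, model = 1, itemtype = types_gpcm)
fit_3pl_grm  <- mirt(mixed, model = 1, itemtype = types_grm)

anova(fit_3pl_gpcm, fit_3pl_grm)   # compare the two model combinations by AIC/BIC
```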

  • Supplementary Content
  • Cited by: 24
  • 10.1080/10705511.2014.937374
The Partial Credit Model and Generalized Partial Credit Model as Constrained Nominal Response Models, With Applications in Mplus
  • Jan 6, 2015
  • Structural Equation Modeling: A Multidisciplinary Journal
  • Anne Corinne Huggins-Manley + 1 more

The purpose of this article is to demonstrate constraining the nominal response model in Mplus software to calibrate data under the partial credit model (PCM) and generalized partial credit model (GPCM). Currently, many researchers are uncertain if the PCM and GPCM can be estimated within Mplus. Through model constraint commands in Mplus, we demonstrate that both models can be estimated in recent versions of this software. We present an example of this approach with data from 522 respondents on a subset of items from the Math Self-Efficacy Scale (Betz & Hackett, 1983). It is demonstrated that the presented model code is a viable way of estimating the models in Mplus.
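
The constraint relationship exploited in this approach can be stated compactly. The expressions below are the standard textbook formulation, added for context; they are not the article's Mplus syntax.

```latex
\text{NRM:}\quad P(X_j = k \mid \theta) \;=\;
\frac{\exp(a_{jk}\theta + c_{jk})}{\sum_{h=0}^{m_j}\exp(a_{jh}\theta + c_{jh})},
\qquad \text{GPCM:}\; a_{jk} = k\,a_j, \qquad \text{PCM:}\; a_{jk} = k
```

Constraining the nominal category slopes a_{jk} to be proportional to the category index k yields the GPCM, and additionally fixing the common slope a_j to 1 yields the PCM; these are the constraints that can be imposed on the nominal response model through Mplus model constraint commands.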

  • Research Article
  • Cited by: 5
  • 10.1177/01466216231165302
A Mixed Sequential IRT Model for Mixed-Format Items.
  • Mar 17, 2023
  • Applied psychological measurement
  • Junhuan Wei + 2 more

To provide more insight into an individual's response process and cognitive process, this study proposed three mixed sequential item response models (MS-IRMs) for mixed-format items consisting of a mixture of a multiple-choice item and an open-ended item that emphasize a sequential response process and are scored sequentially. Relative to existing polytomous models such as the graded response model (GRM), generalized partial credit model (GPCM), or traditional sequential Rasch model (SRM), the proposed models employ an appropriate processing function for each task to improve conventional polytomous models. Simulation studies were carried out to investigate the performance of the proposed models, and the results indicated that all proposed models outperformed the SRM, GRM, and GPCM in terms of parameter recovery and model fit. An application illustration of the MS-IRMs in comparison with traditional models was demonstrated by using real data from TIMSS 2007.
