Can the Generalized Graded Unfolding Model Fit Dominance Responses?
Theoretically, the generalized graded unfolding model (GGUM) is more flexible than the generalized partial credit model (GPCM), a dominance model. For item responses generated by the GPCM, GGUM estimation can produce item response curves that overlap with those of the GPCM over a range of latent trait scores covering almost the entire population, and the discrimination and category threshold estimates from the two models are approximately equal. Achieving this requires either an informative prior centered on an extreme location (e.g., 4 for a positive GPCM item) or fixing the extreme locations when estimating the GGUM on GPCM items. A simulation study and applications to two real datasets support the theoretical claims. Practical implications are discussed, and suggestions for future research are provided.
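For reference when reading this comparison, the two response functions are reproduced below in common notation (following Muraki's GPCM and Roberts, Donoghue, and Laughlin's GGUM; this is background material, not an excerpt from the article). The intuition behind the extreme-location device is that when the GGUM location δ_i lies far above every θ in the population, respondents only ever see the ascending side of the single-peaked curve, which then behaves like a monotone dominance curve.

```latex
% GPCM (dominance): probability of response category k on item i,
% with the usual convention a_i(\theta - b_{i0}) \equiv 0.
P(X_i = k \mid \theta) =
  \frac{\exp \sum_{v=0}^{k} a_i (\theta - b_{iv})}
       {\sum_{c=0}^{m_i} \exp \sum_{v=0}^{c} a_i (\theta - b_{iv})}

% GGUM (unfolding): C + 1 observable categories, M = 2C + 1, \tau_{i0} = 0.
P(Z_i = z \mid \theta) =
  \frac{\exp\big(a_i[z(\theta - \delta_i) - \sum_{k=0}^{z}\tau_{ik}]\big)
        + \exp\big(a_i[(M - z)(\theta - \delta_i) - \sum_{k=0}^{z}\tau_{ik}]\big)}
       {\sum_{w=0}^{C}\Big[\exp\big(a_i[w(\theta - \delta_i) - \sum_{k=0}^{w}\tau_{ik}]\big)
        + \exp\big(a_i[(M - w)(\theta - \delta_i) - \sum_{k=0}^{w}\tau_{ik}]\big)\Big]}
```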
- Research Article
- 10.1080/10508619.2025.2535033
- Aug 10, 2025
- The International Journal for the Psychology of Religion
When trying to make sense of what is going on in the world and in their personal lives, people often draw on scientific and religious explanations. Based on the pertinent literature, we introduce a novel self-report instrument that captures five ways in which people subjectively conceptualize the science–religion relationship. In addition to examining the psychometric properties of the new instrument, we tested whether these five conceptualizations are ordered along a single continuum of conflict vs. compatibility. In doing so, we ran and compared two unidimensional item response theory (IRT) models, a generalized partial credit model (GPCM) and a generalized graded unfolding model (GGUM), in both a German (total N = 2,920) and a U.S. (N = 1,197) sample. We examined model fit statistics to determine the best-fitting model and assessed measurement precision at the scale and item levels. Finally, we tested measurement invariance across countries (i.e., Germany and the United States), as well as the discriminant, convergent, and criterion-related validity of our new instrument. Our results suggest that public perceptions of the relationship between science and religion are best captured by an unfolding response process (i.e., the GGUM) and vary systematically across cultural contexts. In addition, U.S. participants perceived higher levels of conflict than their German counterparts. This research contributes to our theoretical understanding of science–religion perceptions and provides a validated measure for future cross-cultural research.
- Research Article
1
- 10.1080/08957347.2022.2067543
- Apr 3, 2022
- Applied Measurement in Education
This article examines the performance of item response theory (IRT) models when double ratings, rather than single ratings, are used as item scores in the presence of rater effects. Study 1 examined the influence of the number of ratings on the accuracy of proficiency estimation in the generalized partial credit model (GPCM). Study 2 compared the accuracy of proficiency estimation of two IRT models (the GPCM versus the hierarchical rater model, HRM) for double ratings. The main findings were as follows: (a) rater effects substantially reduced the accuracy of IRT proficiency estimation; (b) double ratings relieved the negative impact of rater effects on proficiency estimation and improved accuracy relative to single ratings; (c) IRT estimators showed different patterns of conditional accuracy; (d) as more items and a larger number of score categories were used, the accuracy of proficiency estimation improved; and (e) the HRM consistently outperformed the GPCM.
- Research Article
39
- 10.3389/feduc.2021.721963
- Sep 17, 2021
- Frontiers in Education
The implementation of polytomous item response theory (IRT) models such as the graded response model (GRM) and the generalized partial credit model (GPCM) to inform instrument design and validation has been increasing across social and educational contexts where rating scales are commonly used. The performance of such models has not been fully investigated and compared under conditions with common survey-specific characteristics such as short test length, small sample size, and data missingness. The purpose of the current simulation study is to inform the literature and guide the implementation of the GRM and GPCM under these conditions. For item parameter estimation, results suggest a sample size of at least 300 and/or an instrument length of at least five items for both models. The performance of the GPCM is stable across instrument lengths, while that of the GRM improves notably as the instrument length increases. For person parameters, the GRM yields more accurate estimates when the proportion of missing data is small, whereas the GPCM is favored in the presence of a large amount of missingness. Further, comparing the GRM and GPCM based on test information is not recommended. Relative model fit indices (AIC, BIC, LL) may lack power when the sample size is below 300 and the instrument has fewer than five items. A synthesis of the patterns in the results, along with recommendations for implementing polytomous IRT models, is presented and discussed.
- Research Article
5
- 10.1177/00131644221116292
- Aug 12, 2022
- Educational and psychological measurement
Rating scale analysis techniques provide researchers with practical tools for examining the degree to which ordinal rating scales (e.g., Likert-type scales or performance assessment rating scales) function in psychometrically useful ways. When rating scales function as expected, researchers can interpret ratings in the intended direction (i.e., lower ratings mean "less" of a construct than higher ratings), distinguish between categories in the scale (i.e., each category reflects a unique level of the construct), and compare ratings across elements of the measurement instrument, such as individual items. Although researchers have used these techniques in a variety of contexts, studies are limited that systematically explore their sensitivity to problematic rating scale characteristics (i.e., "rating scale malfunctioning"). I used a real data analysis and a simulation study to systematically explore the sensitivity of rating scale analysis techniques based on two popular polytomous item response theory (IRT) models: the partial credit model (PCM) and the generalized partial credit model (GPCM). Overall, results indicated that both models provide valuable information about rating scale threshold ordering and precision that can help researchers understand how their rating scales are functioning and identify areas for further investigation or revision. However, there were some differences between models in their sensitivity to rating scale malfunctioning in certain conditions. Implications for research and practice are discussed.
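As a concrete illustration of the kind of diagnostic the article evaluates, the short sketch below flags items whose estimated step parameters are disordered. The estimates are hypothetical, and a reversal is a prompt for further investigation rather than proof of malfunction.

```python
# Hypothetical GPCM/PCM step parameter estimates (not data from the
# study): adjacent-category thresholds should increase with category;
# a reversal suggests the rating scale may not be functioning as intended.
import numpy as np

step_estimates = {
    "item_1": [-1.8, -0.4, 0.9, 2.1],   # monotonically increasing: OK
    "item_2": [-1.2, 0.7, 0.2, 1.9],    # step 3 < step 2: disordered
}
for item, b in step_estimates.items():
    ordered = bool(np.all(np.diff(b) > 0))
    print(f"{item}: thresholds ordered = {ordered}")
```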
- Research Article
26
- 10.1002/sim.4475
- Feb 23, 2012
- Statistics in Medicine
The US Food and Drug Administration recently announced the final guidelines on the development and validation of patient-reported outcomes (PRO) assessments in drug labeling and clinical trials. This guidance may boost the demand for new PRO survey questionnaires. Biostatisticians may therefore encounter psychometric methods more frequently, particularly item response theory (IRT) models used to guide the shortening of a PRO assessment instrument. This article provides an introduction to the theory and the practical analytic skills needed to fit a generalized partial credit model (GPCM) in IRT. GPCM theory is explained first, with special attention to a clearer exposition of the formal mathematics than is typically available in the psychometric literature. A worked example is then presented, using self-reported responses taken from the International Personality Item Pool. The worked example contains step-by-step guides to fitting the GPCM with R and WinBUGS. Finally, the Fisher information function of the GPCM is derived and used to evaluate, as an illustrative example, the usefulness of assessment items by their information content. This article aims to encourage biostatisticians to apply IRT models in the re-analysis of existing data and in future research.
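The sketch below (in Python rather than the article's R/WinBUGS code, with invented parameter values) shows the GPCM category probabilities and the closed-form item information I(θ) = a²·Var(X | θ), which holds because the GPCM is a divided-by-total, exponential-family model:

```python
# A minimal sketch (not the article's code) of GPCM category
# probabilities and Fisher information for one item. All parameter
# values below are illustrative, not taken from the worked example.
import numpy as np

def gpcm_probs(theta, a, b):
    """Category probabilities for one GPCM item.

    theta : scalar latent trait value
    a     : discrimination parameter
    b     : array of step parameters, length m (categories 0..m)
    """
    # Cumulative sums of a*(theta - b_v); the score-0 term is 0.
    steps = np.concatenate(([0.0], np.cumsum(a * (theta - np.asarray(b)))))
    expz = np.exp(steps - steps.max())        # stabilized exponentials
    return expz / expz.sum()

def gpcm_info(theta, a, b):
    """Item information: a^2 times the conditional variance of the score."""
    p = gpcm_probs(theta, a, b)
    k = np.arange(len(p))
    return a**2 * (np.sum(k**2 * p) - np.sum(k * p)**2)

for th in np.linspace(-3, 3, 7):
    print(f"theta={th:+.1f}  info={gpcm_info(th, a=1.2, b=[-1.0, 0.0, 1.5]):.3f}")
```

Plotting the information over a θ grid is exactly the kind of item-screening use the article illustrates: items whose information peaks where the target population sits are the ones worth keeping in a shortened instrument.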
- Research Article
7
- 10.1002/j.2333-8504.1998.tb01781.x
- Dec 1, 1998
- ETS Research Report Series
Psychologists have long used binary or graded disagree‐agree responses to measure attitudes. Such data have traditionally been analyzed with cumulative models, but several researchers have recently argued that unfolding models are generally more appropriate. There have been several parametric item response models proposed to unfold disagree‐agree responses. Some of these models allow only for binary responses whereas others permit both binary and graded responses. A new item response model, referred to as the Generalized Graded Unfolding Model (GGUM), is developed in this paper. The GGUM allows for either binary or graded responses and generalizes previous item response models for unfolding in two useful ways. First, it implements a discrimination parameter that varies across items, and thus, items are allowed to discriminate among respondents in different ways. Second, the GGUM allows for distinctively different use of response categories across items. It does this by implementing response category threshold parameters that vary across items. A marginal maximum likelihood algorithm is implemented to estimate GGUM item parameters, whereas person parameters are derived from an expected a posteriori technique. Recovery simulations suggest that accurate item parameter estimates can be obtained with approximately 750 subjects. Additionally, accurate person estimates are derived with approximately 20 6‐category items. The applicability of the GGUM to common attitude testing situations is illustrated with real data on student attitudes toward abortion. Index terms: attitude measurement, unfolding model, item response theory, graded unfolding model, generalized graded unfolding model, Thurstone scale, Likert scale.
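To make the estimation pipeline concrete, the sketch below scores one respondent with an expected a posteriori (EAP) estimate under the GGUM, using a fixed quadrature grid and a standard normal prior. It is a minimal illustration, not the paper's implementation, and every parameter value is invented.

```python
# Minimal EAP scoring under the GGUM (illustrative only): the person
# estimate is the posterior mean over a quadrature grid with an N(0,1)
# prior. ggum_probs follows the Roberts et al. parameterization.
import numpy as np

def ggum_probs(theta, alpha, delta, tau):
    """GGUM category probabilities for responses z = 0..C.

    tau : length-C array of thresholds (tau_0 = 0 is prepended
          internally); M = 2C + 1 as in the GGUM definition.
    """
    C = len(tau)
    M = 2 * C + 1
    cum_tau = np.concatenate(([0.0], np.cumsum(tau)))   # sums tau_0..tau_z
    z = np.arange(C + 1)
    num = (np.exp(alpha * (z * (theta - delta) - cum_tau))
           + np.exp(alpha * ((M - z) * (theta - delta) - cum_tau)))
    return num / num.sum()

def eap_estimate(responses, items, grid=np.linspace(-4, 4, 81)):
    """EAP trait estimate for one person given GGUM item parameters."""
    prior = np.exp(-0.5 * grid**2)                      # N(0,1), unnormalized
    like = np.ones_like(grid)
    for z, (alpha, delta, tau) in zip(responses, items):
        like *= np.array([ggum_probs(t, alpha, delta, tau)[z] for t in grid])
    post = prior * like
    return np.sum(grid * post) / np.sum(post)

# Two invented 6-category items (C = 5), echoing the abstract's design.
items = [(1.0, -0.5, np.array([-1.2, -0.6, 0.0, 0.4, 0.9])),
         (1.4,  0.8, np.array([-1.0, -0.5, 0.1, 0.6, 1.1]))]
print(eap_estimate(responses=[3, 2], items=items))
```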
- Research Article
2
- 10.3390/sym13020223
- Jan 29, 2021
- Symmetry
Fuzzy TOPSIS (the Technique for Order of Preference by Similarity to Ideal Solution) is an attractive tool for measuring complex phenomena based on uncertain data. The original version of the method assumes that object assessments on the adopted criteria are expressed as triangular fuzzy numbers. One of the crucial stages of fuzzy TOPSIS is selecting the fuzzy conversion scale used to evaluate objects on the adopted criteria, and this choice may influence the results. There is no uniform approach to constructing and selecting the fuzzy conversion scale for fuzzy TOPSIS; the choice is subjective and made by researchers. The aim of the article is therefore to present a new, objective approach to the construction of fuzzy conversion scales based on Item Response Theory (IRT) models. The following models were used in the construction of fuzzy conversion scales: Polychoric Correlation Model (PM), Polytomous Rasch Model (PRM), Rating Scale Model (RSM), Partial Credit Model (PCM), Generalized Partial Credit Model (GPCM), Graded Response Model (GRM), and Nominal Response Model (NRM). The usefulness of the proposed approach is demonstrated with a survey measuring the quality of professional life of inhabitants of selected communes in Poland. The results indicate that the choice of the fuzzy conversion scale has a large impact on the closeness coefficient values. A large difference was also observed in the spreads of the triangular fuzzy numbers between scales based on IRT models and those used in the literature on the subject. Fuzzy TOPSIS with conversion scales built from the PRM, RSM, PCM, GPCM, and GRM models yields results with a greater range of variability than fuzzy conversion scales used in empirical research.
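For orientation, the sketch below computes fuzzy TOPSIS closeness coefficients for triangular fuzzy ratings under simplifying assumptions (benefit-type criteria, equal weights, linear-scale normalization, vertex-method distances). The ratings are invented, and the article's IRT-based scale construction is not reproduced here.

```python
# A compact fuzzy TOPSIS sketch with triangular fuzzy numbers (l, m, u).
# Assumptions: all criteria are benefit-type and equally weighted.
import numpy as np

def vertex_dist(a, b):
    """Vertex-method distance between two triangular fuzzy numbers."""
    return np.sqrt(np.mean((np.asarray(a) - np.asarray(b)) ** 2))

def fuzzy_topsis(ratings):
    """ratings[i][j] = (l, m, u) rating of alternative i on criterion j."""
    ratings = np.asarray(ratings, dtype=float)      # shape (n_alt, n_crit, 3)
    # Linear-scale normalization by the largest upper bound per criterion.
    u_max = ratings[:, :, 2].max(axis=0)
    norm = ratings / u_max[None, :, None]
    fpis, fnis = (1.0, 1.0, 1.0), (0.0, 0.0, 0.0)   # fuzzy ideal solutions
    n_alt, n_crit = norm.shape[0], norm.shape[1]
    d_plus = np.array([[vertex_dist(norm[i, j], fpis) for j in range(n_crit)]
                       for i in range(n_alt)]).sum(axis=1)
    d_minus = np.array([[vertex_dist(norm[i, j], fnis) for j in range(n_crit)]
                        for i in range(n_alt)]).sum(axis=1)
    return d_minus / (d_plus + d_minus)             # closeness coefficients

ratings = [[(3, 5, 7), (5, 7, 9)],                  # alternative A
           [(1, 3, 5), (7, 9, 9)]]                  # alternative B
print(fuzzy_topsis(ratings))
```

The article's contribution sits upstream of this computation: it replaces the subjectively chosen (l, m, u) triples of the conversion scale with values derived from fitted IRT models, which is why the closeness coefficients shift across scale choices.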
- Research Article
26
- 10.1007/s00198-005-0024-7
- Dec 14, 2005
- Osteoporosis international : a journal established as result of cooperation between the European Foundation for Osteoporosis and the National Osteoporosis Foundation of the USA
Vertebral deformities are a common consequence of osteoporosis and are known to decrease quality of life. The Qualeffo-41 is a quality-of-life questionnaire especially developed for measuring quality of life in patients with vertebral deformities. It consists of 41 questions arranged in five domains: pain, physical function, social function, general health perception, and mental function. The objectives of this study were: (1) to develop a shorter version of the Qualeffo-41 by removing redundant questions; and (2) to investigate the scale characteristics, reliability, and validity of this shorter version. The study was performed using data from the Qualeffo validation study and the Multiple Outcomes of Raloxifene Evaluation (MORE) study. The analyses were performed in patients with vertebral deformities (n=579). Factor analysis on polychoric correlations and an item response theory (IRT) model, i.e., the generalized partial credit model (GPCM), were used to create a shorter version of Qualeffo-41. Using GPCM, scoring weights were computed for all items. Three items were removed from the data set because of too many missing values. Factor analysis identified three instead of five domains: (1) pain, (2) physical function, and (3) mental function. Five items had factor loadings <0.4 and were not included in the GPCM. After excluding several items, the domains pain (four items), physical function (18 items), and mental function (nine items) showed a good, reasonable, and excellent fit, respectively. This indicates that the mental function domain and the pain domain are more unidimensional than the physical function domain. All three domains showed a very high correlation (r ≥ 0.95) with the corresponding domains of the Qualeffo-41. Qualeffo-31 was developed, consisting of three domains with a reasonable to excellent fit to the GPCM. Although the fit to the GPCM supports the construct validity of the Qualeffo-31, validation in a new study should be performed before using it in practice.
- Research Article
44
- 10.1016/j.paid.2010.06.019
- Jul 15, 2010
- Personality and Individual Differences
An ideal point account of the JDI Work satisfaction scale
- Research Article
9
- 10.3758/bf03195532
- Nov 1, 2003
- Behavior Research Methods, Instruments, & Computers
The generalized graded unfolding model (GGUM) is an item response theory (IRT) model that implements symmetric, nonmonotonic, single-peaked item characteristic curves. The GGUM is appropriate for measuring individual differences for a variety of psychological constructs, especially attitudes. Like other IRT models, the location and scale (i.e., the metric) of parameter estimates from the GGUM are data dependent. Therefore, parameter estimates from alternative calibrations will generally not be comparable, even when responses to the same items are analyzed. GGUMLINK is a computer program developed to reexpress parameter estimates from two separate GGUM calibrations in a common metric. In this way, the results from separate calibrations of model parameters can be compared. GGUMLINK can secure a common metric by using one of five methods that have recently been generalized to the GGUM. The GGUMLINK executable program is available free and may be downloaded from http://www.education.umd.edu/EDMS.
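As a flavor of what metric linking does, the sketch below implements the classic mean/sigma transformation on item location estimates. It is one of the simple linking approaches; whether it matches any of GGUMLINK's five methods in detail is not claimed here, and the estimates are invented.

```python
# Mean/sigma metric linking sketch: given location estimates for the
# same items from two separate calibrations, solve for A and B in
# theta_base = A * theta_target + B so the transformed locations match
# the base calibration in mean and spread.
import numpy as np

def mean_sigma_link(delta_base, delta_target):
    """Return (A, B) mapping the target metric onto the base metric."""
    delta_base, delta_target = map(np.asarray, (delta_base, delta_target))
    A = delta_base.std(ddof=1) / delta_target.std(ddof=1)
    B = delta_base.mean() - A * delta_target.mean()
    return A, B

# Invented location estimates for five common items in two calibrations.
delta_run1 = [-1.2, -0.4, 0.1, 0.8, 1.5]
delta_run2 = [-2.0, -0.9, -0.2, 0.7, 1.7]
A, B = mean_sigma_link(delta_run1, delta_run2)
print(f"A={A:.3f}, B={B:.3f}")
# Locations and person estimates from run 2 move to run 1's metric as
# A * x + B; discrimination parameters rescale as alpha / A.
```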
- Research Article
22
- 10.1016/j.paid.2008.04.001
- May 6, 2008
- Personality and Individual Differences
Parent ratings of the ADHD items of the disruptive behavior rating scale: Analyses of their IRT properties based on the generalized partial credit model
- Research Article
- 10.7160/eriesj.2024.170108
- Mar 31, 2024
- Journal on Efficiency and Responsibility in Education and Science
This study develops a five-tier, computer-based chemistry diagnostic test scored in 11 categories from 0 to 10. A total of 20 items were validated by education, material, measurement, and media experts, obtaining an average Aiken validity index above 0.70. The validated items were then administered to 580 respondents and analyzed with the Generalized Partial Credit Model (GPCM), an Item Response Theory (IRT) model. The analysis shows that all items fit the model, as evidenced by RMSEA < 0.08, CFI > 0.87, SRMR < 0.10, GFI > 0.90, NFI > 0.90, NNFI > 0.90, IFI > 0.90, TLI > 0.90, and RFI > 0.90; in addition, every item had an S-X2 p-value greater than 0.05, indicating that all developed items fit the GPCM. The construct reliability (CR) value is 0.99, which suggests the construct is reliable. The most difficult item is item 9, and the easiest is item 4.
- Research Article
6
- 10.1177/00131644211032261
- Aug 2, 2021
- Educational and Psychological Measurement
The development of technology-enhanced innovative items calls for practical models that can describe polytomous testlet items. In this study, we evaluate four measurement models that can characterize polytomous items administered in testlets: (a) the generalized partial credit model (GPCM), (b) the testlet-as-a-polytomous-item model (TPIM), (c) the random-effect testlet model (RTM), and (d) the fixed-effect testlet model (FTM). Using data from the GPCM, FTM, and RTM, we examine the performance of the scoring models in multiple respects: relative model fit, absolute item fit, significance of testlet effects, parameter recovery, and classification accuracy. The empirical analysis suggests that the relative performance of the models varies substantially depending on the testlet-effect type, effect size, and trait estimator. When testlets had no or fixed effects, the GPCM and FTM led to the most desirable measurement outcomes. When testlets had random interaction effects, the RTM demonstrated the best model fit yet showed substantially different performance in trait recovery depending on the estimator. In particular, the advantage of the RTM as a scoring model was discernible only when strong random effects existed and the trait levels were estimated with Bayes priors. In other settings, the simpler models (i.e., GPCM, FTM) performed better or comparably. The study also revealed that polytomous scoring of testlet items has limited prospects as a functional scoring method. Based on the outcomes of the empirical evaluation, we provide practical guidelines for choosing a measurement model for polytomous innovative items that are administered in testlets.
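One common way to write the random-effect testlet extension of the GPCM is sketched below in generic notation (the paper's exact parameterization may differ): person j's trait is shifted on items within testlet d(i) by a person-specific effect γ_{jd(i)}, which induces the within-testlet dependence that the simpler models ignore.

```latex
% Random-effect testlet GPCM (a sketch, not necessarily the paper's
% exact form): gamma_{j d(i)} is a person-by-testlet random effect.
P\big(X_{ij} = k \mid \theta_j, \gamma_{jd(i)}\big) =
  \frac{\exp \sum_{v=0}^{k} a_i \big(\theta_j + \gamma_{jd(i)} - b_{iv}\big)}
       {\sum_{c=0}^{m_i} \exp \sum_{v=0}^{c} a_i \big(\theta_j + \gamma_{jd(i)} - b_{iv}\big)},
  \qquad \gamma_{jd(i)} \sim N\big(0, \sigma^2_{d(i)}\big)
```

Setting σ²_d = 0 recovers the plain GPCM, and replacing the random γ with fixed, testlet-specific shifts yields the fixed-effect variant.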
- Research Article
- 10.37680/scaffolding.v7i2.7281
- Jul 19, 2025
- Scaffolding: Jurnal Pendidikan Islam dan Multikulturalisme
This study outlines previous research findings comparing the application of the Graded Response Model (GRM) and the Generalized Partial Credit Model (GPCM) in the development of practice assessment instruments in higher education. It uses a Systematic Literature Review of articles indexed in Scopus (quartiles Q1–Q4), selected with the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) technique. After the identification, screening, and eligibility stages, 35 articles were included and analyzed using meta-synthesis techniques. The results show that the GRM and GPCM differ in how they support the development of practice assessment instruments in higher education: the GRM captures competencies grounded in students' values, attitudes, and spirituality, especially in assessments that use a graded scale such as a Likert scale, whereas the GPCM provides higher reliability for step-based or procedural practice assessments. These findings can contribute positively to the development of practice assessment in higher education.
- Research Article
3
- 10.3724/sp.j.1041.2008.00618
- Oct 28, 2008
- Acta Psychologica Sinica
The objective of computerized adaptive testing (CAT) is to construct an optimal test for each examinee. The item selection strategy (ISS) is an important part of CAT research, and its quality directly affects the reliability, efficiency, and security of the test. Much CAT research and application is based on dichotomously scored models, yet more information can be obtained from examinees with a polytomously scored model, so further CAT research based on polytomous models is needed. Both the Generalized Partial Credit Model (GPCM) and the Graded Response Model (GRM) are polytomously scored models, but they differ: in the GRM, the item grade difficulties ascend monotonically as the grades increase, while the GPCM describes the performance process of an item as a sequence of steps. In the GPCM, each item contains several step parameters with no specific ordering constraints: a later step cannot be attempted until the earlier step has been completed, yet a later step parameter may be lower than an earlier one. Considerable CAT research already uses the GRM; in our country, however, there are few reports on CAT with the GPCM. This study compared four types of ISS for GPCM-based CAT under various circumstances using computer-simulated programs. The strategies were implemented in four item pools of 1,000 items each. Each item has five step parameters, and the discrimination and step parameters were generated by the Monte Carlo method under four conditions: (1) ln a ~ N(0, 1), b ~ N(0, 1); (2) a ~ U(0.2, 2.5), b ~ N(0, 1); (3) ln a ~ N(0, 1), b ~ U(-3, 3); and (4) a ~ U(0.2, 2.5), b ~ U(-3, 3). Item responses were generated according to the GPCM for a sample of 3,000 simulees with θ ~ N(0, 1), whose trait levels were also generated by Monte Carlo simulation. During the response process, the simulees' abilities were estimated from the responses obtained. After the four item pools were sorted by the discrimination parameter to complete an a-stratified design, the above process was repeated. Thirty-two simulated CATs were administered and evaluated on the following measures: precision, ISS stability, evenness of item usage, average number of items used per person, χ², efficiency, and item overlap. Tables 1 and 2 report the evaluation indices obtained from the CAT process under the four ISS types, both without stratification and with the a-stratified design, along with values calculated by summing the weighted indices. The data support the following conclusions: all ability estimates are highly accurate and differ little across strategies, and comparison of the weighted means shows that the distribution of the item step parameters greatly influences the choice of ISS. When the examinees' trait levels follow a normal distribution, the performance of an ISS is closely tied to the step parameter distribution: (1) if an item's step parameters follow a normal distribution, the ISS that matches a random step parameter to the trait level is much more efficient than the others; (2) if an item's step parameters follow a uniform distribution, the ISS that matches the item's average step parameter to the trait level is much more efficient than the others.
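To illustrate the mechanics behind an ISS, the sketch below implements the simplest information-based rule for a GPCM pool: at each step, administer the unused item with the largest Fisher information at the current ability estimate. The pool and the deliberately bare-bones loop are invented, not the study's simulation code.

```python
# Maximum-information item selection for a GPCM-based CAT (sketch).
# gpcm_info uses the closed form a^2 * Var(X | theta).
import numpy as np

def gpcm_probs(theta, a, b):
    """Category probabilities for one GPCM item (categories 0..len(b))."""
    steps = np.concatenate(([0.0], np.cumsum(a * (theta - np.asarray(b)))))
    expz = np.exp(steps - steps.max())
    return expz / expz.sum()

def gpcm_info(theta, a, b):
    """Item information: a^2 times the conditional variance of the score."""
    p = gpcm_probs(theta, a, b)
    k = np.arange(len(p))
    return a**2 * (np.sum(k**2 * p) - np.sum(k * p)**2)

# Invented pool: 1,000 items, five step parameters each (loosely echoing
# the study's design; the exact generating distributions differ here).
rng = np.random.default_rng(0)
pool = [(rng.lognormal(0, 0.3), np.sort(rng.uniform(-3, 3, 5)))
        for _ in range(1000)]

theta_hat, used = 0.0, set()
for step in range(10):
    # Pick the most informative unused item at the current estimate.
    best = max((i for i in range(len(pool)) if i not in used),
               key=lambda i: gpcm_info(theta_hat, *pool[i]))
    used.add(best)
    # ... administer item `best`, score the response, update theta_hat ...
print(sorted(used))
```

The strategies compared in the study refine this basic rule, for example by matching a random or an average step parameter to the trait level, or by drawing from a-stratified layers of the pool to even out item exposure.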