On a Hierarchical Rater Model with Ordered Perceptions
This study develops a hierarchical rater model that allows rater discrimination to vary across score categories within items by incorporating ordered perceptual distributions, improving parameter recovery and model fit assessment; empirical results demonstrate its utility for rubric refinement and targeted rater training in constructed-response assessments.
ABSTRACT This study advances hierarchical rater modeling by relaxing the common assumption of equal discrimination across latent classes in constructed-response (CR) scoring. Building on univariate signal detection approaches, we propose a hierarchical model with ordered perceptual distributions, allowing rater discrimination to vary across score categories within an item. Through simulation studies, we evaluate parameter recovery, compare performance with equal-discrimination models, and assess how fit indices can identify the correct model. An empirical application to a financial assessment dataset illustrates practical utility. Diagnostic outputs, including tables and visualizations, demonstrate how the model can inform rubric refinement and support targeted rater training in multi-item CR assessments.
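The contrast drawn in the abstract, between equal discrimination across latent classes and ordered perceptual distributions, can be illustrated with a minimal sketch. It assumes a logistic latent class signal detection form, P(Y <= k | class) = F(c_k - location_class), which the abstract does not spell out; the function names, criteria, and location values are hypothetical and are not taken from the paper.

```python
import numpy as np


def logistic_cdf(x):
    """Standard logistic CDF, used here as the assumed link function."""
    return 1.0 / (1.0 + np.exp(-x))


def category_probs(criteria, locations):
    """Return P(Y = k | latent class) for one rater.

    criteria  : K-1 strictly increasing response criteria (assumed form)
    locations : one perceptual-distribution location per latent class,
                required to be non-decreasing (ordered perceptions)
    Rows index latent classes, columns index observed score categories.
    """
    criteria = np.asarray(criteria, dtype=float)
    locations = np.asarray(locations, dtype=float)
    if np.any(np.diff(criteria) <= 0):
        raise ValueError("criteria must be strictly increasing")
    if np.any(np.diff(locations) < 0):
        raise ValueError("locations must be non-decreasing")
    # Cumulative probabilities P(Y <= k | class) = F(c_k - location_class)
    cum = logistic_cdf(criteria[None, :] - locations[:, None])
    n_classes = len(locations)
    cum = np.hstack([np.zeros((n_classes, 1)), cum, np.ones((n_classes, 1))])
    return np.diff(cum, axis=1)


def equal_discrimination_locations(d, n_classes):
    """Equal-discrimination special case: locations are evenly spaced by d."""
    return d * np.arange(n_classes)


# Equal discrimination: adjacent classes are always d = 2.0 apart.
equal = category_probs([1.0, 3.0], equal_discrimination_locations(2.0, 3))

# Ordered perceptions: spacing between adjacent classes is free to differ,
# so the rater separates classes 1 and 2 better than classes 0 and 1.
ordered = category_probs([1.0, 3.0], [0.0, 1.2, 4.5])
```

Under this sketch, the equal-discrimination model is the special case in which the gaps between adjacent locations are all the same, which is why the two models can be compared with fit indices.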
- Research Article
7
- 10.3389/fpsyg.2020.00197
- Feb 14, 2020
- Frontiers in Psychology
The standard item response theory (IRT) model assumption of a single homogenous population may be violated in real data. Mixture extensions of IRT models have been proposed to account for latent heterogeneous populations, but these models are not designed to handle multilevel data structures. Ignoring the multilevel structure is problematic as it results in lower-level units aggregated with higher-level units and yields less accurate results, because of dependencies in the data. Multilevel data structures cause such dependencies between levels but can be modeled in a straightforward way in multilevel mixture IRT models. An important step in the use of multilevel mixture IRT models is the fit of the model to the data. This fit is often determined based on relative fit indices. Previous research on mixture IRT models has shown that performances of these indices and classification accuracy of these models can be affected by several factors including percentage of class-variant items, number of items, magnitude and size of clusters, and mixing proportions of latent classes. As yet, no studies appear to have been reported examining these issues for multilevel extensions of mixture IRT models. The current study aims to investigate the effects of several features of the data on the accuracy of model selection and parameter recovery. Results are reported on a simulation study designed to examine the following features of the data: percentages of class-variant items (30, 60, and 90%), numbers of latent classes in the data (with from 1 to 3 latent classes at level 1 and 1 and 2 latent classes at level 2), numbers of items (10, 30, and 50), numbers of clusters (50 and 100), cluster size (10 and 50), and mixing proportions [equal (0.5 and 0.5) vs. non-equal (0.25 and 0.75)]. Simulation results indicated that multilevel mixture IRT models resulted in less accurate estimates when the number of clusters and the cluster size were small. 
In addition, mean root mean square error (RMSE) values increased as the percentage of class-variant items increased, and parameters were recovered more accurately under the 30% class-variant item conditions. Mixing proportion type (i.e., equal vs. unequal latent class sizes) and number of items (10, 30, and 50), however, did not show any clear pattern. The sample-size-dependent fit indices BIC, CAIC, and SABIC performed poorly for the smaller level-1 sample size. For the remaining conditions, the SABIC index performed better than the other fit indices.
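For readers unfamiliar with the relative fit indices named in this abstract, the following is a minimal sketch of their usual textbook definitions, computed from a model's maximized log-likelihood. The function name and the example inputs are hypothetical. Which sample size to use for `n_obs` in a multilevel model (level-1 units or clusters) is a choice the abstract does not specify.

```python
import math


def information_criteria(log_likelihood, n_params, n_obs):
    """Compute common relative fit indices (smaller is better).

    log_likelihood : maximized log-likelihood of the fitted model
    n_params       : number of freely estimated parameters
    n_obs          : sample size used in the penalty term (an assumption;
                     multilevel models admit more than one definition)
    """
    deviance = -2.0 * log_likelihood
    return {
        "AIC": deviance + 2 * n_params,
        "BIC": deviance + n_params * math.log(n_obs),
        "CAIC": deviance + n_params * (math.log(n_obs) + 1),
        # Sample-size-adjusted BIC replaces n with (n + 2) / 24
        "SABIC": deviance + n_params * math.log((n_obs + 2) / 24),
    }


# Hypothetical values, for illustration only
ic = information_criteria(log_likelihood=-1000.0, n_params=10, n_obs=500)
```

Because the BIC, CAIC, and SABIC penalties all grow with `n_obs`, their behavior depends on sample size, whereas the AIC penalty does not.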
- Research Article
18
- 10.1002/j.2333-8504.2010.tb02215.x
- Jun 1, 2010
- ETS Research Report Series
ABSTRACT A basic consideration in large‐scale assessments that use constructed response (CR) items, such as essays, is how to allocate the essays to the raters that score them. Designs that are used in practice are incomplete, in that each essay is scored by only a subset of the raters, and also unbalanced, in that the number of essays scored by each rater differs across the raters. In addition, all of the possible rater pairs may not be used. The present study examines the effects of these factors on parameter recovery and classification accuracy using simulations of a latent class model based on signal detection theory (SDT). Many tests also include more than one CR item, which introduces a nested or hierarchical structure into the design, in that raters are nested within essays (i.e., there are multiple raters per essay) and essays are nested within examinees (i.e., each examinee provides two or more essays). A hierarchical rater model (HRM) has previously been developed to recognize the nested structure. A version of the HRM that incorporates a latent class signal detection model in the first level, referred to as the HRM‐SDT model, is presented. Parameter recovery in the HRM‐SDT model is examined in simulations. The model is applied to data from several ETS tests.
- Dissertation
1
- 10.17077/etd.005229
- Dec 1, 2019
In dealing with rater effects, double scoring is a popular method to control the quality of ratings for tests that include constructed-response (CR) items. Treating individual multiple ratings as independent violates the local independence assumption in item response theory (IRT). The typical way to fit standard IRT models to multiple ratings is to use a linear combination of the multiple ratings, such as sum or average scores, as item scores. However, these summed or averaged score approaches have limitations because they require the adjustment of original item score categories and still contain rater effects in item scores. The purpose of this dissertation is to assess the effectiveness of using double ratings over single ratings in standard IRT models when rater effects are present, and to compare the performance of standard and newer IRT models for rater effects and multiple ratings, known to correct rater effects from parameter estimation and preserve the original item score categories. Two simulation studies examined the accuracy of IRT models. The number of ratings and the IRT model were the main factors in the simulation studies. The number of ratings included single and double ratings. The two IRT models were the generalized partial credit model (GPCM) and the hierarchical rater model (HRM), representing, respectively, a standard IRT model and an IRT model for multiple ratings and rater effects. The HRM was used to generate ratings with rater effects. Then the GPCM and HRM were fitted to the ratings. All the ratings were generated in combination with other study factors, including sample size, test length, rater effects, and number of score categories. Results were compared and interpreted relative to baseline conditions, where ratings were generated with no rater effects. 
The main findings of this dissertation were as follows: (1) using single ratings as item scores in rater effect conditions reduced the accuracy of proficiency estimation in the GPCM; (2) double scoring methods relieved the impact of rater effects on proficiency estimation and improved accuracy in the GPCM; (3) for double ratings, the HRM showed better performance than the GPCM using summed item scores; (4) as more items and larger number of score categories were used, accuracy of proficiency estimation improved, in general.
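As background for the GPCM mentioned in this abstract, the following is a minimal sketch of its category response probabilities in the standard parameterization, P(X = k | theta) proportional to exp(sum over v <= k of a(theta - b_v)). The function name and the parameter values are hypothetical and are not drawn from the dissertation.

```python
import numpy as np


def gpcm_probs(theta, a, b):
    """Category probabilities under the generalized partial credit model.

    theta : examinee proficiency
    a     : item discrimination (slope)
    b     : step parameters b_1..b_m for an item with m + 1 score categories
    Returns an array of length m + 1 that sums to 1.
    """
    b = np.asarray(b, dtype=float)
    # Cumulative logits; the lowest category has an empty sum, i.e. 0
    z = np.concatenate([[0.0], np.cumsum(a * (theta - b))])
    z -= z.max()  # subtract the maximum for numerical stability
    p = np.exp(z)
    return p / p.sum()


# Hypothetical four-category item
probs = gpcm_probs(theta=0.5, a=1.2, b=[-0.5, 0.5, 1.0])
expected_score = probs @ np.arange(len(probs))
```

When summed double ratings are used as item scores, the number of score categories changes, which is the adjustment of the original score categories that the abstract identifies as a limitation.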
- Research Article
4
- 10.21449/ijate.1076464
- Sep 30, 2022
- International Journal of Assessment Tools in Education
The aim of this study is to investigate the presence of DIF with respect to gender using a latent class modeling approach. Data came from 953 students from the USA who participated in the PISA 2018 8th-grade financial literacy assessment. Latent class analysis (LCA) was used to determine the latent classes, and according to the fit indices the three-class model fit the data better. To obtain more information about the characteristics of the emerging classes, sources of uniform and non-uniform DIF were identified using the Multiple Indicator Multiple Causes (MIMIC) model. The findings contribute to the interpretation of latent classes. According to the results, the gender variable is a potential source of DIF for latent class indicators. To obtain unbiased estimates of the measurement and structural parameters, it is important to include direct effects in the classes. Ignoring these effects can lead to incorrect determination of the latent classes. This study presents an example of applying the MIMIC model in a latent class framework using a stepwise approach.
- Research Article
17
- 10.3141/2334-09
- Jan 1, 2013
- Transportation Research Record: Journal of the Transportation Research Board
This paper presents a route choice model of latent class heterogeneous drivers that is based on learning in a real-world experiment. Previous publications have presented findings about personal differences in route-switching aggressiveness among drivers. These differences were described by behavior (or driver) types reflecting aggressiveness in route-switching behavior. Behavior types were predictable from driver demographics and from personality traits, as well as from choice situation characteristics. In addition, the behavior types were significant in predicting route choice behavior. This paper does not use the hierarchical procedure that models behavior type on one level and then models route choice at a second level on the basis of behavior type. Instead, both models are estimated simultaneously. In the estimated latent class choice models, the latent classes represent the behavior types, and the choice model is the route-switching behavior. The models developed in this paper are based on a sample of 20 drivers who made more than 2,000 real-world route choices. The results of the developed models indicate that (a) driver classes exist and appear to be similar to the behavior types identified in earlier publications; (b) latent driver classes depend on driver demographics, personality traits, and choice situation characteristics; (c) different driver classes follow different route choice strategies; and (d) incorporating behavior types or latent classes improves route choice model performance, but latent class models perform better than hierarchical behavior-type models.
- Research Article
6
- 10.1177/00131644231180529
- Jun 26, 2023
- Educational and Psychological Measurement
A Monte Carlo simulation study was conducted to compare fit indices used for detecting the correct latent class in three dichotomous mixture item response theory (IRT) models. Ten indices were considered: Akaike's information criterion (AIC), the corrected AIC (AICc), Bayesian information criterion (BIC), consistent AIC (CAIC), Draper's information criterion (DIC), sample size adjusted BIC (SABIC), relative entropy, the integrated classification likelihood criterion (ICL-BIC), the adjusted Lo-Mendell-Rubin (LMR), and Vuong-Lo-Mendell-Rubin (VLMR). The accuracy of the fit indices was assessed for correct detection of the number of latent classes for different simulation conditions including sample size (2,500 and 5,000), test length (15, 30, and 45), mixture proportions (equal and unequal), number of latent classes (2, 3, and 4), and latent class separation (no-separation and small separation). Simulation study results indicated that as the number of examinees or number of items increased, correct identification rates also increased for most of the indices. Correct identification rates by the different fit indices, however, decreased as the number of estimated latent classes or parameters (i.e., model complexity) increased. Results were good for BIC, CAIC, DIC, SABIC, ICL-BIC, LMR, and VLMR, and the relative entropy index tended to select correct models most of the time. Consistent with previous studies, AIC and AICc showed poor performance. Most of these indices had limited utility for three-class and four-class mixture 3PL model conditions.
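Of the indices listed in this abstract, relative entropy is the one not based on a penalized likelihood. The following is a minimal sketch of the commonly used definition, computed from posterior class membership probabilities. The function name and the example matrices are hypothetical.

```python
import numpy as np


def relative_entropy(posteriors):
    """Relative entropy of a latent class solution, between 0 and 1.

    posteriors : (n_examinees, n_classes) array of posterior class
                 probabilities, each row summing to 1; n_classes >= 2
    Values near 1 indicate well-separated classes.
    """
    p = np.asarray(posteriors, dtype=float)
    n, k = p.shape
    # Treat 0 * log(0) as 0 so certain classifications contribute nothing
    with np.errstate(divide="ignore", invalid="ignore"):
        terms = np.where(p > 0, p * np.log(p), 0.0)
    return 1.0 - (-terms.sum()) / (n * np.log(k))


# Hypothetical posteriors: perfectly certain vs. completely uncertain
certain = relative_entropy([[1.0, 0.0], [0.0, 1.0]])
uncertain = relative_entropy([[0.5, 0.5], [0.5, 0.5]])
```

Relative entropy summarizes classification certainty rather than model fit, so it is usually read alongside the information criteria rather than in place of them.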
- Research Article
27
- 10.5664/jcsm.6666
- Jul 15, 2017
- Journal of Clinical Sleep Medicine
This study examined empirically derived symptom cluster profiles among patients who present with insomnia using clinical data and polysomnography. Latent profile analysis was used to identify symptom cluster profiles of 175 individuals (63% female) with insomnia disorder based on total scores on validated self-report instruments of daytime and nighttime symptoms (Insomnia Severity Index, Glasgow Sleep Effort Scale, Fatigue Severity Scale, Beliefs and Attitudes about Sleep, Epworth Sleepiness Scale, Pre-Sleep Arousal Scale), mean values from a 7-day sleep diary (sleep onset latency, wake after sleep onset, and sleep efficiency), and total sleep time derived from an in-laboratory PSG. The best-fitting model had three symptom cluster profiles: "High Subjective Wakefulness" (HSW), "Mild Insomnia" (MI) and "Insomnia-Related Distress" (IRD). The HSW symptom cluster profile (26.3% of the sample) reported high wake after sleep onset, high sleep onset latency, and low sleep efficiency. Despite relatively comparable PSG-derived total sleep time, they reported greater levels of daytime sleepiness. The MI symptom cluster profile (45.1%) reported the least disturbance in the sleep diary and questionnaires and had the highest sleep efficiency. The IRD symptom cluster profile (28.6%) reported the highest mean scores on the insomnia-related distress measures (eg, sleep effort and arousal) and waking correlates (fatigue). Covariates associated with symptom cluster membership were older age for the HSW profile, greater obstructive sleep apnea severity for the MI profile, and, when adjusting for obstructive sleep apnea severity, being overweight/obese for the IRD profile. The heterogeneous nature of insomnia disorder is captured by this data-driven approach to identify symptom cluster profiles. The adaptation of a symptom cluster-based approach could guide tailored patient-centered management of patients presenting with insomnia, and enhance patient care.
- Research Article
8
- 10.1002/1097-0258(20000730)19:14<1881::aid-sim495>3.0.co;2-i
- Jan 1, 2000
- Statistics in Medicine
Rudas, Clogg and Lindsay proposed a new index of fit for contingency table analysis. Using the two-component mixture, where the first component with weight (1-w) represents the model to be tested and the second component with weight w is unstructured, the RCL index of lack of fit was defined to be the smallest mixing weight w(*) being compatible with the two-component mixture to be saturated. This index of fit, which is not sensitive to sample size, is applied to the problem of assessing agreement between two raters whereby three hypotheses (pure agreement, quasi-independence, independence) are considered. As quasi-independence comprises the two other hypotheses, a natural generalization of the RCL index of fit results from assuming the model itself to be composed of two submodels (pure agreement, independence) while the third component remains unstructured. Two examples demonstrate the application of this generalized RCL index of fit, with the first 3x3 contingency table having no empty cells, and the second 4x4 table having five of them. In both cases estimating the parameters and determining w(*) was possible without problems within the linear logistic framework for latent class analysis. A further analysis of the 4x4 table assumes the model to be the two-class latent class model that certainly does not belong to the family of standard models for contingency table analysis. Thus, in contrast to the recommendations given originally, the following conclusions seem to be justified: (i) the number of components representing the model may exceed one (generalized RCL index); (ii) the application of the original RCL index of fit may be extended to more complex models; and (iii) empty cells bear no problems so that this approach may be recommended in the case of both large and small sample sizes.
- Abstract
- 10.1136/sextrans-2011-050108.347
- Jul 1, 2011
- Sexually Transmitted Infections
Background: Sexually transmitted and bloodborne infection (STBBI) risk is multifaceted and can involve a complex interplay between sexual behaviours, substance abuse and mental health conditions. In Winnipeg, Manitoba, Canada we conducted...
- Research Article
11
- 10.1186/s12889-024-19948-y
- Sep 13, 2024
- BMC Public Health
Background: Deep-rooted racial residential segregation and housing discrimination have given rise to housing disparities among low-income Black young adults in the US. Most studies have focused on single dimensions of housing instability, and thus provide a partial view of how Black young adults experience multiple, and perhaps overlapping, experiences of housing instability including homelessness, frequent moves, unaffordability, or evictions. We aimed to illuminate the multiple forms of housing instability that Black young adults contend with and examine relationships between housing instability and mental health outcomes. Methods: Using baseline data from the Black Economic Equity Movement (BEEM) guaranteed income trial with 300 urban low-income Black young adults (aged 18–24), we conducted a three-stage latent class analysis using nine housing instability indicators. We identified distinct patterns by using fit indices and theory to determine the optimal number of latent classes. We then used multinomial logistic regression to identify subpopulations disproportionately represented within unstable housing patterns. Finally, we estimated associations between housing experience patterns and mental health outcomes: depression, anxiety, and hope. Results: We found high prevalence of housing instability with 27.3% of participants reporting experiences of homelessness in the prior year and 39.0% of participants reporting multiple measures of housing instability. We found the 4-class solution to be the best fitting model for the data based on fit indices and theory. Latent classes were characterized as four housing experience patterns: 1) more stably housed, 2) unaffordable and overcrowded housing, 3) mainly unhoused, and 4) multiple dimensions of housing instability. 
Those experiencing unaffordable and overcrowded housing and being mainly unhoused were more than four times as likely to have symptoms of depression (Unaffordable: aOR = 4.57, 95% CI: 1.64, 12.72; Unhoused: aOR = 4.67, 95% CI: 1.18, 18.48) and more than twice as likely to report anxiety (Unaffordable: aOR = 2.28, 95% CI: 1.03, 5.04; Unhoused: aOR = 3.36, 95% CI: 1.12, 10.05) compared to the more stably housed pattern. We found that hope scores were similarly high across patterns. Conclusions: High prevalence of housing instability and mental health challenges among low-income Black young adults demands tailored interventions to reduce instability, given widening racial disparities and implications for future well-being into adulthood.
- Research Article
9
- 10.1177/2319510x13483513
- Mar 1, 2013
- Asia-Pacific Journal of Management Research and Innovation
The article attempts to empirically test a multi-dimensional and multi-level hierarchical structure of service quality in Life Insurance services in India. The study draws evidence from India to develop and compare a second-order hierarchical model with a first-order model to draw better insight into the determinants and structure of perceived service quality in Indian Life Insurance services. Five component dimensions of perceived service quality were extracted through exploratory factor analysis from an initial list of 38 service quality items generated from the literature and expert review. The five-dimensional structure was then tested through confirmatory factor analysis using first-order and second-order reflective models to determine the best model of perceived service quality. The second-order reflective model was found to have better fit based on fit indices obtained with AMOS ver 4.0 software. The results showed that both the first-order and hierarchical second-order models have excellent fit after some modification based on modification indices. The second-order model was, however, accepted for interpretation of results since it has relatively better fit (significantly lower chi-square value and better fit index values) and is a more parsimonious model. The results thus provide support for a multi-dimensional and multi-level hierarchical structure of service quality as suggested by Brady and Cronin (2001), third in Life Insurance services in India. The results show that perceived service quality of Life Insurance services is a multi-dimensional second-order construct consisting of the primary dimensions of Service Delivery, Sales Agent Quality, Tangibles, Value and Core Service.
- Preprint Article
- 10.20944/preprints202406.0787.v1
- Jun 12, 2024
- Preprints.org
Objective: To employ Latent Class Analysis (LCA) to investigate dietary intake patterns among Sudanese children aged 0 to 2 years and to examine the association of these patterns with sociodemographic factors. Methods: This study leveraged the Sudan Multiple Indicator Cluster Survey (MICS) 2014 data to uncover dietary intake patterns among 7,362 children using latent class analysis (LCA). We investigated class memberships concerning demographic and socioeconomic factors. The model's adequacy was determined using several fit indices, including BIC, AIC, entropy, CAIC, and SABIC, providing a holistic evaluation of the model's accuracy in capturing dietary behaviors. Results: Three latent classes were identified: Class 1 (55%) with an average nutrition composition, Class 2 (28%) with limited nutrition composition, and Class 3 (17%) with good nutrition composition. Significant associations were found between latent class membership and sociodemographic factors, particularly mother's education level and household wealth. The three-class solution provided the best balance between model fit and class distinction. Conclusions: The LCA revealed distinct dietary intake patterns and underscored the influence of sociodemographic factors on child nutrition. The findings suggest that targeted nutritional interventions should be developed according to the specific needs of different latent classes. The study also highlights the utility of LCA as a robust statistical and machine learning tool in public health research, capable of informing tailored interventions and policies for improving child nutrition. Implications: The study emphasizes the importance of maternal education and socio-economic status in shaping dietary behaviors of children in Sudan. It implies the need for policies that address educational disparities, food security, and economic development as part of comprehensive nutritional interventions.
- Research Article
29
- 10.1016/j.alcr.2019.04.018
- Apr 27, 2019
- Advances in Life Course Research
Identification of developmental trajectory classes: Comparing three latent class methods using simulated and real data.
- Research Article
9
- 10.3389/fpsyt.2020.573410
- Nov 9, 2020
- Frontiers in Psychiatry
Past research documents the heterogeneity in US immigrants, particularly in terms of racial and ethnic categories and specific ethnic subgroups. The present study builds on this research foundation by investigating heterogeneity in immigrants' experiences of adversity, both recent and during childhood, and associations with mental disorders. Data are drawn from 6,131 adult immigrants in the 2012–2013 National Epidemiologic Survey on Alcohol and Related Conditions-III. Prevalence estimates for mental disorders and adversities were calculated overall and by gender. Latent class analysis was utilized to characterize patterns of self-reported experiences of childhood and recent adversities, and multinomial logistic regression established the statistical association between latent class membership and past-year mental disorder outcomes (substance use disorder only, mood/anxiety/trauma disorder only, co-occurring disorder, or no mental disorder). Neglect was the most commonly-reported childhood adversity among immigrant men and women. Prevalence of meeting criteria for a substance use disorder only, or a mood/anxiety/trauma disorder only, varied between men and women, yet no gender differences were observed in prevalence of co-occurring disorders. For latent class analyses, a five-class solution was selected based on fit indices and parsimony. Approximately 10.0% of the sample was categorized in the latent class characterized by severe childhood adversities, while 57.5% was classified in the latent class with low probabilities of reported adversities. The relative risk of meeting criteria for a past-year substance use disorder only (compared to no substance use or mood/anxiety/trauma disorder) was more than three times as high for members of the class with severe childhood adversities (RRR, 3.26; 95% CI, 2.08–5.10), as well as the class with recent employment/financial adversities (RRR, 3.82; 95% CI, 2.36–6.19), compared to the class with low adversities. 
The relative risk of past-year co-occurring disorders (compared to no disorder) was more than 12 times as high for those in the severe childhood adversities class (RRR, 12.21; 95% CI, 7.06–21.10), compared to the class with low adversities. Findings underscore the importance of considering both recent and childhood adversities when assessing and providing services for US immigrant groups.
- Research Article
72
- 10.1177/0013164494054002020
- Jun 1, 1994
- Educational and Psychological Measurement
The Armed Services Vocational Aptitude Battery (ASVAB) has been used in its current item and content form for more than a decade. Its latent structure, although explored in factor analyses, has never been confirmed. Several confirmatory factor analyses were conducted on Form 8a in a nationally representative sample. These included a g-only model, a three-factor hierarchical Vernon-like model, 2 four-factor first-order models, and 2 four-factor hierarchical models. Based on fit indexes, simple structure, and parsimony in parameter estimation, the three-factor hierarchical model was chosen to represent the data. The higher-order factor was psychometric g, and the first-order factors were interpreted as Speed, Verbal/Math, and Technical Knowledge. The latter two factors were similar to the Vernon factors of Verbal/Educational and Practical.