Comparing Fit and Reliability Estimates of a Psychological Instrument using Second-Order CFA, Bifactor, and Essentially Tau-Equivalent (Coefficient Alpha) Models via AMOS 22
Estimation of composite reliability within a hierarchical modeling framework has recently become of particular interest given the growing recognition that the underlying assumptions of coefficient alpha are often untenable. Unfortunately, coefficient alpha remains the prominent estimate of reliability when estimating total scores from a scale with a hierarchical structure, in part because there are few published articles that provide a step-by-step demonstration of how to estimate reliability within the context of structural equation modeling. Using AMOS 22 to analyze simulated and Wechsler Adult Intelligence Scale–Fourth Edition (WAIS-IV) summary data, the authors demonstrate how to compare the fit and reliability estimates of a (a) second-order confirmatory factor analytic (CFA) model, (b) bifactor model, and (c) essentially tau-equivalent model, which conforms to the stringent assumptions underlying coefficient alpha. The variance–covariance matrices generated from the simulated data as well as the WAIS-IV data are provided to allow for replication of results.
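To make the contrast between coefficient alpha and model-based composite reliability concrete, here is a minimal sketch; it is not the authors' AMOS procedure. It computes alpha from a variance–covariance matrix and omega from the loadings and unique variances of a one-factor model with factor variance fixed at 1. All function names and numeric values are hypothetical illustrations, not values from the article.

```python
def implied_cov(loadings, uniquenesses):
    """Model-implied covariance matrix of a one-factor model (factor variance = 1)."""
    k = len(loadings)
    return [
        [loadings[i] * loadings[j] + (uniquenesses[i] if i == j else 0.0)
         for j in range(k)]
        for i in range(k)
    ]

def coefficient_alpha(cov):
    """Coefficient alpha computed from an item variance-covariance matrix."""
    k = len(cov)
    total = sum(sum(row) for row in cov)       # variance of the sum score
    diag = sum(cov[i][i] for i in range(k))    # sum of item variances
    return k / (k - 1) * (1.0 - diag / total)

def omega_one_factor(loadings, uniquenesses):
    """Composite reliability (omega) for a one-factor model."""
    true_var = sum(loadings) ** 2
    return true_var / (true_var + sum(uniquenesses))

# Hypothetical tau-equivalent items: equal loadings.
tau_lam, tau_theta = [0.7] * 4, [0.51] * 4
# Hypothetical congeneric items: unequal loadings.
cong_lam, cong_theta = [0.9, 0.7, 0.5, 0.3], [0.19, 0.51, 0.75, 0.91]

alpha_tau = coefficient_alpha(implied_cov(tau_lam, tau_theta))
omega_tau = omega_one_factor(tau_lam, tau_theta)
alpha_cong = coefficient_alpha(implied_cov(cong_lam, cong_theta))
omega_cong = omega_one_factor(cong_lam, cong_theta)
```

Under these assumptions alpha and omega coincide for the equal-loading items, while alpha falls below omega once the loadings differ. That gap is one reason the tau-equivalence assumption matters.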
- Research Article
460
- 10.1111/j.1467-6494.2011.00749.x
- Jun 29, 2012
- Journal of Personality
Many psychological constructs are conceived to be hierarchically structured and thus to operate at various levels of generality. Alternative confirmatory factor analytic (CFA) models can be used to study various aspects of this proposition: (a) The one-factor model focuses on the top of the hierarchy and contains only a general construct, (b) the first-order factor model focuses on the intermediate level of the hierarchy and contains only specific constructs, and both (c) the higher order factor model and (d) the nested-factor model consider the hierarchy in its entirety and contain both general and specific constructs (e.g., bifactor model). This tutorial considers these CFA models in depth, addressing their psychometric properties, interpretation of general and specific constructs, and implications for model-based score reliabilities. The authors illustrate their arguments with normative data obtained for the Wechsler Adult Intelligence Scale and conclude with recommendations on which CFA model is most appropriate for which research and diagnostic purposes.
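For the nested-factor (bifactor) case, model-based score reliabilities are often summarized as omega total and omega hierarchical. The sketch below assumes orthogonal factors with unit variances and uses hypothetical loadings; the function name and all numbers are illustrative and are not taken from the tutorial.

```python
def bifactor_omegas(general, specific_groups, uniquenesses):
    """Return (omega_total, omega_hierarchical) for a bifactor model.

    general         -- general-factor loading of every item
    specific_groups -- one list of specific-factor loadings per specific factor
    uniquenesses    -- unique (error) variance of every item
    Assumes orthogonal factors, each with variance 1.
    """
    general_var = sum(general) ** 2
    specific_var = sum(sum(group) ** 2 for group in specific_groups)
    total_var = general_var + specific_var + sum(uniquenesses)
    omega_total = (general_var + specific_var) / total_var
    omega_hierarchical = general_var / total_var
    return omega_total, omega_hierarchical

# Hypothetical six-item scale with two specific factors of three items each.
general = [0.6] * 6
specific_groups = [[0.4, 0.4, 0.4], [0.3, 0.3, 0.3]]
uniquenesses = [0.48] * 3 + [0.55] * 3   # 1 - general^2 - specific^2 per item

omega_total, omega_h = bifactor_omegas(general, specific_groups, uniquenesses)
```

Omega hierarchical counts only the general factor as true-score variance, so it cannot exceed omega total. The gap between the two shows how much reliable variance in the total score comes from the specific constructs.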
- Research Article
1
- 10.2333/bhmk.34.131
- Jul 1, 2007
- Behaviormetrika
We analyzed data from the National Center Test for University Admissions (NCT) administered in January 2005 by applying three different factor analysis (FA) models: an exploratory FA model, a confirmatory FA model, and a hierarchical FA model. The data was collected from 385,494 students and included 17 variables of the 15 principal subjects. Two difficulties were experienced in applying FA models to the NCT data: structural and nonstructural missing data patterns. The structural difficulty is derived from the administration schedule, and the non-structural difficulty is caused by the a la carte system of the NCT. Consequently, very complicated missing data patterns exist in the NCT data. We solved the problems of the missing data patterns by using the pseudomaximum likelihood method and the full-information maximum likelihood method. We extracted two factors by using the exploratory FA model. One factor was for linguistic and social studies, and the other was for mathematics and sciences. These factors were then examined by using the confirmatory FA model. We then confirmed the strong influence of the general factor by using the hierarchical FA model. Furthermore, we performed a multi-group analysis on the confirmatory and hierarchical FA models, focusing on the distinction of sex.
- Research Article
117
- 10.1177/1073191114528029
- Mar 28, 2014
- Assessment
The current article compares the use of exploratory structural equation modeling (ESEM) as an alternative to confirmatory factor analytic (CFA) models in personality research. We compare model fit, factor distinctiveness, and criterion associations of factors derived from ESEM and CFA models. In Sample 1 (n = 336) participants completed the NEO-FFI, the Trait Emotional Intelligence Questionnaire-Short Form, and the Creative Domains Questionnaire. In Sample 2 (n = 425) participants completed the Big Five Inventory and the depression and anxiety scales of the General Health Questionnaire. ESEM models provided better fit than CFA models, but ESEM solutions did not uniformly meet cutoff criteria for model fit. Factor scores derived from ESEM and CFA models correlated highly (.91 to .99), suggesting the additional factor loadings within the ESEM model add little in defining latent factor content. Lastly, criterion associations of each personality factor in CFA and ESEM models were near identical in both inventories. We provide an example of how ESEM and CFA might be used together in improving personality assessment.
- Research Article
3
- 10.2478/mdke-2024-0007
- Jun 1, 2024
- Management Dynamics in the Knowledge Economy
Management researchers often use structural equation modeling to analyze data from questionnaire-based instruments. Usually, confirmatory factor analysis (CFA) is applied to confirm the hypothesized or theorized factor structure of the instrument. Most authors adopt a single CFA model without comparing it against other potentially valid models (general factor, correlated factor model, second-order hierarchical model, and bifactor model). Hence, the dimensionality and reliability of constructs using bifactor modeling to validate latent scores are often ignored. This gap is widened by the lack of agreement on the use of post hoc modification of CFA models to support fit to the data in covariance-based structural equation modeling (CB-SEM). The objective of the study was to explore model fit, dimensionality, and reliability of the Employee Work Assessment Tool (EWAT) using competing CFA models. The study used a published dataset on the EWAT instrument to illustrate the assessment of the dimensionality and model-based reliability of the tool using CB-SEM. Results showed that CFA statistics of the bifactor model were most adequate for the instrument (χ² = 70.053, df = 19, RMSEA = 0.082 [90% confidence interval: 0.062, 0.103], SRMR = 0.036, CFI = 0.963). The bifactor model ancillary measures supported the unidimensional structure of EWAT with justification for the use of total scores. The study concludes that the instrument is best described and applied as a unidimensional construct, and therefore, a single score can be used to rate employees’ perceptions of their work conditions. The study presents both practical implications for management researchers and simplified reporting for bifactor modeling.
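As a rough aid to reading the fit statistics reported above, the sketch below shows the usual point-estimate formula for RMSEA. The chi-square and degrees of freedom are the reported values, but the sample size is not stated in the abstract, so `n = 400` is a purely hypothetical placeholder. The function name is also an assumption. Some programs use N rather than N − 1 in the denominator.

```python
import math

def rmsea(chi2, df, n):
    """RMSEA point estimate using the common (N - 1) denominator."""
    return math.sqrt(max(chi2 - df, 0.0) / (df * (n - 1)))

# chi2 and df come from the abstract; n is NOT reported there and is
# a hypothetical value chosen only to make the example runnable.
estimate = rmsea(70.053, 19, 400)

# A model whose chi-square does not exceed its df gets RMSEA = 0.
floor_case = rmsea(10.0, 19, 400)
```

Because the estimate depends on the sample size, the same chi-square and degrees of freedom give a different RMSEA in a larger or smaller sample.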
- Research Article
40
- 10.1037/met0000465
- Dec 1, 2023
- Psychological Methods
Confirmatory factor analysis (CFA) and its bifactor models are popular in empirical investigations of the factor structure of psychological constructs. CFA offers straightforward hypothesis testing but has notable pitfalls, such as the imposition of strict assumptions (i.e., simple structure) that obscure unmodeled complexity. Due to the limitations of bifactor CFAs, they have yielded anomalous results across samples and studies that suggest model misspecification (e.g., evaporating specific factors and unexpected loadings). We propose the use of exploratory factor analysis (EFA) to evaluate the structural validity of CFA solutions, either before or after the estimation of more restrictive CFA models, to (a) identify model misspecifications that may drive anomalous estimates and (b) confirm CFA models by examining whether hypothesized structures emerge with limited researcher input. We evaluated the degree to which predominant factor structures were invariant across contexts along the exploratory-confirmatory continuum and demonstrate how poor methodological choices can distort results and impede theory development. All CFA models fit well, but there were numerous differences in replicability and substantive interpretability. Several similarities emerged between bifactor CFA and EFA models, including evidence of overextraction, the collapse of specific factors onto the general factor, and subsequent shifts in how the general factor was defined. We situate these methodological shortcomings within the broader literature on structural models of psychopathology, articulate implications for theories (such as the p-factor) that are borne out of factor analysis, outline several remedies for problems encountered when performing exploratory bifactor analysis, and propose alternative specifications for confirmatory bifactor models.
- Research Article
33
- 10.1007/s12671-021-01801-7
- Jan 1, 2022
- Mindfulness
Objectives: Empirical research investigating self-compassion is a rapidly developing field, and it is potentially crucial in early adolescence. The primary aim of the present study was to psychometrically evaluate the Persian translation of the Self-Compassion Scale Youth version (SCS-Y) and evaluate its factor structure among young adolescents. The second aim was to explore the buffering effect of self-compassion against the negative effect of difficulties in emotion regulation on COVID-19-related anxiety. Methods: A sample of young students (n = 532; mean age 13.57 years) completed an online survey, which included the SCS-Y, Patient Health Questionnaire, Difficulties In Emotion Regulation Scale, Coronavirus Anxiety Scale, Youth Life Orientation Test, Brief Resilience Scale, and Brief 10-Item Big Five Inventory. First-order (six-factor) confirmatory factor analysis (CFA) and bi-factor exploratory structural equation modeling (ESEM) analysis were used to evaluate the factor structure of the SCS-Y. Results: Results showed that the SCS-Y had very good internal consistency (Cronbach’s alpha coefficient: 0.88; McDonald’s omega coefficient: 0.90), composite reliability (0.87), and adequate test–retest reliability after 4 weeks (0.60). The first-order (six-factor) CFA and bi-factor ESEM analysis demonstrated the SCS-Y had excellent dimensionality. Further analysis found negative associations between self-compassion and both depression and neuroticism, and positive associations between self-compassion and both resilience and optimism. Moreover, self-compassion moderated the association between emotion dysregulation and anxiety generated by COVID-19. Overall, the findings indicated that the SCS-Y had acceptable criterion-related validity, convergent validity, and discriminant validity. Conclusions: The findings provide evidence that the SCS-Y is a reliable and valid instrument for assessing the six factors of self-compassion among younger adolescents. Based on the study’s findings, self-compassion appears to be a protective factor against mental health problems during the COVID-19 pandemic for younger adolescents.
- Research Article
36
- 10.1177/1073191116653471
- Jun 16, 2016
- Assessment
The primary goals of this study were to evaluate the dimensionality of the Penny et al. Sluggish Cognitive Tempo Scale and to compare model fits for parent- and youth self-report versions. Participants were 262 young adolescents (ages 10-15) comprehensively diagnosed with attention-deficit/hyperactivity disorder. Both confirmatory factor analysis (CFA) and bifactor modeling were used to determine if the proposed three-factor structure previously identified through exploratory factor analysis could be confirmed. Results showed that although the three-factor CFA had better fit statistics than a one- or two-factor CFA, the bifactor model was the best-fitting model for both parent report and self-report. This implies that Sluggish Cognitive Tempo Scale is best conceptualized as having an underlying general factor, with three specific factors that may represent different etiologies. Importantly, results also showed low-to-moderate correlations between raters and equivalent or better fit statistics for self-report in comparison with parent report.
- Research Article
- 10.1016/j.psycom.2025.100242
- Mar 1, 2026
- Psychiatry Research Communications
A psychometric assessment of the Patient Health Questionnaire-9 for people living with acutely-treated HIV in Thailand
- Research Article
36
- 10.1093/geront/gnw198
- Jan 12, 2017
- The Gerontologist
To (a) assess the validity and reliability of the 9-item Positive Aspects of Caregiving (PAC) scale among a national sample of caregivers for older adults with functional limitations, (b) develop a shorter version (short-PAC [S-PAC] scale) and assess its psychometric properties, and (c) investigate both scales' measurement equivalence/invariance (ME/I) across language of administration (Chinese/English/Malay). Scale/item measurement property assessment, confirmatory factor analysis (CFA), testing the "original" 2-factor model (6 items: first factor; 3 items: second factor), and exploratory FA (EFA) of the 9-item PAC scale was done. Consequently, alternate CFA models were tested. The S-PAC was developed and subjected to CFA. For both scales, convergent (correlation with caregiver esteem) and divergent (correlation with caregiver depressive symptoms) validity, and language ME/I was assessed. For the 9-item PAC scale, the "original" 2-factor CFA model had a poor fit; its EFA and scale/item measurement properties supported a single factor. Among alternate CFA models, a bi-factor model (all nine items: first factor [overall PAC]; six items: second factor [self-affirmation]; three items: third factor [outlook-on-life]) had the best fit. The bi-factor CFA model also had a good fit for the S-PAC scale, developed after eliminating 2 items from the 9-item PAC scale. Both scales demonstrated convergent and divergent validity, and partial ME/I across language of administration. Both the 9-item PAC and 7-item S-PAC scales can be used to assess positive feelings resulting from care provision among family caregivers of older adults with functional limitations.
- Research Article
12
- 10.3389/fpsyg.2023.1106624
- May 12, 2023
- Frontiers in Psychology
Based on the career theory of Cognitive Information Processing (CIP), we selected scale items from literature reviews and expert guidance. The scale consisted of 28 items with 4 factors (interests, abilities, values, personality). To test the scale’s factor structure, we used confirmatory factor analysis (CFA), and the model was modified according to CFA results. Second-order confirmatory factor analysis was applied to the model of the scale to justify the use of the total score. Internal consistency was evaluated using Cronbach’s alpha coefficients. In addition, the composite reliability (CR) and average variance extracted (AVE) of the scale were calculated to test convergent validity. These analyses showed that the scale has good psychometric properties and can be used to measure junior high school students’ career planning level in information technology courses in terms of interests, abilities, values, and personality. The first-order confirmatory factor analysis model constructed in this study did not perform well. Therefore, a second-order confirmatory factor analysis model was constructed on this basis in combination with the existing literature, and the rationality of the model was verified with data, which highlights the novelty of this study.
- Research Article
13
- 10.1371/journal.pone.0237778
- Aug 25, 2020
- PLoS ONE
Objective: The Tinnitus Handicap Inventory (THI) is widely used in clinical practice and research as a three-dimensional measure of tinnitus severity. Despite extensive use, its factor structure remains unclear. Furthermore, THI can be considered a reliable measure only if Cronbach’s alpha coefficient and Classical Test Theory are used. The more modern and robust Item Response Theory (IRT) has so far not been used to psychometrically evaluate THI. In theory, IRT allows a more precise evaluation of THI’s factor structure, reliability, and the quality of individual items. Method: There were 1115 patients with tinnitus (556 women and 559 men), aged 19–84 years (M = 51.55; SD = 13.28). The dimensionality of THI was evaluated using several models of Confirmatory Factor Analysis and an Item Response Theory approach. Exploratory non-parametric Mokken scaling was applied to determine a unidimensional and robust scale. Several IRT polytomous models were used to assess the overall quality of THI. Results: The bifactor model had the best fit (RMSEA = 0.055; CFI = 0.976; SRMR = 0.040) and revealed one strong general factor and several weak specific factors. Mokken scaling generated a reliable unidimensional scale (Loevinger’s H = 0.463). In order to refine THI we propose that five items be removed. The IRT Generalized Partial Credit Model generated good parameters in terms of item location (difficulty), discrimination, and information content of items. Conclusion: Our findings support the use of THI to evaluate tinnitus severity in terms of it being a reliable unidimensional scale. However, clinicians and researchers should rely only on its overall score, which reflects global tinnitus severity. To improve its psychometric quality, several refinements of THI are proposed.
- Research Article
7
- 10.1007/s10067-017-3649-y
- May 2, 2017
- Clinical Rheumatology
The objective of this study was to test different exploratory solutions to the LupusQoL scale in a sample of Spanish patients with SLE using confirmatory factor analysis (CFA) and Rasch modeling, as well as to estimate the convergent validity. The χ² test, RMSEA, CFI, and TLI were used to test the fit of the different exploratory structures with CFA. To estimate the parameters in the dimensions found, a rating scale Rasch multidimensional random coefficient multinomial logit model was used. The reliability of the scores was obtained with coefficient alpha and coefficient omega. The convergent validity was calculated using Spearman's rho. Four hundred and fifty patients participated but complete data were available for 223 subjects. The original version (UK) and the French version obtained the best fit, showing that the proposed original structure was the best solution for the structure of the LupusQoL scale in the Spanish sample. The multidimensional solution of eight dimensions was adequate, but item 8 in physical health, item 16 in intimate relations, and items 29 and 30 obtained mean squares >1.6. Internal consistency and coefficient omega of the scores in the eight domains were higher. The Spanish version of LupusQoL correlated strongly with the corresponding SLAQ, EQ5D analogic scale, and EQ5D domain. This analysis confirmed the structure of eight dimensions of the original version in patients with SLE.
- Research Article
24
- 10.1371/journal.pone.0238110
- Aug 31, 2020
- PLOS ONE
The Defining Issues Test (DIT) aimed to measure one's moral judgment development in terms of moral reasoning. The Neo-Kohlbergian approach, which is an elaboration of Kohlbergian theory, focuses on the continuous development of postconventional moral reasoning, which constitutes the theoretical basis of the DIT. However, very few studies have directly tested the internal structure of the DIT, which would indicate its construct validity. Using the DIT-2, a later revision of the DIT, we examined whether a bi-factor model or 3-factor CFA model showed a better model fit. The Neo-Kohlbergian theory of moral judgment development, which constitutes the theoretical basis for the DIT-2, proposes that moral judgment development occurs continuously and that it can be better explained with a soft-stage model. Given these assertions, we assumed that the bi-factor model, which considers the Schema-General Moral Judgment (SGMJ), might be more consistent with Neo-Kohlbergian theory. We analyzed a large dataset collected from undergraduate students. We performed confirmatory factor analysis (CFA) via weighted least squares. A 3-factor CFA based on the DIT-2 manual and a bi-factor model were compared for model fit. The three factors in the 3-factor CFA were labeled as moral development schemas in Neo-Kohlbergian theory (i.e., personal interests, maintaining norms, and postconventional schemas). The bi-factor model included the SGMJ in addition to the three factors. In general, the bi-factor model showed a better model fit compared with the 3-factor CFA model although both models reported acceptable model fit indices. We found that the DIT-2 scale is a valid measure of the internal structure of moral reasoning development using both CFA and bi-factor models. In addition, we conclude that the soft-stage model, posited by the Neo-Kohlbergian approach to moral judgment development, can be better supported with the bi-factor model that was tested in the present study.
- Research Article
3
- 10.1371/journal.pone.0238110.r006
- Aug 31, 2020
- PLoS ONE
Introduction: The Defining Issues Test (DIT) aimed to measure one’s moral judgment development in terms of moral reasoning. The Neo-Kohlbergian approach, which is an elaboration of Kohlbergian theory, focuses on the continuous development of postconventional moral reasoning, which constitutes the theoretical basis of the DIT. However, very few studies have directly tested the internal structure of the DIT, which would indicate its construct validity. Objectives: Using the DIT-2, a later revision of the DIT, we examined whether a bi-factor model or 3-factor CFA model showed a better model fit. The Neo-Kohlbergian theory of moral judgment development, which constitutes the theoretical basis for the DIT-2, proposes that moral judgment development occurs continuously and that it can be better explained with a soft-stage model. Given these assertions, we assumed that the bi-factor model, which considers the Schema-General Moral Judgment (SGMJ), might be more consistent with Neo-Kohlbergian theory. Methods: We analyzed a large dataset collected from undergraduate students. We performed confirmatory factor analysis (CFA) via weighted least squares. A 3-factor CFA based on the DIT-2 manual and a bi-factor model were compared for model fit. The three factors in the 3-factor CFA were labeled as moral development schemas in Neo-Kohlbergian theory (i.e., personal interests, maintaining norms, and postconventional schemas). The bi-factor model included the SGMJ in addition to the three factors. Results: In general, the bi-factor model showed a better model fit compared with the 3-factor CFA model although both models reported acceptable model fit indices. Conclusion: We found that the DIT-2 scale is a valid measure of the internal structure of moral reasoning development using both CFA and bi-factor models. In addition, we conclude that the soft-stage model, posited by the Neo-Kohlbergian approach to moral judgment development, can be better supported with the bi-factor model that was tested in the present study.
- Research Article
- 10.61838/kman.aftj.5.3.20
- Jan 1, 2024
- Applied Family Therapy Journal
Objective: Given the absence of a valid and reliable scale to measure this construct in Iran, this study aimed to translate and determine the validity and reliability of the Perceived Partner Responsiveness Scale (PPRS) among nurses. Methods: The research method was descriptive-survey in terms of data collection and correlational (exploratory and confirmatory factor analysis) in terms of data analysis. The statistical population of this study included all married female nurses in the city of Zanjan in 2020, selected using convenience sampling. Based on Kim's (2005) approach, the sample size reached 312, with 317 participants ultimately taking part in this study. The Personal Assessment of Intimacy in Relationships Scale (Schaefer & Olson, 1981), the Perceived Partner Responsiveness Scale (Reis et al., 2017), and the Revised Experiences in Close Relationships Questionnaire (Fraley et al., 2005) were used as research instruments. Data were analyzed using SPSS software (version 26), and the Lavaan and EGAnet packages in R software. Findings: The results indicated that the PPRS had desirable reliability among the studied samples. The ordinal theta for the total score of this scale in the present study was .978. However, due to the absence of changes in the total score reliability coefficient if any item was deleted, local dependence of items is possible. Exploratory network analysis outputs demonstrated a three-factor structure in this scale. Bootstrap analysis in both parametric and non-parametric states also supported this three-factor structure. Nevertheless, in confirmatory factor analysis, the bifactor and three-factor models had a better fit compared to other models. However, given the unsatisfactory omega of first-order factors and the factor loadings in the specific factors of the bifactor model, the need for a second-order or higher-order factor was confirmed. 
Since the average variance extracted (AVE) index in all models was above .50 (.767), the convergent validity of the scale was confirmed. Furthermore, since the square root of the AVE index (.875) was greater than the PPR correlations with other variables studied in the different models of the present research, it can be said that the discriminant validity at the construct level was established. Additionally, due to the significant correlation coefficients of the PPR construct with other studied variables, the convergent and divergent validity of the scale was also confirmed. Conclusion: The psychometric analyses of the present study showed that the PPRS is a completely reliable and valid scale for measuring nurses' perceptions of their partner's responsiveness. We recommend that future research examine the applicability of this scale in different communities and samples with varying cultural and demographic characteristics.
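The AVE-based validity checks described above can be sketched as follows. This is a minimal illustration that assumes standardized loadings with uncorrelated errors. The function names, the loadings, and the correlations are hypothetical. Only the AVE of .767 is taken from the abstract.

```python
import math

def ave(loadings):
    """Average variance extracted from standardized loadings."""
    return sum(l * l for l in loadings) / len(loadings)

def composite_reliability(loadings):
    """Composite reliability (CR) from standardized loadings."""
    s = sum(loadings)
    error = sum(1.0 - l * l for l in loadings)
    return s * s / (s * s + error)

def fornell_larcker_ok(ave_value, correlations):
    """True if sqrt(AVE) exceeds every absolute construct correlation."""
    root = math.sqrt(ave_value)
    return all(abs(r) < root for r in correlations)

# Hypothetical standardized loadings for a three-item factor.
example_loadings = [0.8, 0.8, 0.8]
example_ave = ave(example_loadings)                  # mean squared loading
example_cr = composite_reliability(example_loadings)

# AVE reported in the abstract; the correlations are hypothetical.
reported_ave = 0.767
passes = fornell_larcker_ok(reported_ave, [0.45, -0.38, 0.60])
```

The square root of .767 is about .876, close to the .875 reported. The Fornell-Larcker criterion then only requires that this value exceed the construct's correlations with the other variables.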