IRT Methods Research Articles

Women's empowerment is a process that includes increases in intrinsic agency (power within), instrumental agency (power to), and collective agency (power with). We used baseline data from two studies, Targeting and Realigning Agriculture for Improved Nutrition (TRAIN) in Bangladesh and Building Resilience in Burkina Faso (BRB), to assess the measurement properties of survey questions operationalizing selected dimensions of intrinsic, instrumental, and collective agency in the project-level Women's Empowerment in Agriculture Index (pro-WEAI). We applied unidimensional item-response models to question (item) sets to assess their measurement properties and, when possible, their cross-context measurement equivalence, a requirement of measures designed for cross-group comparisons. For intrinsic agency in the right to bodily integrity, measured with five attitudinal questions about intimate partner violence (IPV) against women, model assumptions of unidimensionality and local independence were met. Four items showed good model fit and measurement equivalence across TRAIN and BRB. For item sets designed to capture autonomy in income, intrinsic agency in livelihoods activities, and instrumental agency in livelihoods activities, the sale or use of outputs, the use of income, and borrowing from financial services, model assumptions were not met, model fit was poor, and items generally were weakly related to the latent (unobserved) agency construct. For intrinsic and instrumental agency in livelihoods activities and for instrumental agency in the sale or use of outputs and in the use of income, item sets had similar precision along the latent-agency continuum, suggesting that similar item sets could be dropped without a loss of precision. IRT models for collective agency were not estimable because of low reported presence of and membership in community groups.
This analysis demonstrates the use of IRT methods to assess the measurement properties of item sets in pro-WEAI and in empowerment scales generally. Findings suggest that a shorter version of pro-WEAI with improved measurement properties can be developed. We recommend revisions to the pro-WEAI questionnaire and call for new measures of women's collective agency.
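The link between weakly related items and low precision can be illustrated with a minimal sketch. It assumes a two-parameter logistic (2PL) model for binary items, which may differ from the models the study actually fitted; the function names and the discrimination and difficulty values are hypothetical and chosen only for illustration.

```python
import math

def irf_2pl(theta, a, b):
    """Probability of endorsing an item under a 2PL model.
    theta: latent agency level, a: discrimination, b: difficulty."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))

def item_information(theta, a, b):
    """Fisher information of a 2PL item at theta: a^2 * P * (1 - P)."""
    p = irf_2pl(theta, a, b)
    return a * a * p * (1.0 - p)

# Grid over the latent continuum from -3 to 3 in steps of 0.1.
grid = [x / 10.0 for x in range(-30, 31)]

# HYPOTHETICAL (a, b) pairs: one strongly and one weakly discriminating item.
items = {"strong": (2.0, 0.0), "weak": (0.4, 0.0)}

# Peak information of each item over the grid.
peak = {
    name: max(item_information(t, a, b) for t in grid)
    for name, (a, b) in items.items()
}

for name, value in peak.items():
    print(f"{name} item: peak information = {value:.2f}")
```

Under these assumed values the weakly discriminating item contributes a small fraction of the information of the strong one, which is the sense in which items weakly related to the latent construct add little measurement precision.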

Multi-item surveys are frequently used to study scores on latent factors, such as human values, attitudes, and behavior. Such studies often include a comparison between specific groups of individuals or residents of different countries, at one or multiple points in time (i.e., a cross-sectional comparison, a longitudinal comparison, or both). If latent factor means are to be meaningfully compared, the measurement structures of the latent factor and their survey items should be stable, that is, “invariant.” As proposed by Mellenbergh (1989), “measurement invariance” (MI) requires that the association between the items (or test scores) and the latent factors (or latent traits) of individuals should not depend on group membership or measurement occasion (i.e., time). In other words, if item scores are (approximately) multivariate normally distributed, conditional on the latent factor scores, the expected values, the covariances between items, and the unexplained variance unrelated to the latent factors should be equal across groups. Many studies examining MI of survey scales have shown that the MI assumption is very hard to meet. In particular, strict forms of MI rarely hold. By “strict” we refer to a situation in which measurement parameters are exactly the same across groups or measurement occasions, that is, an enforcement of zero tolerance with respect to deviations between groups or measurement occasions. Often, researchers simply ignore MI issues and compare latent factor means across groups or measurement occasions even though the psychometric basis for such a practice does not hold. However, when a strict form of MI is not established and one must conclude that respondents attach different meanings to survey items, valid comparisons between latent factor means become impossible.
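The invariance condition described above can be written compactly. The notation here is an assumption introduced for illustration (it follows common factor-analytic usage and is not taken from the text): $x$ is the vector of item scores, $\eta$ the latent factor scores, $g$ the group or occasion, $\nu_g$ the item intercepts, $\Lambda_g$ the factor loadings, and $\Theta_g$ the covariance matrix of the unexplained (residual) parts.

```latex
% General definition: the item distribution given the latent scores
% does not depend on group membership or occasion g.
f(x \mid \eta, g) = f(x \mid \eta) \quad \text{for all } g

% Linear factor model within group g:
x = \nu_g + \Lambda_g \eta + \varepsilon, \qquad
\varepsilon \sim N(0, \Theta_g)

% Conditional moments implied by the model:
E[x \mid \eta, g] = \nu_g + \Lambda_g \eta, \qquad
\operatorname{Cov}[x \mid \eta, g] = \Theta_g

% Strict invariance: all measurement parameters are equal across groups.
\nu_g = \nu, \qquad \Lambda_g = \Lambda, \qquad \Theta_g = \Theta
\quad \text{for all } g
```

Under multivariate normality the conditional mean and covariance fully determine the conditional distribution, so equality of these three sets of parameters is what the zero-tolerance, strict form of MI amounts to in this model.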
As such, the potential bias caused by measurement non-invariance obstructs the comparison of latent factor means (if strict MI does not hold) or regression coefficients (if less strict forms of MI do not hold). Traditionally, MI is tested in a multiple group confirmatory factor analysis (MGCFA) with groups defined by unordered categorical (i.e., nominal) between-subject variables. In MGCFA, MI is tested at each constraint of the latent factor model using a series of nested (latent) factor models. This traditional way of testing for MI originated with Jöreskog (1971), who was the first scholar to thoroughly discuss the invariance of latent factor (or measurement) structures. Additionally, Sörbom (1974, 1978) pioneered the specification and estimation of latent factor means using a multi-group SEM approach in LISREL (Jöreskog and Sörbom, 1996). Following these contributions, the multi-group specification of latent factor structures has become widespread in all major SEM software programs (e.g., AMOS (Arbuckle, 2006), EQS (Bentler and Wu, 1995), lavaan (Rosseel, 2012), Mplus (Muthén and Muthén, 2013), Stata (STATA, 2015), and OpenMx (Boker et al., 2011)). Shortly thereafter, Byrne et al. (1989) introduced the distinction between full and partial MI. Although their introduction was of great value, the first formal treatment of different forms of MI and their consequences for the validity of multi-group/multi-time comparisons is attributable to Meredith (1993). Since then, a very large number of papers dealing with MI have been published. The literature on MI published in the 20th century is summarized by Vandenberg and Lance (2000). Also noteworthy are the overview of applications in cross-cultural studies provided by Davidov et al. (2014) and the book by Millsap (2011), which contains a general systematic treatment of the topic of MI.
The traditional MGCFA approach to MI testing is described by, for example, Byrne (2004), Chen et al. (2005), Gregorich (2006), van de Schoot et al. (2012), Vandenberg (2002), and Wicherts and Dolan (2010). Researchers entering the field of MI are advised to consult Meredith (1993) and Millsap (2011) first, before reading other valuable academic works. Recent developments in statistics have provided new analytical tools for assessing MI. The aim of this special issue is to provide a forum for a discussion of MI, covering some crucial “themes”: (1) ways to assess and deal with measurement non-invariance; (2) Bayesian and IRT methods employing the concept of approximate MI; and (3) new or adjusted approaches for testing MI to fit increasingly complex statistical models and specific characteristics of survey data.
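The nested-model testing used in MGCFA can be sketched as a sequence of chi-square difference (likelihood-ratio) tests. This is a minimal illustration, not the procedure of any particular software package: the step labels (configural, metric, scalar) are the conventional names for increasingly constrained models, the helper functions `chi2_sf` and `chi2_difference` are hypothetical, and the fit statistics are made-up numbers used only to show the mechanics.

```python
import math

def chi2_sf(x, df):
    """Upper-tail probability of a chi-square variate, computed from a series
    for the regularized lower incomplete gamma function (adequate for the
    modest values typical of difference tests)."""
    if x <= 0:
        return 1.0
    a, half_x = df / 2.0, x / 2.0
    term = 1.0 / a
    total = term
    n = 1
    while abs(term) > 1e-15 * abs(total):
        term *= half_x / (a + n)
        total += term
        n += 1
    lower = total * math.exp(a * math.log(half_x) - half_x - math.lgamma(a))
    return max(0.0, 1.0 - lower)

def chi2_difference(restricted, baseline):
    """Chi-square difference test for two nested models.
    Each argument is a (chi2, df) pair; `restricted` carries more constraints."""
    d_chi2 = restricted[0] - baseline[0]
    d_df = restricted[1] - baseline[1]
    return d_chi2, d_df, chi2_sf(d_chi2, d_df)

# HYPOTHETICAL fit statistics (chi2, df), for illustration only.
fits = {
    "configural": (210.4, 100),  # same factor structure in every group
    "metric": (222.1, 108),      # plus equal loadings
    "scalar": (270.9, 116),      # plus equal intercepts
}

steps = ["configural", "metric", "scalar"]
results = {}
for prev, curr in zip(steps, steps[1:]):
    results[curr] = chi2_difference(fits[curr], fits[prev])
    d_chi2, d_df, p = results[curr]
    print(f"{prev} -> {curr}: delta chi2 = {d_chi2:.1f}, delta df = {d_df}, p = {p:.4f}")
```

With these assumed numbers, adding equal loadings does not significantly worsen fit, while adding equal intercepts does, so latent mean comparisons would not be supported. In practice the chi-square difference is sensitive to sample size, which is one reason approximate approaches to MI have been proposed.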

Articles published on IRT Methods

Detecting Multidimensional DIF in Polytomous Items with IRT Methods and Estimation Approaches

O.4.1-3 Effects of novel isometric resistance training on resting and ambulatory blood pressure measures: a comparison between exercise modes

The Development and Validation of the Difficulties in Emotion Regulation Scale-8: Providing Respondents with a Uniform Context That Elicits Thinking About Situations Requiring Emotion Regulation

The effect of missing data and imputation on the detection of bias in cognitive testing using differential item functioning methods

Detecting Differential Item Functioning Using SIBTEST, MH, LR and IRT Methods

Cross-validation of the Utility of Test of Memory Malingering (TOMM) Cut-offs in a Large Colombian Sample

A Latent Class IRT Approach to Defining and Measuring Language Proficiency

Demographic and health factors associated with pandemic anxiety in the context of COVID-19.

Features of Applying Mathematical Test Models under Conditions of Remote Assessment

Measurement properties of the project-level Women's Empowerment in Agriculture Index.

Academic English Proficiency Assessment Using a Computerized Adaptive Test

Semi-Quantitative Comparison of Infrared Thermography with Indocyanine Green Imaging in Porcine Intestinal Resection

Analysis of the Thermal Transmittance and Sensitivity of Existing Exterior Walls Using Infrared Thermography and Heat Flow Meter Methods

The Learning Behaviors Scale: National standardization in Trinidad and Tobago

Parent and teacher perspectives on psychological adjustment: A national measurement study in Trinidad and Tobago

Comparing apples with oranges? An approach to link TIMSS and the National Educational Panel Study in Germany via equipercentile and IRT methods

Editorial: Measurement Invariance.

Development and psychometric characteristics of the SCI-QOL Pressure Ulcers scale and short form

The Arm Function in Multiple Sclerosis Questionnaire (AMSQ): development and validation of a new tool using IRT methods

A brief Dutch language Impact Message Inventory–Circumplex (IMI-C Short) using non-parametric item response theory
