Anchor Detection Strategy in Moderated Non-Linear Factor Analysis for Differential Item Functioning (DIF)

Abstract

Ensuring measurement invariance is crucial for fair psychological and educational assessments, particularly in detecting Differential Item Functioning (DIF). Moderated Non-linear Factor Analysis (MNLFA) provides a flexible framework for detecting DIF by modeling item parameters as functions of observed covariates. However, a significant challenge in MNLFA-based DIF detection is anchor item selection, as improperly chosen anchors can bias results. This study proposes a refined constrained-baseline anchor detection approach utilizing information criteria (IC) for model selection. The proposed three-step procedure sequentially identifies potential DIF items through the Bayesian Information Criterion (BIC) and Weighted Information Criterion (WIC), followed by DIF-free anchor items using the Akaike Information Criterion (AIC). The method's effectiveness in controlling Type I error rates while maintaining statistical power is evaluated through simulation studies and empirical data analysis. Comparisons with regularization approaches demonstrate the proposed method's accuracy and computational efficiency.
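
As a rough illustration of the three-step logic (screen items with BIC/WIC, then certify anchors with AIC), the sketch below swaps the full MNLFA fit for per-item logistic regressions with an observed trait proxy and omits the WIC step, whose weighting is specific to the proposed method. All names and decision rules here are simplifying assumptions, not the authors' implementation.

```python
# A deliberately simplified stand-in for the three-step procedure: per-item
# logistic regressions with an observed trait proxy replace the MNLFA fit,
# and the WIC screening step is omitted. Illustrative only.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
n, n_items = 500, 6
theta = rng.normal(size=n)               # trait proxy (a rest score in practice)
group = rng.integers(0, 2, size=n)       # observed covariate, e.g., gender

dif = np.zeros(n_items)
dif[0] = 0.8                             # item 0 carries uniform DIF
y = rng.binomial(1, 1 / (1 + np.exp(-(theta[:, None] + dif * group[:, None]))))

X0 = sm.add_constant(theta)                            # constrained: no DIF term
X1 = sm.add_constant(np.column_stack([theta, group]))  # augmented: DIF term added

flagged, anchors = [], []
for j in range(n_items):
    fit0 = sm.Logit(y[:, j], X0).fit(disp=0)
    fit1 = sm.Logit(y[:, j], X1).fit(disp=0)
    if fit1.bic < fit0.bic:       # screening proxy: BIC prefers the DIF model
        flagged.append(j)
    elif fit0.aic <= fit1.aic:    # anchor proxy: even lenient AIC prefers no DIF
        anchors.append(j)

print("flagged as DIF:", flagged, "| anchor candidates:", anchors)
```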

Similar Papers
  • Research Article
  • 10.1111/emip.12669
Digital Module 38: Differential Item Functioning by Multiple Variables Using Moderated Nonlinear Factor Analysis
  • May 20, 2025
  • Educational Measurement: Issues and Practice
  • Sanford R Student + 1 more

Module Abstract: When investigating potential bias in educational test items via differential item functioning (DIF) analysis, researchers have historically been limited to comparing two groups of students at a time. The recent introduction of Moderated Nonlinear Factor Analysis (MNLFA) generalizes Item Response Theory models to extend the assessment of DIF to an arbitrary number of background variables. This facilitates more complex analyses such as DIF across more than two groups (e.g. low/middle/high socioeconomic status), across more than one background variable (e.g. DIF by race/ethnicity and gender), across non‐categorical background variables (e.g. DIF by parental income), and more. Framing MNLFA as a generalization of the two‐parameter logistic IRT model, we introduce the model with an emphasis on the parameters representing DIF versus impact; describe the current state of the art for estimating MNLFA models; and illustrate the application of MNLFA in a scenario where one wants to test for DIF across two background variables at once.
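
In 2PL terms, the DIF-versus-impact distinction the module emphasizes is typically written with covariate-moderated item and factor parameters; the notation below is one common MNLFA formulation, not a verbatim excerpt from the module:

$$P(y_{ij}=1 \mid \eta_j, \mathbf{x}_j) = \operatorname{logit}^{-1}\!\big(\nu_i(\mathbf{x}_j) + \lambda_i(\mathbf{x}_j)\,\eta_j\big), \qquad \nu_i(\mathbf{x}_j) = \nu_{0i} + \boldsymbol{\omega}_i^{\top}\mathbf{x}_j, \qquad \lambda_i(\mathbf{x}_j) = \lambda_{0i} + \boldsymbol{\delta}_i^{\top}\mathbf{x}_j,$$

$$\eta_j \sim N\big(\boldsymbol{\alpha}^{\top}\mathbf{x}_j,\ \exp(\boldsymbol{\beta}^{\top}\mathbf{x}_j)\big),$$

where nonzero $\boldsymbol{\omega}_i$ or $\boldsymbol{\delta}_i$ indicate uniform or nonuniform DIF in item $i$, while $\boldsymbol{\alpha}$ and $\boldsymbol{\beta}$ capture impact on the factor mean and variance.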

  • Research Article
  • Cited by 9
  • 10.1177/0013164414526881
Evaluation of Two Types of Differential Item Functioning in Factor Mixture Models With Binary Outcomes
  • Mar 20, 2014
  • Educational and Psychological Measurement
  • Hwayoung Lee + 1 more

Conventional differential item functioning (DIF) detection methods (e.g., the Mantel–Haenszel test) can be used to detect DIF only across observed groups, such as gender or ethnicity. However, research has found that DIF is not typically fully explained by an observed variable. True sources of DIF may include unobserved, latent variables, such as personality or response patterns. The factor mixture model (FMM) is designed to detect unobserved sources of heterogeneity in factor models. The current study investigated use of the FMM for detecting between-class latent DIF and class-specific observed DIF. Factors that were manipulated included the DIF effect size and the latent class probabilities. The performance of model fit indices (Akaike information criterion [AIC], Bayesian information criterion [BIC], sample size–adjusted BIC, and consistent AIC) was assessed for detection of the correct DIF model. The recovery of DIF parameters was also assessed. Results indicated that FMMs with binary outcomes performed well in terms of DIF detection and recovery of large DIF effects. When class probabilities were unequal with small DIF effects, performance decreased for fit indices, power, and the recovery of DIF effects compared with equal-class-probability conditions. Inflated Type I errors were found for non-DIF items across simulation conditions. Results and future research directions for applied and methodological researchers are discussed.

  • Research Article
  • Cited by 16
  • 10.1080/15305058.2012.692415
Assessing the Item Response Theory With Covariate (IRT-C) Procedure for Ascertaining Differential Item Functioning
  • Jul 1, 2013
  • International Journal of Testing
  • Louis Tay + 2 more

We evaluate the item response theory with covariates (IRT-C) procedure for assessing differential item functioning (DIF) without preknowledge of anchor items (Tay, Newman, & Vermunt, 2011). This procedure begins with a fully constrained baseline model, and candidate items are tested for uniform and/or nonuniform DIF using the Wald statistic. Candidate items are selected in turn based on high unconditional bivariate residual (UBVR) values. This iterative process continues until no further DIF is detected or the Bayes information criterion (BIC) increases. We expanded on the procedure and examined the use of conditional bivariate residuals (CBVR) to flag for DIF; aside from the BIC, alternative stopping criteria were also considered. Simulation results showed that the IRT-C approach for assessing DIF performed well, with the use of CBVR yielding slightly better power and Type I error rates than UBVR. Additionally, using no information criterion yielded higher power than using the BIC, although Type I error rates were generally well controlled in both cases. Across the simulation conditions, the IRT-C procedure produced results similar to the Mantel-Haenszel and MIMIC procedures.
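
The iterative search described above reduces to a simple control flow: free the highest-residual candidate item, refit, and stop when no further DIF is detected or BIC rises. The skeleton below shows only that loop; `fit_irt_c` and `bivariate_residuals` are hypothetical stand-ins for an actual IRT-C estimation routine, not functions from any real package.

```python
# Skeleton of the iterative IRT-C search: start from a fully constrained
# baseline, free the candidate item with the largest bivariate residual,
# and stop when the Wald test finds no DIF or BIC increases.
def irt_c_search(data, items, fit_irt_c, bivariate_residuals, alpha=0.05):
    freed = set()                              # items with covariate effects freed
    model = fit_irt_c(data, freed)             # fully constrained baseline model
    while True:
        resid = bivariate_residuals(model)     # UBVR or CBVR, one value per item
        candidate = max((i for i in items if i not in freed),
                        key=resid.get, default=None)
        if candidate is None:                  # every item already freed
            return model, freed
        trial = fit_irt_c(data, freed | {candidate})
        # Stop if the freed item shows no DIF (Wald test) or BIC got worse.
        if trial.wald_p[candidate] >= alpha or trial.bic >= model.bic:
            return model, freed
        freed.add(candidate)
        model = trial
```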

  • Research Article
  • Cited by 3
  • 10.1177/01466216211066606
Bayesian Approaches for Detecting Differential Item Functioning Using the Generalized Graded Unfolding Model.
  • Feb 10, 2022
  • Applied Psychological Measurement
  • Seang-Hwane Joo + 2 more

Differential item functioning (DIF) analysis is one of the most important applications of item response theory (IRT) in psychological assessment. This study examined the performance of two Bayesian DIF methods, Bayes factor (BF) and deviance information criterion (DIC), with the generalized graded unfolding model (GGUM). The Type I error and power were investigated in a Monte Carlo simulation that manipulated sample size, DIF source, DIF size, DIF location, subpopulation trait distribution, and type of baseline model. We also examined the performance of two likelihood-based methods, the likelihood ratio (LR) test and Akaike information criterion (AIC), using marginal maximum likelihood (MML) estimation for comparison with past DIF research. The results indicated that the proposed BF and DIC methods provided well-controlled Type I error and high power under a free-baseline model implementation; their performance was superior to LR and AIC in terms of Type I error rates when the reference and focal group trait distributions differed. The implications and recommendations for applied research are discussed.
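
For reference, the two Bayesian criteria compared above have standard definitions (with $M_1$ the DIF model, $\bar{D}$ the posterior mean deviance, and $D(\bar{\boldsymbol{\theta}})$ the deviance at the posterior means); the paper's GGUM-specific computations are not reproduced here:

$$BF_{10} = \frac{p(\mathbf{y} \mid M_1)}{p(\mathbf{y} \mid M_0)}, \qquad \mathrm{DIC} = \bar{D} + p_D, \quad p_D = \bar{D} - D(\bar{\boldsymbol{\theta}}),$$

so a large $BF_{10}$ or a lower DIC for $M_1$ favors the DIF model.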

  • Research Article
  • 10.1093/sleep/zsaf090.0507
0507 Evaluating Differential Item Functioning of the Insomnia Severity Index Using Moderated Nonlinear Factor Analysis
  • May 19, 2025
  • SLEEP
  • Yumei Chen + 2 more

Introduction: Insomnia is common among veterans, particularly those with mental health conditions such as depression and anxiety, and can lead to significant health complications. Routine screening in healthcare settings is crucial to prevent chronic insomnia. The Insomnia Severity Index (ISI), a widely used and validated tool, has been adapted for diverse populations, but its differential item functioning (DIF) remains underexplored. This study uses moderated nonlinear factor analysis (MNLFA) to address this gap. This flexible approach allows for simultaneous modeling of multiple sources of bias based on individual characteristics, which can improve the accuracy of insomnia severity ratings. Methods: Veterans (N = 620) from the Miami VA sleep center completed a baseline psychosocial assessment and home sleep apnea testing (HSAT; mean AHI = 18), and medical/psychiatric diagnoses were extracted from medical records. MNLFA was used to model nighttime (ISI items 1a, 1b, and 1c) and daytime symptoms (items 2–5) separately, examining the effects of age, gender, race/ethnicity, depression, anxiety, PTSD, and chronic pain on DIF. DIF-adjusted factor scores, confirmatory factor analysis (CFA) factor scores, and sum scores were compared. Results: The veteran sample (N = 620) was middle-aged (M = 52, SD = 14.5), predominantly male (83.5%), and White (57.3%), with 50% diagnosed with chronic pain and 51% with clinical depression. DIF analysis showed ISI1 had intercept bias for age, Hispanic/White identity, chronic pain, and depression, as well as factor loading bias for age. ISI3 had intercept bias for depression. ISI4 exhibited intercept and factor loading bias for male gender. ISI5 and ISI6 showed intercept bias for age, and ISI7 showed both intercept and factor-loading bias for PTSD. No DIF was found for AHI. Factor scores derived from MNLFA, CFA, and sum scores were highly correlated across both factors. Conclusion: This study examined the DIF of the ISI by investigating how an array of psychosocial factors influences insomnia severity ratings. Six of the seven ISI items demonstrated bias based on age, gender, race, depression, PTSD, and chronic pain. Differences observed between groups with these characteristics may therefore be influenced by item-level measurement bias rather than true differences in insomnia severity. MNLFA demonstrated methodological advantages by allowing simultaneous modeling of multiple DIF sources. Although difficult to implement in primary care, MNLFA-based factor scores hold promise for secondary predictive models.

  • Supplementary Content
  • Cited by 83
  • 10.3200/jexe.72.3.221-261
Effects of Anchor Item Methods on the Detection of Differential Item Functioning Within the Family of Rasch Models
  • Apr 1, 2004
  • The Journal of Experimental Education
  • Wen-Chung Wang

Scale indeterminacy in analysis of differential item functioning (DIF) within the framework of item response theory can be resolved by imposing one of three anchor item methods: the equal-mean-difficulty method, the all-other anchor item method, and the constant anchor item method. In this article, the applicability and limitations of these three methods are discussed and their performance in DIF detection is compared using Monte Carlo simulations within the family of Rasch models (Rasch, 1960). The results show that when the test contained multiple DIF items, the equal-mean-difficulty method and the all-other method functioned appropriately only when the difference in mean item difficulties between the reference and focal groups approached zero. In contrast, the constant method yielded unbiased parameter estimates, well-controlled Type I error, and high power of DIF detection, regardless of large differences in the mean item difficulties between groups and high percentages of DIF items in the tests. In addition, the more anchor items used in the constant method, the higher the power of detecting DIF. Therefore, the constant anchor item method is recommended when conducting DIF analysis. Methods of locating anchor items for implementing the constant method are also discussed.
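
The three identification strategies can be summarized as constraints on the group-specific item difficulties $b_{iR}$ (reference) and $b_{iF}$ (focal); this is a schematic restatement of the abstract, not the paper's own notation:

$$\text{equal-mean-difficulty:}\ \tfrac{1}{I}\textstyle\sum_i b_{iR} = \tfrac{1}{I}\textstyle\sum_i b_{iF}; \qquad \text{all-other (testing item } j\text{):}\ b_{iR} = b_{iF}\ \forall\, i \neq j; \qquad \text{constant anchor:}\ b_{iR} = b_{iF}\ \forall\, i \in A,$$

where $A$ is a small, fixed anchor set and all items outside $A$ are left free.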

  • Research Article
  • Cited by 9
  • 10.1177/0013164413506222
The Effect of Differential Item Functioning in Anchor Items on Population Invariance of Equating
  • Oct 16, 2013
  • Educational and Psychological Measurement
  • Anne Corinne Huggins

Invariant relationships in the internal mechanisms of estimating achievement scores on educational tests serve as the basis for concluding that a particular test is fair with respect to statistical bias concerns. Equating invariance and differential item functioning are both concerned with invariant relationships yet are treated separately in the psychometric literature. Connecting these two facets of statistical invariance is critical for developing a holistic definition of fairness in educational measurement, for fostering a deeper understanding of the nature and causes of equating invariance and a lack thereof, and for providing practitioners with guidelines for addressing reported score-level equity concerns. This study hypothesizes that differential item functioning manifested in anchor items of an assessment will have an effect on equating dependence. Findings show that when anchor item differential item functioning varies across forms in a differential manner across subpopulations, population invariance of equating can be compromised.

  • Research Article
  • Cited by 238
  • 10.1037/met0000077
A more general model for testing measurement invariance and differential item functioning.
  • Sep 1, 2017
  • Psychological methods
  • Daniel J Bauer

The evaluation of measurement invariance is an important step in establishing the validity and comparability of measurements across individuals. Most commonly, measurement invariance has been examined using one of two primary latent variable modeling approaches: the multiple groups model or the multiple-indicator multiple-cause (MIMIC) model. Both approaches offer opportunities to detect differential item functioning within multi-item scales, and thereby to test measurement invariance, but both approaches also have significant limitations. The multiple groups model allows one to examine the invariance of all model parameters but only across levels of a single categorical individual difference variable (e.g., ethnicity). In contrast, the MIMIC model permits both categorical and continuous individual difference variables (e.g., sex and age) but permits only a subset of the model parameters to vary as a function of these characteristics. The current article argues that moderated nonlinear factor analysis (MNLFA) constitutes an alternative, more flexible model for evaluating measurement invariance and differential item functioning. We show that the MNLFA subsumes and combines the strengths of the multiple group and MIMIC models, allowing for a full and simultaneous assessment of measurement invariance and differential item functioning across multiple categorical and/or continuous individual difference variables. The relationships between the MNLFA model and the multiple groups and MIMIC models are shown mathematically and via an empirical demonstration.
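
One way to see the subsumption claim, using the moderated item model sketched earlier ($\nu_i(\mathbf{x}) = \nu_{0i} + \boldsymbol{\omega}_i^{\top}\mathbf{x}$, $\lambda_i(\mathbf{x}) = \lambda_{0i} + \boldsymbol{\delta}_i^{\top}\mathbf{x}$; the notation is an assumption, not quoted from the paper): the MIMIC model is the special case $\boldsymbol{\delta}_i = \mathbf{0}$, in which covariates shift intercepts only, while the multiple groups model is the special case where $\mathbf{x}$ is a single dummy-coded categorical variable but all parameters may vary with it.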

  • Research Article
  • Cited by 3
  • 10.1016/j.addbeh.2021.107088
Comprehensive measurement invariance of alcohol outcome expectancies among adolescents using regularized moderated nonlinear factor analysis
  • Aug 17, 2021
  • Addictive Behaviors
  • Angela K Stevens + 4 more

  • Research Article
  • 10.1016/j.drugalcdep.2021.109068
An application of moderated nonlinear factor analysis to develop a commensurate measure of alcohol problems across four alcohol treatment studies
  • Sep 24, 2021
  • Drug and Alcohol Dependence
  • Dylan K Richards + 4 more

  • Research Article
  • Cited by 18
  • 10.1097/mlr.0b013e318207edb5
Differential Item Functioning by Survey Language Among Older Hispanics Enrolled in Medicare Managed Care
  • May 1, 2011
  • Medical Care
  • Claude Messan Setodji + 4 more

Objective: To propose a permutation-based approach to anchor item detection and evaluate differential item functioning (DIF) related to language of administration (English vs. Spanish) for 9 questions assessing patients' perceptions of their providers from the Consumer Assessment of Healthcare Providers and Systems (CAHPS) Medicare 2.0 survey. Methods and Study Design: CAHPS 2.0 health plan survey data collected from 703 Hispanics who completed the survey in Spanish were matched on personal characteristics to 703 Hispanics who completed the survey in English. Steps for detecting anchor items using permutation tests are proposed, and these tests, in conjunction with item response theory, were used to identify anchor items and detect DIF. Results: Of the questions studied, 4 were selected as anchor items and 3 of the remaining questions were found to have DIF (P < 0.05). The 3 questions with DIF asked about seeing the doctor within 15 minutes of the appointment time, respect for what patients had to say, and the provider spending enough time with patients. Conclusions: Failure to account for language differences in CAHPS survey items may result in misleading conclusions about disparities in health care experiences between Spanish and English speakers. Statistical adjustments are needed when using the items with DIF.
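
A permutation test for a single item's DIF can be sketched generically: shuffle the language-group labels to build the null distribution of a DIF statistic. The statistic below (a raw difference in item means between matched groups) is a simplified stand-in for the paper's IRT-based procedure, and the function name is illustrative only.

```python
# Generic permutation p-value for one item's DIF statistic.
import numpy as np

def permutation_dif_p(item, group, n_perm=2000, seed=1):
    """item: per-respondent item scores; group: 0/1 language indicator."""
    rng = np.random.default_rng(seed)
    observed = abs(item[group == 1].mean() - item[group == 0].mean())
    exceed = 0
    for _ in range(n_perm):
        g = rng.permutation(group)      # break any item-group association
        exceed += abs(item[g == 1].mean() - item[g == 0].mean()) >= observed
    return (exceed + 1) / (n_perm + 1)  # add-one correction for a valid p-value
```

Items whose permutation p-values stay far from significance are natural anchor candidates; the remaining items can then be tested for DIF conditional on those anchors.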

  • Research Article
  • Cited by 113
  • 10.1111/j.1745-3984.2001.tb01121.x
Identifying Sources of Differential Item and Bundle Functioning on Translated Achievement Tests: A Confirmatory Analysis
  • Jun 1, 2001
  • Journal of Educational Measurement
  • Mark J Gierl + 1 more

Increasingly, tests are being translated and adapted into different languages. Differential item functioning (DIF) analyses are often used to identify non‐equivalent items across language groups. However, few studies have focused on understanding why some translated items produce DIF. The purpose of the current study is to identify sources of differential item and bundle functioning on translated achievement tests using substantive and statistical analyses. A substantive analysis of existing DIF items was conducted by an 11‐member committee of testing specialists. In their review, four sources of translation DIF were identified. Two certified translators used these four sources to categorize a new set of DIF items from Grade 6 and 9 Mathematics and Social Studies Achievement Tests. Each item was associated with a specific source of translation DIF and each item was anticipated to favor a specific group of examinees. Then, a statistical analysis was conducted on the items in each category using SIBTEST. The translators sorted the mathematics DIF items into three sources, and they correctly predicted the group that would be favored for seven of the eight items or bundles of items across two grade levels. The translators sorted the social studies DIF items into four sources, and they correctly predicted the group that would be favored for eight of the 13 items or bundles of items across two grade levels. The majority of items in mathematics and social studies were associated with differences in the words, expressions, or sentence structure of items that are not inherent to the language and/or culture. By combining substantive and statistical DIF analyses, researchers can study the sources of DIF and create a body of confirmed DIF hypotheses that may be used to develop guidelines and test construction principles for reducing DIF on translated tests.

  • Research Article
  • Cited by 15
  • 10.3102/10769986221109208
Testing Differential Item Functioning Without Predefined Anchor Items Using Robust Regression
  • Jul 18, 2022
  • Journal of Educational and Behavioral Statistics
  • Weimeng Wang + 2 more

Differential item functioning (DIF) occurs when the probability of endorsing an item differs across groups for individuals with the same latent trait level. The presence of DIF items may jeopardize the validity of an instrument; therefore, it is crucial to identify DIF items in routine operations of educational assessment. While DIF detection procedures based on item response theory (IRT) have been widely used, a majority of IRT-based DIF tests assume predefined anchor (i.e., DIF-free) items. Not only is this assumption strong, but violations to it may also lead to erroneous inferences, for example, an inflated Type I error rate. We propose a general framework to define the effect sizes of DIF without a priori knowledge of anchor items. In particular, we quantify DIF by item-specific residuals from a regression model fitted to the true item parameters in respective groups. Moreover, the null distribution of the proposed test statistic using a robust estimator can be derived analytically or approximated numerically even when there is a mix of DIF and non-DIF items, which yields asymptotically justified statistical inference. The Type I error rate and power performance of the proposed procedure are evaluated and compared with conventional likelihood-ratio DIF tests in a Monte Carlo experiment. The simulation study shows promising results in controlling the Type I error rate while maintaining power to detect DIF items. Even when there is a mix of DIF and non-DIF items, the true and false alarm rates are well controlled when a robust regression estimator is used.
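
The residual idea can be mimicked in a few lines: regress focal-group difficulties on reference-group difficulties with a robust M-estimator, and flag items with outlying standardized residuals. This toy sketch (simulated difficulties, Huber weights, an arbitrary 2.5 cutoff) only illustrates the intuition; the paper's test statistic and its null distribution are more refined.

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(2)
n_items = 20
b_ref = rng.normal(0.0, 1.0, n_items)               # reference-group difficulties
b_foc = b_ref + 0.2 + rng.normal(0, 0.05, n_items)  # common shift + noise (no DIF)
b_foc[:3] += 0.8                                    # uniform DIF in the first 3 items

# Robust regression downweights DIF items, so the fit tracks the non-DIF majority.
fit = sm.RLM(b_foc, sm.add_constant(b_ref), M=sm.robust.norms.HuberT()).fit()
z = fit.resid / fit.scale                           # standardized residuals
print("flagged items:", np.where(np.abs(z) > 2.5)[0])
```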

  • Research Article
  • Cited by 4
  • 10.3102/10769986231226439
Using Regularization to Identify Measurement Bias Across Multiple Background Characteristics: A Penalized Expectation–Maximization Algorithm
  • Feb 5, 2024
  • Journal of Educational and Behavioral Statistics
  • William C M Belzak + 1 more

Testing for differential item functioning (DIF) has undergone rapid statistical developments recently. Moderated nonlinear factor analysis (MNLFA) allows for simultaneous testing of DIF among multiple categorical and continuous covariates (e.g., sex, age, ethnicity, etc.), and regularization has shown promising results for identifying DIF among many covariates. However, computationally inefficient estimation methods have hampered practical use of the regularized MNLFA method. We develop a penalized expectation–maximization (EM) algorithm with soft- and firm-thresholding to more efficiently estimate regularized MNLFA parameters. Simulation and empirical results show that, compared to previous implementations of regularized MNLFA, the penalized EM algorithm is faster, more flexible, and more statistically principled. This method also yields similar recovery of DIF relative to previous implementations, suggesting that regularized DIF detection remains a preferred approach over traditional methods of identifying DIF.
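
The two update rules named above have standard closed forms applied coordinate-wise to the DIF parameters in the M-step; one common parameterization (assumed here, with $\gamma > 1$ controlling how quickly firm-thresholding relaxes to no shrinkage) is:

$$S(z, \lambda) = \operatorname{sign}(z)\,\max(|z| - \lambda,\, 0), \qquad F(z, \lambda, \gamma) = \begin{cases} \dfrac{\gamma}{\gamma - 1}\, S(z, \lambda), & |z| \le \gamma\lambda, \\ z, & |z| > \gamma\lambda, \end{cases}$$

so soft-thresholding (as in the lasso) shrinks all estimates toward zero, while firm-thresholding leaves large DIF effects unpenalized.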

  • Research Article
  • Cited by 31
  • 10.5897/jdae.9000032
Comparison of Akaike information criterion (AIC) and Bayesian information criterion (BIC) in selection of an asymmetric price relationship
  • Jan 31, 2010
  • Journal of Development and Agricultural Economics
  • Henry De-Graft Acquah

Information criteria provide an attractive basis for model selection. However, little is understood about their relative performance in the asymmetric price transmission modelling framework. To explore this issue, this research evaluated the performance of two commonly used model selection criteria, the Akaike information criterion (AIC) and the Bayesian information criterion (BIC), in discriminating between asymmetric price transmission models under various conditions. Monte Carlo experimentation indicated that the performance of the different model selection criteria is affected by the size of the data, the level of asymmetry, and the amount of noise in the model used in the application. The Bayesian information criterion is consistent and outperforms the AIC in selecting the suitable asymmetric price relationship in large samples. Key words: model selection, Akaike information criterion (AIC), Bayesian information criterion (BIC), asymmetry, Monte Carlo.
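
The trade-off reported here follows directly from the penalty terms. For a model with $k$ parameters, maximized likelihood $\hat{L}$, and sample size $n$:

$$\mathrm{AIC} = -2\ln\hat{L} + 2k, \qquad \mathrm{BIC} = -2\ln\hat{L} + k\ln n,$$

so BIC penalizes each extra parameter more heavily whenever $\ln n > 2$ (i.e., $n \geq 8$), which is what makes it consistent, and typically more accurate, in large samples.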
