Abstract

Test-item bias has become an increasingly challenging issue in statistics and education. A popular method, the Mantel-Haenszel test, is used for detecting non-uniform differential item functioning (DIF) but requires constructing several performance tiers to maintain robustness. The alternative Mantel-Haenszel (AMH) test was developed within the last three decades as a proxy procedure requiring only two scoring tiers. However, little is known about how essential factors such as comparison-group sizes and item discrimination affect its ability to detect bias. This study investigates how item difficulty, item discrimination, and the sample-size ratio between the focal and reference groups affect the likelihood that the AMH test detects DIF. A comprehensive simulation study was conducted in which test responses were generated under crossed conditions: three commonly used difficulty levels (easy, medium, and hard), two discrimination levels ('low' and 'high'), and three group-size ratios (1:1, 2:1, and 5:1). The simulation showed that the AMH test's detection rates are comparable to, and under some conditions better than, those of established procedures such as the Breslow-Day (BD) test. The study thus characterizes the factors that drive the AMH test's detection behavior and concludes with an application to collegiate-level test data comparing students across genders and majors.
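The simulation design described above can be sketched in code. The snippet below is a minimal illustration, not the authors' implementation: it generates dichotomous responses under a 2PL IRT model, injects DIF on one item by shifting its difficulty for the focal group, and then computes a Mantel-Haenszel chi-square statistic using only two score strata (a median split on the rest score), as a stand-in for the two-tier AMH idea. All parameter values (item counts, discrimination of 1.2, the difficulty shift of 1.0) are illustrative assumptions, not values from the study.

```python
import numpy as np

def simulate_2pl(theta, a, b, rng):
    """Simulate 0/1 responses under a 2PL IRT model: P = 1/(1+exp(-a(theta-b)))."""
    p = 1.0 / (1.0 + np.exp(-a * (theta[:, None] - b[None, :])))
    return (rng.random(p.shape) < p).astype(int)

def mh_two_strata(item, group, rest_score):
    """Mantel-Haenszel chi-square over two score strata (median split).
    Illustrative stand-in for the AMH procedure; the published AMH
    statistic may differ in its details."""
    strata = (rest_score > np.median(rest_score)).astype(int)
    num, var = 0.0, 0.0
    for s in (0, 1):
        m = strata == s
        A = np.sum((group[m] == 0) & (item[m] == 1))  # reference, correct
        B = np.sum((group[m] == 0) & (item[m] == 0))  # reference, incorrect
        C = np.sum((group[m] == 1) & (item[m] == 1))  # focal, correct
        D = np.sum((group[m] == 1) & (item[m] == 0))  # focal, incorrect
        N = A + B + C + D
        if N < 2:
            continue
        nR, nF, m1, m0 = A + B, C + D, A + C, B + D
        num += A - nR * m1 / N                        # observed minus expected
        var += nR * nF * m1 * m0 / (N**2 * (N - 1))   # hypergeometric variance
    return (abs(num) - 0.5) ** 2 / var                # continuity-corrected statistic

rng = np.random.default_rng(42)
n_ref, n_foc, n_items = 1000, 1000, 20                # 1:1 group-size ratio
a = np.full(n_items, 1.2)                             # a 'high' discrimination setting
b = np.linspace(-1.5, 1.5, n_items)                   # spread of difficulties

theta = rng.normal(0.0, 1.0, n_ref + n_foc)
group = np.r_[np.zeros(n_ref, int), np.ones(n_foc, int)]

X = simulate_2pl(theta, a, b, rng)
# Inject DIF on item 0: make it harder for the focal group (difficulty shift of 1.0).
b_dif = b.copy()
b_dif[0] += 1.0
X[group == 1, 0] = simulate_2pl(theta[group == 1], a[:1], b_dif[:1], rng)[:, 0]

rest = X[:, 1:].sum(axis=1)                           # matching score excludes the studied item
stat = mh_two_strata(X[:, 0], group, rest)
print(f"MH chi-square with two strata: {stat:.2f}")
```

With a difficulty shift this large and 1,000 examinees per group, the statistic far exceeds the 3.84 critical value of the chi-square distribution with one degree of freedom; varying the group-size ratio, discrimination, and difficulty in this setup mirrors the conditions the study crosses.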
