Abstract

The purpose of this study is to examine the distractors of items that exhibit differential item functioning (DIF) across gender in order to explain the possible sources of DIF in the context of large-scale tests. To this end, two DIF methods based on non-linear logistic regression (NLR) models (three-parameter, 3PL-NLR, and four-parameter, 4PL-NLR) were first used to detect DIF items, and the Mantel-Haenszel Delta (MH-Delta) DIF method was used to calculate the DIF effect size for each DIF item. Then, the multinomial log-linear regression (MLR) model and the 2PL nested logit model (2PL-NLM) were applied to items exhibiting moderate and large DIF effect sizes. The ultimate goals are (a) to examine the behavior of distractors across gender and (b) to investigate whether distractors have any impact on DIF effects. DIF results for the Art Section of the General Aptitude Test (GAT-ART) based on both the 3PL-NLR and 4PL-NLR methods indicate that only 10 DIF items had moderate to large DIF effect sizes. According to the MLR differential distractor functioning (DDF) results, all items but one exhibited DDF across gender. An interesting finding of this study is that DIF items related to verbal analogy and context analysis favored female students, while all DIF items related to the reading comprehension subdomain favored male students, which may signal the existence of content-specific DIF or a true ability difference across gender. The DDF results show that distractors have a significant effect on DIF results. Therefore, DDF analysis is suggested alongside DIF analysis, since it signals the possible causes of DIF.
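As a rough illustration of the MH-Delta effect size mentioned in the abstract, the sketch below computes the Mantel-Haenszel common odds ratio across matched score levels and converts it to the delta scale with the standard transformation, delta = -2.35 ln(alpha). The function names, the input layout, and the magnitude-only A/B/C labelling (cutoffs of 1.0 and 1.5, ignoring the significance tests that the full ETS rules also use) are assumptions made here for illustration, not details taken from the study.

```python
import math

def mh_delta(strata):
    """Mantel-Haenszel delta from stratified 2x2 tables.

    strata: iterable of (a, b, c, d) counts, one tuple per matched score level
        a = reference group correct,  b = reference group incorrect
        c = focal group correct,      d = focal group incorrect
    (Hypothetical input layout chosen for this sketch.)
    """
    num = 0.0
    den = 0.0
    for a, b, c, d in strata:
        n = a + b + c + d
        if n == 0:
            continue  # skip empty score levels
        num += a * d / n
        den += b * c / n
    alpha = num / den  # MH common odds ratio
    return -2.35 * math.log(alpha)

def effect_size_label(delta):
    """Simplified, magnitude-only labelling (an assumption of this sketch)."""
    size = abs(delta)
    if size < 1.0:
        return "A (negligible)"
    if size < 1.5:
        return "B (moderate)"
    return "C (large)"
```

A negative delta indicates that the item favors the reference group and a positive delta that it favors the focal group. Which gender is treated as the reference group is a choice made by the analyst.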

Highlights

  • Many types of research have been carried out to determine the validity and reliability of large-scale assessments because the performance of examinees on these tests has a critical impact on their educational admissions and future careers

  • First, differential item functioning (DIF) analyses were conducted with two non-linear logistic regression-based DIF methods (3PL-NLR and 4PL-NLR) to detect the items that have significant DIF effects

  • The multinomial log-linear regression (MLR) method is classified as a divide-by-total method, which evaluates both DIF and differential distractor functioning (DDF) effects simultaneously, while the nested logit model (NLM) is classified as a divide-by-distractor method, which evaluates the DDF effect independently of DIF and determines whether item distractors contributed to or caused DIF
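For orientation, the model families named in the highlights can be written roughly as follows. These are generic forms from the DIF/DDF literature and not necessarily the exact parameterizations used in the study. All symbols are assumptions introduced here: x is the matching score, g the group indicator, \theta the latent ability, a, b, c, d the item parameters, and \beta, \zeta, \lambda the regression-type coefficients.

```latex
% Non-linear regression model for a correct response
% (4PL form; the common 3PL case fixes the upper asymptote d = 1)
P(Y = 1 \mid x, g) = c_g + (d_g - c_g)\,
  \frac{e^{a_g (x - b_g)}}{1 + e^{a_g (x - b_g)}}

% Multinomial log-linear regression (divide-by-total):
% probability of choosing distractor k, with the correct option as baseline
P(Y = k \mid x, g) =
  \frac{e^{\beta_{0k} + \beta_{1k} x + \beta_{2k} g + \beta_{3k} x g}}
       {1 + \sum_{l} e^{\beta_{0l} + \beta_{1l} x + \beta_{2l} g + \beta_{3l} x g}}

% 2PL nested logit model: distractor v is chosen only given an incorrect response
P(Y = v \mid \theta) = \bigl[1 - P(Y = 1 \mid \theta)\bigr]\,
  \frac{e^{\zeta_v + \lambda_v \theta}}{\sum_{l} e^{\zeta_l + \lambda_l \theta}},
\qquad
P(Y = 1 \mid \theta) = \frac{e^{a(\theta - b)}}{1 + e^{a(\theta - b)}}
```

In the multinomial form, the sums run over the distractors, and group effects on the correct response and on the distractors are estimated together. In the nested form, the distractor part is conditional on an incorrect response, which is why DDF can be assessed separately from DIF.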


Summary

Introduction

Many types of research have been carried out to determine the validity and reliability of large-scale assessments because the performance of examinees on these tests has a critical impact on their educational admissions and future careers. Validity is a core feature of any kind of assessment assumed to be accurate and fair (Bond et al., 2003; Jamalzadeh et al., 2021). The goal of test developers and testing companies is to increase the validity and reliability of tests by reducing confounding factors and errors of any type, so as to ensure fairness across different subgroups. Examining the factorial structure of tests, investigating differential item functioning (DIF) across subgroups, investigating the behavior of distractors, and determining what causes these confounding factors all serve the purpose of increasing the validity and fairness of score inferences. The comparison among subgroups, such as gender or nationality groups, on the underlying construct is necessary for fairness purposes.
