Abstract

Differential item functioning (DIF) analysis examines whether test items function differently across groups of examinees. DIF analysis helps detect bias in an assessment and thereby supports its fairness. However, most previous research has focused on high-stakes assessments; there is a dearth of research on low-stakes assessments, which are also significant for the test development and validation process. Additionally, gender differences in test performance are a perennial concern when evaluating whether a test is fair. The present study investigated whether the test items of the General English Proficiency Test for Kids (GEPT-Kids) are free of bias with respect to gender. A mixed-methods sequential explanatory research design with two phases was adopted. In Phase I, test performance data from 492 participants in five Chinese-speaking cities were analyzed with the Mantel-Haenszel (MH) method to detect gender DIF. In Phase II, items that manifested DIF were subjected to content analysis by three experienced reviewers to identify possible sources of DIF. Three items showed moderate gender DIF in the statistical analysis, and three items were identified as possibly biased by expert judgment. The results make a preliminary contribution to DIF analysis of low-stakes assessments in the field of language assessment. Moreover, young language learners, especially in the Chinese context, have drawn renewed attention, so the results may also add to the body of literature informing test development for young language learners.

Highlights

  • Test fairness is always closely related to test validity and validation in the field of language assessment (Kunnan, 2010)

  • This present study investigated whether test items of the General English Proficiency Test for Kids (GEPT-Kids) are free of bias in terms of gender differences


Introduction

Test fairness is closely related to test validity and validation in the field of language assessment (Kunnan, 2010). According to the Standards for Educational and Psychological Testing (AERA, APA, & NCME, 2014), test fairness requires that "examinees of equal standing with respect to the construct the test is intended to measure should on average earn the same test score, irrespective of group membership." In the process of test validation, one way to determine whether bias exists in a test is to examine group differences in test performance. Differential Item Functioning (DIF) is a technique that identifies items that function differently in favor of one subgroup of test takers. DIF studies help detect bias in a test and ensure that it is fair to all test takers. As Walker (2011) suggested, DIF analysis is a significant part of test development and validation: a large number of items exhibiting DIF would threaten the construct validity and fairness of a test.
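To make the MH procedure mentioned above concrete, the following is a minimal sketch of how a Mantel-Haenszel DIF statistic can be computed for one dichotomously scored item, with examinees matched on total test score. This is an illustration only, not the study's actual analysis code; the function name and data layout are hypothetical, and the ETS delta metric (ΔMH = −2.35 ln αMH) is used to express the effect size.

```python
import math
from collections import defaultdict

def mantel_haenszel_dif(responses, groups, item):
    """Estimate the MH common odds ratio and ETS delta for one item.

    responses: list of dicts mapping item name -> 0/1 score, one per examinee
    groups:    parallel list of 'ref' / 'focal' group labels
    item:      name of the studied item

    Examinees are stratified (matched) on their total test score.
    """
    # One 2x2 table per total-score stratum:
    # [A, B, C, D] = ref correct, ref incorrect, focal correct, focal incorrect
    strata = defaultdict(lambda: [0, 0, 0, 0])
    for resp, grp in zip(responses, groups):
        total = sum(resp.values())          # matching variable
        correct = resp[item]
        if grp == 'ref':
            strata[total][0 if correct else 1] += 1
        else:
            strata[total][2 if correct else 3] += 1

    num = den = 0.0
    for a, b, c, d in strata.values():
        t = a + b + c + d
        if t == 0:
            continue
        num += a * d / t
        den += b * c / t
    alpha = num / den                       # MH common odds ratio
    # ETS delta metric: negative values indicate DIF favoring the reference group
    delta = -2.35 * math.log(alpha)
    return alpha, delta
```

Under the common ETS classification, |ΔMH| < 1 is negligible (category A), values between 1 and 1.5 are moderate (B), and values of 1.5 or more are large (C); "moderate DIF" in the abstract corresponds to this kind of effect-size labeling.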


