Improving the fairness of ECL listening tests by detecting gender-biased items

Highlights

  • For decades, reliability and validity have been the two main test characteristics to which language test developers pay the most attention when designing tests

  • The present study examines to what extent ECL listening test items at Common European Framework of Reference for Languages (CEFR) level B2, administered between February 2018 and December 2019, exhibit differential item functioning across test-taker groups in terms of gender

  • The results of the statistical analysis, performed with the MFRM-based software Facets, showed differential item functioning for 13 items, which corresponds to 6.5 percent of the total number of items
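The highlights report a Many-Facet Rasch Measurement (MFRM) analysis carried out with the Facets software, which is not reproduced here. As a simpler, purely illustrative stand-in, the sketch below computes the classical Mantel-Haenszel common odds ratio for a single item, a widely used alternative technique for flagging gender DIF. All data, variable names, and the function itself are hypothetical and are not taken from the study.

```python
def mh_odds_ratio(item_resp, group, strata):
    """Mantel-Haenszel common odds ratio for one dichotomous item.

    item_resp: 1/0 correctness per test-taker
    group:     0 = reference group, 1 = focal group (e.g., the two genders)
    strata:    matching variable per test-taker (typically total test score)

    Values near 1.0 suggest no DIF; marked deviations flag the item
    for further review.
    """
    num = den = 0.0
    for s in set(strata):
        idx = [i for i, v in enumerate(strata) if v == s]
        a = sum(1 for i in idx if group[i] == 0 and item_resp[i] == 1)  # ref correct
        b = sum(1 for i in idx if group[i] == 0 and item_resp[i] == 0)  # ref incorrect
        c = sum(1 for i in idx if group[i] == 1 and item_resp[i] == 1)  # focal correct
        d = sum(1 for i in idx if group[i] == 1 and item_resp[i] == 0)  # focal incorrect
        t = a + b + c + d
        if t:
            num += a * d / t
            den += b * c / t
    return num / den if den else float("nan")


# Balanced toy example: both groups answer the item equally well,
# so the common odds ratio comes out at exactly 1.0 (no DIF signal).
resp = [1] * 30 + [0] * 10 + [1] * 30 + [0] * 10
grp = [0] * 40 + [1] * 40
score = [0] * 80  # a single ability stratum, for brevity
print(round(mh_odds_ratio(resp, grp, score), 2))  # → 1.0
```

Note that this is a deliberately different and much simpler method than the MFRM bias/interaction analysis the study used; it is shown only to make the idea of flagging items by group-conditional performance concrete.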


Introduction

For decades, reliability and validity have been the two main test characteristics to which language test developers pay the most attention when designing their tests (cf. Bachman, 1990: 24). The term test fairness has been appearing with increasing frequency in papers, studies, and presentations on the topic of language assessment (e.g., Kane, 2010; Kremmel, 2019; Kunnan, 2000; 2004; 2007; 2014; Stoynoff, 2012). Professional guidelines such as the Code for Fair Testing Practices in Education (Joint Committee on Testing Practices, 2005: 23), the ETS International Principles for the Fairness in Assessments (ETS, 2016: 3–4) and the ALTE Principles of Good Practice (ALTE, 2020: 13) emphasize that test developers should strive to make their tests as fair as possible for candidates of different gender, age, ethnic origin, cultural and language background, and special handicapping conditions and needs. One of the most effective strategies for achieving this goal is to construct bias-free tests.
