Abstract

The purpose of this study is to evaluate two methodological perspectives on test fairness using national Secondary School Certificate (SSC) examinations. The SSC is a suite of multi-subject national qualification tests at the Grade 10 level in South Asian countries such as Bangladesh, India, and Pakistan. Because it is a high-stakes test, the fairness of SSC tests is a major concern among the public and educational policy planners. This study is a first attempt to investigate the test fairness of the national SSC examination of Pakistan using two independent procedures: differential item functioning (DIF) and differential bundle functioning (DBF). The SSC was evaluated for possible gender bias using multiple-choice tests in three core subjects, namely, English, Mathematics, and Physics. The study was conducted in two phases using an explanatory item response model (EIRM) and the Simultaneous Item Bias Test (SIBTEST). In Phase 1, test items were studied for DIF, and items with severe DIF were flagged in each subject. In Phase 2, item bundles were analyzed for DBF. Three items were detected with large DIF, one in each subject, and one item bundle was detected with negligible DBF. Taken together, the results demonstrate that there is no major threat to the validity of the interpretation of examinees’ test scores on the SSC examination. The outcome of this study provides evidence for test fairness, which will enhance test development practices at the national examination authorities.

Highlights

  • Differential item functioning (DIF) occurs when examinees who have the same ability but belong to different groups have a different probability of answering a test item correctly, after controlling for overall ability on the construct measured by the test

  • The Secondary School Certificate (SSC) is a suite of multi-subject national qualification tests at the Grade 10 level in South Asian countries such as Bangladesh, India, and Pakistan, equivalent to the General Certificate of Secondary Education (GCSE) examination in England

  • DIF/differential bundle functioning (DBF) methods can be classified by the procedure they use for matching the groups and by the assumptions they make about the item response function (IRF)



Introduction

Differential item functioning (DIF) occurs when examinees who have the same ability but belong to different groups have a different probability of answering a test item correctly, after controlling for overall ability on the construct measured by the test. Differential bundle functioning (DBF) is a concept built upon DIF, in which a subset of items or testlets within a test is organized to form a bundle of two or more items. Different measurement methods can be used to study DIF and DBF; these methods may draw on item response theory (IRT) or classical test theory (CTT). IRT-based procedures include, among others, the approach of B. O. Muthén, Kao, and Burstein (1991) and explanatory item response modeling (EIRM; De Boeck & Wilson, 2004; Wilson, De Boeck, & Carstensen, 2008). These methods evaluate the invariance in item and person parameters, as well as the interactions between item and person parameters, that forms the basis for DIF and DBF within the IRT framework. DIF/DBF methods can be classified by the procedure they use for matching the groups and by the assumptions they make about the item response function (IRF). Although many DIF detection procedures are available, a relatively small number of these methods are preferred based on their practicality as well as their theoretical and empirical strengths (Gierl, Gotzmann, & Boughton, 2004; Shepard, Camilli, & Williams, 1985).
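The study itself uses EIRM and SIBTEST, but the matching logic common to DIF procedures can be illustrated with a simpler, classic method not used in the paper: the Mantel-Haenszel common odds ratio, which matches examinees on their rest score (total score excluding the studied item) and pools 2x2 group-by-correctness tables across score strata. The sketch below is a minimal illustration with entirely hypothetical data and function names; a ratio near 1 suggests no DIF, while values far from 1 would flag the item for review.

```python
def mantel_haenszel_dif(responses, groups, item):
    """Mantel-Haenszel common odds ratio for one dichotomous item.

    responses: list of 0/1 response vectors, one per examinee
    groups:    0 = reference group, 1 = focal group
    item:      index of the studied item
    Examinees are matched on the rest score (total correct excluding the item).
    """
    # Stratify examinees by rest score.
    strata = {}
    for resp, g in zip(responses, groups):
        rest = sum(resp) - resp[item]
        strata.setdefault(rest, []).append((resp[item], g))

    # Pool the 2x2 tables (group x correct/incorrect) across strata.
    num = den = 0.0
    for cells in strata.values():
        n = len(cells)
        a = sum(1 for r, g in cells if r == 1 and g == 0)  # reference, correct
        b = sum(1 for r, g in cells if r == 0 and g == 0)  # reference, incorrect
        c = sum(1 for r, g in cells if r == 1 and g == 1)  # focal, correct
        d = sum(1 for r, g in cells if r == 0 and g == 1)  # focal, incorrect
        num += a * d / n
        den += b * c / n
    return num / den if den else float("nan")

# Hypothetical 3-item test, item 0 under study; both groups answer identically,
# so the common odds ratio is exactly 1 (no evidence of DIF).
reference = [[1, 1, 0], [1, 1, 0], [1, 1, 0], [0, 1, 0]]
focal     = [[1, 1, 0], [1, 1, 0], [1, 1, 0], [0, 1, 0]]
alpha = mantel_haenszel_dif(reference + focal, [0] * 4 + [1] * 4, item=0)
print(alpha)  # 1.0 -> no evidence of DIF on item 0
```

Operational programs typically convert this ratio to the ETS delta scale and apply effect-size thresholds before flagging an item; this sketch stops at the raw ratio.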

