When construct-irrelevant sources affect item difficulty, the validity of an assessment is compromised. Using responses of 260,000 students from 71 countries to the Programme for International Student Assessment (PISA) 2018 mathematics assessment and cross-classified mixed-effects models, we examined three validity concerns associated with a construct-irrelevant factor, item format: (a) whether format influenced item difficulty; (b) whether the impact of item format on difficulty varied across countries, undermining PISA’s foundational goal of meaningful country comparisons; and (c) whether item format effects differed between genders, affecting assessment fairness. Item format accounted for a substantial share of the variance in item difficulties, 12% on average. Its effect was non-uniform across countries: format explained 30% of the variance in item difficulties in lower-performing countries but only 10% in higher-performing countries, challenging the comparability of educational outcomes. Gender differences in item format effects were minor. Implications for secondary research and assessment design are discussed.