Abstract

In large-scale international assessment programs, results for mathematics proficiency are typically reported for jurisdictions such as provinces or countries. An overall score is provided along with subscores based on content subdomains defined in the test specifications. In this paper, an alternative method for obtaining empirical subscores is described, where the empirical subscores are based on an exploratory item response theory (IRT) factor solution. This alternative scoring is intended to augment rather than replace traditional scoring procedures. The IRT scoring method is applied to the mathematics achievement data from the Trends in International Mathematics and Science Study (TIMSS). A brief overview of the method is given, and additional material is provided for validation of the empirical subscores. The ultimate goal of scoring is to provide diagnostic feedback in the form of naturally occurring item clusters, which supplies useful information beyond traditional subscores based on test specifications. As shown by Camilli and Dossey (2019), the achievement ranks of countries may change depending on which empirical subscore of mathematics is considered, whereas traditional subscores are highly correlated and tend to yield similar rank orders.

  • The method takes advantage of the TIMSS sampling design, specifically pairs of jackknife zones, to aggregate categorical response data to higher-order sampling units for IRT factor analysis (a sketch of this aggregation step follows the abstract)

  • Once factor scores are estimated for the sampling units and interpreted, they are aggregated to the jurisdiction level (countries, states, provinces) using sampling weights. The procedure for obtaining standard errors of jurisdiction-level scores combines cross-sampling-unit variance and Monte Carlo sampling variation

  • Full technical details of the IRT factoring procedures are given in Camilli and Fox (2015); Fox (2010) provides additional background on Bayesian item response modeling. The estimation algorithm is based on stochastic approximation expectation-maximization (SAEM)
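To make the first bullet concrete, here is a minimal sketch of collapsing student records into higher-order sampling units formed from pairs of jackknife zones. The column names ('country', 'jkzone', 'weight') and the choice of weighted proportion correct as the unit-level item summary are illustrative assumptions; the exact aggregation of the categorical responses is specified in Camilli and Fox (2015).

```python
# Minimal sketch: collapse students into sampling units defined by
# pairs of jackknife zones. Column names are assumed for illustration;
# TIMSS files label these variables differently.
import pandas as pd

def build_sampling_units(students: pd.DataFrame, item_cols: list) -> pd.DataFrame:
    """Aggregate scored (0/1) item responses to one row per sampling unit.

    Assumes `students` has columns 'country', 'jkzone' (jackknife zone),
    'weight' (student sampling weight), and scored items in `item_cols`.
    """
    df = students.copy()
    # Pair adjacent jackknife zones: zones 1-2 -> unit 1, zones 3-4 -> unit 2, ...
    df["unit"] = (df["jkzone"] + 1) // 2

    def weighted_props(g: pd.DataFrame) -> pd.Series:
        # Weighted proportion correct per item within the unit,
        # plus the unit's total sampling weight.
        w = g["weight"]
        out = {item: (g[item] * w).sum() / w.sum() for item in item_cols}
        out["unit_weight"] = w.sum()
        return pd.Series(out)

    return df.groupby(["country", "unit"]).apply(weighted_props).reset_index()
```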

Highlights

  • Once factor scores are estimated for sampling units and interpreted, they are aggregated to the jurisdiction level using sampling weights

  • In large-scale international assessments, results for mathematics achievement are typically reported for jurisdictions such as provinces or countries

  • A brief overview of the method is given below, and additional details for subscore estimation are given in Camilli and Fox (2015); a generic sketch of the SAEM estimation loop follows this list
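The abstract names stochastic approximation expectation-maximization (SAEM) as the estimation algorithm, so a generic SAEM skeleton is sketched below. This is not the authors' implementation: `simulate_latent`, `suff_stats`, and `mstep` are hypothetical placeholders for the model-specific posterior sampler, complete-data sufficient statistics, and maximizer, and the gain sequence is the common choice of 1 during burn-in followed by 1/k afterward.

```python
# Generic SAEM skeleton (illustrative only). The user supplies the three
# model-specific callables; sufficient statistics are assumed to support
# array arithmetic (e.g., numpy arrays).
import numpy as np

def saem(data, theta0, simulate_latent, suff_stats, mstep,
         n_iter=500, burn_in=100, rng=None):
    """Stochastic Approximation EM.

    simulate_latent(data, theta, rng) -> draw of latent variables z
    suff_stats(data, z)               -> complete-data sufficient statistics
    mstep(s)                          -> parameters maximizing Q(theta; s)
    """
    rng = np.random.default_rng() if rng is None else rng
    theta, s = theta0, None
    for t in range(1, n_iter + 1):
        z = simulate_latent(data, theta, rng)        # simulation (E) step
        s_new = suff_stats(data, z)
        # Robbins-Monro gain: 1 during burn-in, then decreasing.
        gamma = 1.0 if t <= burn_in else 1.0 / (t - burn_in)
        s = s_new if s is None else s + gamma * (s_new - s)  # SA smoothing
        theta = mstep(s)                             # maximization step
    return theta
```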



Introduction

The method takes advantage of the TIMSS sampling design, specifically pairs of jackknife zones, to aggregate categorical response data to higher-order sampling units for IRT factor analysis. Once factor scores are estimated for the sampling units and interpreted, they are aggregated to the jurisdiction level (countries, states, provinces) using sampling weights; a sketch of this weighted aggregation and its combined standard error is given below. A method for interpreting factor structures and deriving the associated empirical factor scores is described relative to data from the Trends in International Mathematics and Science Study (TIMSS).
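The following sketch shows one plausible reading of the aggregation and standard-error step, assuming hypothetical column names ('score' for the unit-level factor score, 'mc_var' for its Monte Carlo posterior variance, 'unit_weight' for the unit's sampling weight) and a simple variance combination; the actual decomposition is specified in Camilli and Fox (2015).

```python
# Hedged sketch of jurisdiction-level aggregation: weighted mean of
# unit-level factor scores, with a standard error that combines
# cross-sampling-unit variance and Monte Carlo sampling variation.
import numpy as np
import pandas as pd

def jurisdiction_scores(units: pd.DataFrame) -> pd.DataFrame:
    """Aggregate unit-level factor scores to one score per jurisdiction."""
    rows = []
    for country, g in units.groupby("country"):
        w = g["unit_weight"] / g["unit_weight"].sum()   # normalized weights
        mean = float((w * g["score"]).sum())
        # Cross-sampling-unit variance of the weighted mean.
        between = float((w**2 * (g["score"] - mean) ** 2).sum())
        # Monte Carlo variation, propagated into the weighted mean.
        mc = float((w**2 * g["mc_var"]).sum())
        rows.append({"country": country, "score": mean,
                     "se": np.sqrt(between + mc)})
    return pd.DataFrame(rows)
```

The squared-weight form gives the variance of a weighted mean under independent sampling units; the Monte Carlo term accounts for simulation error in each unit's factor-score estimate.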
