Abstract

International comparative assessments of student achievement are constructed to assess country-level differences and change over time. Numerous previous studies have pointed to the need for a coherent understanding of international trends in educational outcomes. Investigating these trends requires long-term analysis, because substantial system-level changes in student outcomes are rarely observed over short periods (i.e., between adjacent international assessment cycles). The present study links recent and older studies conducted by the International Association for the Evaluation of Educational Achievement (IEA) onto a common scale to study long-term trends within and across countries. It explores the comparability of the achievement tests of the Trends in International Mathematics and Science Study (TIMSS) and earlier IEA mathematics studies in grade eight. Employing item response theory (IRT), we perform a concurrent calibration of item parameters to link the eight studies onto a common scale spanning the period from 1964 to 2015, using data from England, Israel, Japan, and the USA.

Highlights

  • For more than half a century, international large-scale assessments (ILSAs) have provided a large body of data on student achievement from a vast number of educational systems all over the world

  • Of the 70 items in the First International Mathematics Study (FIMS), 37 were repeated in the Second International Mathematics Study (SIMS), nine of which were repeated again in the Trends in International Mathematics and Science Study (TIMSS) 1995; of the 199 items in SIMS, 18 were repeated in TIMSS 1995

  • We investigate the degrees of similarity of ILSAs in mathematics from FIMS administered in 1964 to the most recent cycle of TIMSS in 2015


Introduction

For more than half a century, international large-scale assessments (ILSAs) have provided a large body of data on student achievement from a vast number of educational systems all over the world. Strietholt and Rosén (2016) demonstrated how to link the achievement tests from recent and older IEA studies of reading literacy onto the same measurement scale with item response theory (IRT) modeling. Johansson and Strietholt (2019) used overlaps in the assessment material to equate five cycles of the Trends in International Mathematics and Science Study (TIMSS); they applied a common-item nonequivalent group design and IRT modeling. Several attempts to link test scores from different regional, national, or international assessments over long periods rely on IRT within studies but classical test theory across them, because of the limited number of overlapping items (e.g., Altinok et al. 2018; Chmielewski 2019; Hanushek and Wößmann 2012).
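The logic of concurrent calibration under a common-item nonequivalent group design can be sketched on simulated data. Everything below is an illustrative assumption rather than the study's actual instruments: two cohorts of different average ability take overlapping test forms, and a joint Rasch calibration over the stacked (missing-by-design) response matrix places all item difficulties on one common scale via the shared items.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative common-item nonequivalent group design (assumed numbers):
# cohort A takes items 0-14, cohort B takes items 10-24; items 10-14 are common.
n_items = 25
diff_true = np.linspace(-2.0, 2.0, n_items)          # generating difficulties
theta_a = rng.normal(-0.3, 1.0, 800)                 # cohort A, slightly weaker
theta_b = rng.normal(0.3, 1.0, 800)                  # cohort B, slightly stronger
items_a = np.arange(0, 15)
items_b = np.arange(10, 25)

def simulate(theta, items):
    """Draw Rasch (1PL) responses for the given persons on the given items."""
    p = 1.0 / (1.0 + np.exp(-(theta[:, None] - diff_true[None, items])))
    return (rng.random(p.shape) < p).astype(float)

# Stack both cohorts into one person-by-item matrix; NaN = not administered.
X = np.full((1600, n_items), np.nan)
X[:800, items_a] = simulate(theta_a, items_a)
X[800:, items_b] = simulate(theta_b, items_b)

# Concurrent calibration: joint estimation of person abilities and item
# difficulties by alternating Newton steps, skipping missing-by-design cells.
mask = ~np.isnan(X)
resp = np.nan_to_num(X)
theta = np.zeros(1600)
beta = np.zeros(n_items)
for _ in range(50):
    p = 1.0 / (1.0 + np.exp(-(theta[:, None] - beta[None, :])))
    g = ((resp - p) * mask).sum(axis=1)              # person score residuals
    h = (p * (1.0 - p) * mask).sum(axis=1)
    theta += g / np.maximum(h, 1e-9)
    p = 1.0 / (1.0 + np.exp(-(theta[:, None] - beta[None, :])))
    g = -((resp - p) * mask).sum(axis=0)             # note sign: higher beta -> harder
    h = (p * (1.0 - p) * mask).sum(axis=0)
    beta += g / np.maximum(h, 1e-9)
    beta -= beta.mean()                              # identify scale: mean difficulty 0

# beta now holds all 25 difficulties on one common scale, even though no
# single person answered every item; the common items 10-14 carry the link.
```

Because the two cohorts differ in mean ability, simply comparing raw percent-correct on their unique items would confound item difficulty with cohort ability; the joint calibration separates the two through the overlapping items, which is the same principle the linking of FIMS, SIMS, and TIMSS relies on.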
