When Large-Scale Assessments Meet Data Science: The Big-Fish-Little-Pond Effect in Fourth- and Eighth-Grade Mathematics Across Nations.

Ze Wang

doi:10.3389/fpsyg.2020.579545

Abstract

The R programming language has useful data science tools that can automate the analysis of large-scale educational assessment data, such as those available from the United States Department of Education’s National Center for Education Statistics (NCES). This study used three R packages (EdSurvey, MplusAutomation, and tidyverse) to examine the big-fish-little-pond effect (BFLPE) in mathematics in 56 countries in fourth grade and 46 countries in eighth grade, with data from the Trends in International Mathematics and Science Study (TIMSS) 2015. The BFLPE refers to the phenomenon that students in higher-achieving contexts tend to have lower self-concept than similarly able students in lower-achieving contexts, due to social comparison. In this study, it is used as a substantive theory to illustrate the implementation of data science tools for large-scale cross-national analysis. For each country and grade, two statistical models were applied: one for testing cross-level measurement invariance and one for testing the BFLPE. The first model was a multilevel confirmatory factor analysis for the measurement of mathematics self-concept using three items. The second model was a multilevel latent variable model that decomposed the effect of achievement on self-concept into between and within components; the difference between them was the contextual effect of the BFLPE. The BFLPE was found in 51 of the 56 countries in fourth grade and 44 of the 46 countries in eighth grade. The study provides syntax and discusses problems encountered while using the tools for modeling and for processing the modeling results.
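The decomposition described for the second model can be written out as a simplified sketch. The notation here is an assumption introduced for illustration and is not taken from the study: $SC_{ij}$ and $A_{ij}$ denote the self-concept and achievement of student $i$ in school $j$, $\bar{A}_j$ is the school-level component of achievement (treated as observed here, whereas the study models it as latent), and the sign convention of between minus within is likewise assumed.

```latex
% Within-school (student-level) part: effect of achievement relative to the school mean
SC_{ij} = \mu_j + \beta_W \left( A_{ij} - \bar{A}_j \right) + \varepsilon_{ij}

% Between-school part: effect of school-average achievement on school-average self-concept
\mu_j = \gamma_0 + \beta_B \, \bar{A}_j + u_j

% Contextual effect: the difference between the between and within components
\beta_C = \beta_B - \beta_W
```

Under this assumed convention, a negative $\beta_C$ corresponds to the BFLPE: holding a student's own achievement constant, higher school-average achievement is associated with lower self-concept.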

Highlights

Data science tools, especially those developed with the statistical language R (R Core Team, 2020), have been increasingly used in the educational and social sciences.
By default, Mplus uses fixed starting values when estimating multilevel confirmatory factor analysis (CFA) models by maximum likelihood with robust standard errors (MLR).
These fixed starting values could lead to non-convergence of parameter estimation.

Summary

Introduction

Data science tools, especially those developed with the statistical language R (R Core Team, 2020), have been increasingly used in the educational and social sciences. R is the second most frequently used data science software, after SPSS (Muenchen, n.d.). Given its integrated system of data wrangling, statistical modeling, visualization, and communication (Grolemund and Wickham, 2018), R is appealing to those conducting empirical analysis. There are over 16,000 R packages available on the Comprehensive R Archive Network (CRAN), R’s main repository of packages, and more packages in other repositories (such as GitHub). Packages are developed for various topics (for example, see “Task Views” on CRAN). Together with R’s core packages, they provide tools for researchers to work with different aspects of data. The sheer volume of R resources can seem daunting to beginners, as can R’s sometimes unfamiliar or non-user-friendly ways of “doing” things.

Objectives

Methods

Results

Conclusion