Abstract

R offers useful data science tools that can automate the analysis of large-scale educational assessment data, such as those available from the United States Department of Education’s National Center for Education Statistics (NCES). This study used three R packages (EdSurvey, MplusAutomation, and tidyverse) to examine the big-fish-little-pond effect (BFLPE) in mathematics in 56 countries in fourth grade and 46 countries in eighth grade, with data from the Trends in International Mathematics and Science Study (TIMSS) 2015. The BFLPE refers to the phenomenon that students in higher-achieving contexts tend to have lower self-concept than similarly able students in lower-achieving contexts due to social comparison. In this study, it serves as a substantive theory to illustrate the implementation of data science tools for large-scale cross-national analysis. For each country and grade, two statistical models were applied: one for cross-level measurement invariance testing and one for testing the BFLPE. The first model was a multilevel confirmatory factor analysis for the measurement of mathematics self-concept using three items. The second was a multilevel latent variable model that decomposed the effect of achievement on self-concept into between and within components; the difference between them was the contextual effect of the BFLPE. The BFLPE was found in 51 of the 56 countries in fourth grade and 44 of the 46 countries in eighth grade. The study provides syntax and discusses problems encountered while using the tools for modeling and for processing the modeling results.
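
The decomposition described above can be sketched in equation form. This is a minimal illustration, not the study's own specification: the symbols (SC for self-concept, ACH for achievement, the W and B superscripts for within and between components, and the coefficient names) are assumed notation.

```latex
% Illustrative notation only; these symbols are assumptions, not taken from the study.
% i indexes students, j indexes the higher-level units (clusters).
\begin{align}
\text{Within:}\quad            & SC_{ij}^{(W)} = \beta_{W}\, ACH_{ij}^{(W)} + \varepsilon_{ij} \\
\text{Between:}\quad           & SC_{j}^{(B)} = \alpha + \beta_{B}\, ACH_{j}^{(B)} + u_{j} \\
\text{Contextual effect:}\quad & \beta_{C} = \beta_{B} - \beta_{W}
\end{align}
```

Under this notation, a negative contextual effect (\beta_{C} < 0) is the pattern consistent with the BFLPE: at a given level of individual achievement, self-concept is lower where group-level achievement is higher.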

Highlights

  • Data science tools, particularly those developed with the statistical language R (R Core Team, 2020), have been increasingly used in the educational and social sciences

  • By default, Mplus uses fixed starting values when estimating multilevel confirmatory factor analysis (CFA) models by maximum likelihood with robust standard errors (MLR)

  • These fixed starting values could lead to non-convergence of parameter estimation

Summary

Introduction

Data science tools, particularly those developed with the statistical language R (R Core Team, 2020), have been increasingly used in the educational and social sciences. R is the second most frequently used data science software, following SPSS (Muenchen, n.d.). Given its integrated system of data wrangling, statistical modeling, visualization, and communication (Grolemund and Wickham, 2018), R is appealing to those conducting empirical analysis. There are over 16,000 R packages available on the Comprehensive R Archive Network (CRAN), R’s main repository of packages, and more packages in other repositories (such as GitHub). Packages are developed for various topics (for example, see “Task Views” at the CRAN). Together with R’s core packages, they provide tools for researchers to work with different aspects of data. The sheer amount of R resources can seem daunting to beginners, let alone R’s sometimes unfamiliar or non-user-friendly ways of “doing” things.
