- New
- Research Article
- 10.1186/s40536-026-00295-w
- Apr 12, 2026
- Large-scale Assessments in Education
- Yuanyi Zhu + 1 more
Abstract Mathematical proficiency in adolescence is crucial for both individual success and national economic development, yet few studies have examined cross-national differences in how achievement-related beliefs, motivations, and institutional factors affect mathematical outcomes. This study addresses that gap by investigating the impact of growth mindset, intrinsic motivation, and school autonomy on students’ mathematics performance across five top-performing Asian (Singapore, Macao, Hong Kong, Taipei, and Korea) and five top-performing Western (Switzerland, Ireland, Denmark, the United Kingdom, and Poland) education systems. Using the Programme for International Student Assessment 2022 dataset (N = 66,789) and multilevel mediation analyses, we found that (a) growth mindset was positively associated with mathematics performance in all five Western economies and two Asian economies (Singapore and Taipei); (b) intrinsic motivation mediated the pathway from growth mindset to mathematics performance in four Western economies (Ireland, Denmark, the United Kingdom, and Poland) and all five Asian economies; and (c) school autonomy exhibited context-dependent moderating effects, strengthening the influence of growth mindset in Korea while amplifying the association between intrinsic motivation and mathematics performance in Singapore. This study highlights the importance of aligning educational interventions that target students’ motivational beliefs with the cultural and institutional contexts in which they are implemented.
- New
- Research Article
- 10.1186/s40536-026-00294-x
- Apr 12, 2026
- Large-scale Assessments in Education
- John Alexander Calderón + 2 more
- Research Article
- 10.1186/s40536-026-00290-1
- Feb 26, 2026
- Large-scale Assessments in Education
- Zhicheng Liu + 3 more
- Research Article
- 10.1186/s40536-026-00285-y
- Feb 26, 2026
- Large-scale Assessments in Education
- Xiaying Zheng + 2 more
Abstract Large-scale surveys routinely rely on complex sample designs, necessitating special consideration of sampling variance estimation in multilevel models (MLM). While the sandwich estimator is widely used for this purpose, its implementation, particularly regarding stratification and weighting, remains challenging. The lesser-known replication methods provide a valid alternative, but they are often misunderstood as being suitable only for single-level models and are not widely supported by software packages. This paper clarifies key aspects of implementing both methods under the two-level MLMs common in large-scale surveys. We provide practical guidance on incorporating sample weights, correctly identifying variance strata for sandwich estimation, and applying replication-based variance estimation in MLM. Two simulation studies evaluate the performance of each method under correct and incorrect specifications, including omission of informative level-1 weights. Results demonstrate that the sandwich estimator and replication methods yield comparable variance estimates when implemented correctly and highlight the consequences of common misapplications. An empirical example using TIMSS 2015 Australia data illustrates these issues in practice. This work contributes to improved methodological soundness in multilevel modeling and calls for expanded software support for replication methods in MLM.
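The general idea behind replication-based variance estimation can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: it assumes a simple delete-one-PSU jackknife (JK1) and uses a weighted mean as a stand-in estimator, whereas a real application would refit the multilevel model under each set of replicate weights. The function names `weighted_mean`, `jk1_replicate_weights`, and `replication_variance` are hypothetical.

```python
def weighted_mean(values, weights):
    """Stand-in estimator; in practice this would be a full model fit."""
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)


def jk1_replicate_weights(weights, psu_ids):
    """Build delete-one-PSU jackknife replicate weights (assumed JK1 scheme).

    Each replicate zeroes out one PSU and rescales the remaining weights
    by G / (G - 1), where G is the number of PSUs.
    """
    psus = sorted(set(psu_ids))
    g = len(psus)
    replicates = [
        [0.0 if p == dropped else w * g / (g - 1) for w, p in zip(weights, psu_ids)]
        for dropped in psus
    ]
    scale = (g - 1) / g  # JK1 multiplier for the sum of squared deviations
    return replicates, scale


def replication_variance(estimator, values, full_weights, replicate_weights, scale):
    """Variance = scale * sum over replicates of (theta_r - theta_full)^2."""
    theta_full = estimator(values, full_weights)
    theta_reps = [estimator(values, w) for w in replicate_weights]
    variance = scale * sum((t - theta_full) ** 2 for t in theta_reps)
    return theta_full, variance


# Toy data (made up for illustration): five observations, one per PSU.
values = [1.0, 2.0, 3.0, 4.0, 5.0]
weights = [1.0] * 5
reps, scale = jk1_replicate_weights(weights, psu_ids=[0, 1, 2, 3, 4])
estimate, variance = replication_variance(weighted_mean, values, weights, reps, scale)
```

The same loop applies to any estimator: only the `estimator` argument changes, which is why replication is not inherently limited to single-level models. The replicate scheme and the scale factor depend on the survey design and should be taken from the survey's technical documentation.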
- Research Article
- 10.1186/s40536-026-00284-z
- Feb 17, 2026
- Large-scale Assessments in Education
- Richard Nennstiel + 1 more
Abstract Purpose The escalating prevalence of social media usage (SMU) among adolescents coincides with a concerning decline in their life satisfaction globally. While public and scientific discourse vigorously debates the impact of SMU on well-being, existing research offers mixed evidence, often lacking gender-specific analyses, control for key confounders, and clarity on the functional form of this relationship. This study addresses these gaps by examining the association between SMU time and adolescent life satisfaction in a large international context. Methods Utilizing data from the 2022 Programme for International Student Assessment (PISA), our analytical sample comprised 159,185 15-year-old students across 24 OECD countries. Life satisfaction was measured on a 0–10 scale, and daily SMU was categorized into five groups (no usage, less than 1 h, 1–3 h, 3–5 h, over 5 h). We employed ordinary least squares (OLS) regression models, controlling for socioeconomic status (SES) and bullying experiences as confounders. Models were estimated separately for boys and girls, and a combined model included gender interaction effects to test for gender differences. We conducted both pooled analyses with country-fixed effects and separate country-specific analyses; all models were weighted to account for complex sampling. Results Our pooled analyses reveal an inverted J-shaped relationship between SMU time and life satisfaction. Consistent with the digital Goldilocks hypothesis, adolescents with moderate SMU (1–3 h daily) reported the highest life satisfaction. Importantly, controlling for SES and bullying experiences significantly altered these associations. For boys, the relationship between SMU and life satisfaction became statistically non-significant after confounder control. For girls, however, the inverted J-shaped association persisted, albeit with reduced magnitudes. 
Girls with no SMU and those with over 5 h of daily usage reported the lowest life satisfaction, with differences of up to 0.32 scale points compared to moderate users. However, country-specific analyses reveal considerable heterogeneity, indicating that this inverted J-shaped pattern is not universal. Consequently, pooled estimates should be interpreted with caution as they mask significant variation across national contexts. Conclusion This study demonstrates a gendered and highly context-dependent association between SMU and adolescent well-being, underscoring the critical importance of controlling for confounders. The most robust finding across diverse national contexts is the negative association between excessive use (over 5 h daily) and life satisfaction, particularly for girls. These findings highlight the necessity of gender-specific considerations in research and interventions aiming to promote adolescent life satisfaction in the digital age, particularly addressing the vulnerability of girls to excessive social media engagement.
- Research Article
- 10.1186/s40536-026-00279-w
- Feb 17, 2026
- Large-scale Assessments in Education
- Khalid Almamari
Abstract Background Student achievement is shaped by family background, gender, migration status, and school context, yet little research has compared how these factors operate across distinct world regions. This study examines how parental education, educational resources, study supports, gender, and migration status predict Grade 8 mathematics and science achievement in the Gulf Cooperation Council (GCC) and Asia–Pacific regions. Methods Using TIMSS 2023 data from twelve countries, six GCC systems (Bahrain, Kuwait, Oman, Qatar, Saudi Arabia, United Arab Emirates) and six Asia–Pacific systems (Chinese Taipei, Hong Kong SAR, Japan, Korea, Malaysia, Singapore), we estimated multilevel models separately for each country following a four-step specification. Models incorporated student gender, immigrant status, parental education, home educational resources, home study supports, and student-level interaction terms. Results Patterns differed across regions. In the GCC, girls consistently outperformed boys, especially in science, and immigrant students often outscored native-born peers. In Asia–Pacific countries, gender gaps were small and typically domain-specific, and immigrant performance varied. Parental education persisted as a robust predictor of higher achievement across all systems. Greater home educational resources were associated with higher achievement across countries, while study supports benefited students unevenly, with clearer advantages for girls in several GCC systems. Interaction effects indicated that students with both higher parental education and richer home resources experienced the largest achievement advantages. School-level variance was notably higher in GCC countries, reflecting greater stratification. Conclusions The findings highlight that gender, migration status, and family background do not operate uniformly but are shaped by regional opportunity structures and educational environments. 
GCC systems may benefit from reducing school-level disparities and strengthening supports for boys’ academic engagement, whereas Asia–Pacific systems may prioritize addressing family-level inequalities and immigrant integration. The study provides region-specific insights for promoting equity in diverse educational systems.
- Research Article
- 10.1186/s40536-026-00280-3
- Feb 17, 2026
- Large-scale Assessments in Education
- Morten Rasmus Augestad-Puck + 1 more
Abstract When measuring a phenomenon across different grades, it is important that the instrument measures in the same way, that is, that it is perceived in the same manner regardless of grade. This matters especially if the phenomenon is a concept that should be universal and therefore independent of other circumstances, such as age. In the best of all worlds, enjoyment of reading should thus be universal and measured the same way across grades through its measurement of intrinsic motivation for reading. Inspired by the ILSA study PIRLS 2021, a scale of enjoyment of reading was created, examined for DIF across ages, and evaluated by IRT in a sample of all pupils in the public school system in one municipality in Denmark (grades 0–9, ages 6–16). The analysis shows DIF in items, with the DIF clustering so that responses are similar within the younger pupils (grades 0–2, ages 6–8) and within the older pupils (grades 3–9, ages 9–16), which indicates that the measurement differs between the two groups. The measurement of enjoyment of reading is thus not universal: our results indicate that grades 0–2 should not be included, or that one should be aware of the potential issues related to these first grades.
- Research Article
- 10.1186/s40536-026-00282-1
- Feb 17, 2026
- Large-scale Assessments in Education
- Jaekyung Lee + 1 more
Abstract Background This comparative study provides cross-cultural insights into key protective factors for disadvantaged students’ academic resilience in Korea and South Africa. Grounded in the Ecological Systems Theory and the Developmental Assets Framework, the goal of this study is to learn lessons from the comparison of resilient vs. non-resilient students in both nations and inform evidence-based policies towards asset-based pathways to academic resilience. Methods Mixed research methods are used: (1) statistical analyses of TIMSS 2019 8th grade (Korea) and 9th grade (South Africa) math assessment/survey databases and (2) a case study of resilient vs. non-resilient students in South Africa. The study examines both between-country and within-country inequalities of adversities (risk factors) and assets (protective factors). The Oaxaca-Blinder decomposition analyses of math achievement gaps reveal both endowment effects (i.e., differences in assets) and parameter effects (i.e., differences in the associations between assets and achievement). Results Findings show that South Africa lags behind Korea in math achievement not only due to higher adversities and lower assets but also due to more negative adversity effects (i.e., greater risk vulnerability) and less positive asset effects (i.e., lower returns on assets). Findings also show within-country differences between resilient and non-resilient student groups. In South Africa, resilient students perform better by having not only more assets but also better utilization of assets (i.e., stronger asset-achievement relations). In contrast, Korean resilient students have more assets, but they do not show stronger asset effects than their non-resilient counterparts. While both internal and external assets contribute to resilience, internal assets (e.g., learning motivation, confidence, and efforts) are the stronger differentiator between resilient and non-resilient groups in this sample. 
Conclusions Educational policy implications are discussed to measure and develop unrealized potential among disadvantaged students, specifically asset-driven pathways for academic resilience. This study calls for further research on culturally-responsive assessment of both risk and protective factors in large-scale international assessments.
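The split into endowment and parameter effects mentioned in the abstract can be illustrated with a two-fold Oaxaca-Blinder decomposition. The sketch below is a simplified illustration, not the study's specification: it assumes a single predictor, unweighted OLS, and group A's coefficients as the reference; the function names and the toy numbers are hypothetical.

```python
from statistics import mean


def simple_ols(x, y):
    """One-predictor OLS with intercept; returns (intercept, slope)."""
    x_bar, y_bar = mean(x), mean(y)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope


def oaxaca_blinder_twofold(x_a, y_a, x_b, y_b):
    """Decompose mean(y_a) - mean(y_b) using group A's coefficients as reference.

    endowment   = (mean x_A - mean x_B) * slope_A
    coefficient = (intercept_A - intercept_B) + mean x_B * (slope_A - slope_B)
    The two parts sum to the raw gap because OLS lines pass through group means.
    """
    int_a, slope_a = simple_ols(x_a, y_a)
    int_b, slope_b = simple_ols(x_b, y_b)
    x_bar_a, x_bar_b = mean(x_a), mean(x_b)
    return {
        "gap": mean(y_a) - mean(y_b),
        "endowment": (x_bar_a - x_bar_b) * slope_a,
        "coefficient": (int_a - int_b) + x_bar_b * (slope_a - slope_b),
    }


# Toy data (made up): x is an asset index, y is an achievement score.
result = oaxaca_blinder_twofold(
    x_a=[2.0, 3.0, 4.0, 5.0], y_a=[520.0, 540.0, 560.0, 580.0],
    x_b=[1.0, 2.0, 3.0, 4.0], y_b=[410.0, 420.0, 430.0, 440.0],
)
```

The choice of reference coefficients changes how the gap is apportioned between the two components, so results should always state which group (or pooled model) served as the reference.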
- Research Article
- 10.1186/s40536-026-00286-x
- Feb 17, 2026
- Large-scale Assessments in Education
- Yue Li + 1 more
Abstract Mediation analysis in large-scale assessments often involves a multilevel structure, where students are nested within classrooms or schools. In such a context, multilevel structural equation modeling (MSEM) provides a flexible framework for estimating and testing the mediation process. Plausible values (PVs), however, present unique challenges for mediation analysis in large-scale assessments, yet methodological guidance remains limited. In particular, standard pooling procedures complicate the inference of indirect effects, which relies on the construction of confidence intervals. To address these gaps, we conducted a Monte Carlo simulation study comparing three modeling methods (aggregation, two-step approach, and MSEM) and three confidence interval methods (delta, distribution of the product, and Monte Carlo) in the context of 2–2–1 mediation with PVs. We evaluated their performance in terms of relative bias, confidence interval coverage, and power across a range of realistic conditions. Simulation results suggest that the MSEM-Monte Carlo combination performs best when sample size requirements are met. An empirical example is also provided to illustrate the practical implementation of 2–2–1 mediation analysis with PVs.
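A Monte Carlo confidence interval for an indirect effect a·b can be sketched as follows. This is a generic illustration rather than the procedure evaluated in the article: it assumes the path estimates from each plausible value are first pooled with Rubin's rules, that the pooled estimates of a and b are approximately normal, and that their sampling covariance is zero. The function names and the numeric inputs are hypothetical.

```python
import random
from statistics import mean, variance


def pool_rubin(estimates, sampling_variances):
    """Pool one parameter across M plausible values with Rubin's rules.

    Total variance = mean within-PV variance + (1 + 1/M) * between-PV variance.
    """
    m = len(estimates)
    pooled = mean(estimates)
    within = mean(sampling_variances)
    between = variance(estimates)  # sample variance across PVs (divisor M - 1)
    return pooled, within + (1 + 1 / m) * between


def monte_carlo_ci(a, se_a, b, se_b, draws=20000, alpha=0.05, seed=1):
    """Percentile CI for a*b from independent normal draws of a and b.

    Assumes zero covariance between the two path estimates.
    """
    rng = random.Random(seed)
    products = sorted(rng.gauss(a, se_a) * rng.gauss(b, se_b) for _ in range(draws))
    lower = products[int(draws * alpha / 2)]
    upper = products[int(draws * (1 - alpha / 2)) - 1]
    return lower, upper


# Made-up path estimates from three plausible values, for illustration only.
a_hat, a_var = pool_rubin([0.50, 0.52, 0.48], [0.0025, 0.0025, 0.0025])
b_hat, b_var = pool_rubin([0.40, 0.41, 0.39], [0.0025, 0.0025, 0.0025])
ci_low, ci_high = monte_carlo_ci(a_hat, a_var ** 0.5, b_hat, b_var ** 0.5)
```

Unlike the delta method, the percentile interval does not force symmetry around the point estimate, which matters because the product of two normal variables is generally skewed.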
- Research Article
- 10.1186/s40536-026-00283-0
- Feb 17, 2026
- Large-scale Assessments in Education
- Piia Lehtola + 3 more
Abstract In this article, we examined the construct-related validity of the motivational measures included in TIMSS 2019 for Grade 8 in three Nordic countries: Finland, Sweden, and Norway. We utilized confirmatory factor analysis (CFA) to create three factors from the items of TIMSS Students’ Attitudes Toward Mathematics scales: (1) Students Confident in Mathematics, (2) Students Like Learning Mathematics, and (3) Students Value Mathematics. The model with three factors fitted well in the combined data from the three Nordic countries, but the association with mathematics achievement revealed a suppression effect and/or multicollinearity problems with the factors, which limits the interpretation of the results. Thus, we also tested an alternative model comprising three first-order factors and one second-order factor. We named this second-order factor “Motivation.” The association between the second-order factor and mathematics achievement was found to be strong and positive (β = 0.64, p < 0.001). In addition, we examined the measurement invariance of the second-order factor in the Nordic context with multigroup CFA. There was metric, but not scalar, invariance among Finland, Norway, and Sweden. This implies that the associations between the factors and other variables can be compared among the three Nordic countries, but the mean levels of motivation among these countries cannot be compared without caution. These findings emphasize the importance of acknowledging the validity and comparability of the measures utilized when conducting secondary analyses of data from international large-scale assessments to obtain interpretable results and meaningful conclusions.