Testing beyond accountability: a policy-instrument approach to the evolution and changing uses of large-scale assessments
ABSTRACT National large-scale assessments (NLSAs) have become key monitoring instruments in education systems worldwide, yet their policy roles are often examined primarily through the lens of test-based accountability. This paper makes the case for analysing NLSAs as policy instruments in their own right – considering their accountability implications, but not reducing them to such purposes – to better understand their diverse functions and effects. Drawing on policy instruments and governance literatures, we propose a conceptual framework that distinguishes three interrelated properties – intensity, consequentiality, and centrality – that shape how NLSAs operate within evaluation, monitoring, and assessment systems. The framework is applied to the cases of Argentina, Chile, and Mexico to illustrate how these properties evolve along divergent trajectories and interact with broader policy mixes. In doing so, the paper advances a comparative approach to NLSAs that recognises their embeddedness in complex policy environments and deepens our understanding of their changing roles and functions within educational governance.
- Research Article
173
- 10.1080/01596306.2019.1569882
- Jan 21, 2019
- Discourse: Studies in the Cultural Politics of Education
ABSTRACT In the last decades, most countries have adopted data-intensive policy instruments aimed at modernizing the governance of education systems, and strengthening their competitiveness. Instruments such as national large-scale assessments and test-based accountabilities have disseminated widely, to the point that they are being enacted in countries with very different administrative traditions and levels of economic development. Nonetheless, comparative research on the trajectories that governance instruments follow in different institutional and socio-economic contexts is still scarce. On the basis of a systematic literature review (n = 158), this paper enquires into the scope and modalities of educational governance change that national large-scale assessments and test-based accountability instruments have triggered in a broad range of institutional settings. The paper shows that, internationally, educational governance reforms advance through path-dependent and contingent processes of policy instrumentation that are markedly conditioned by prevailing politico-administrative regimes. The paper also reflects on the additive and evolving nature of educational governance reforms.
- Research Article
6
- 10.1007/s11092-015-9232-7
- Dec 1, 2015
- Educational Assessment, Evaluation and Accountability
This paper presents a single-country case study of the use of large scale assessment (LSA) data to generate actionable knowledge at school and system levels. Actionable knowledge is data-informed insight into school and system processes that can be used to direct corrective action. The analysis is framed from the perspective of the country’s evolving national policy on data use for educational improvement between 1990 and 2013. Trinidad and Tobago first participated in international large scale assessments (ILSAs) in 1991 but also developed a centralized system of national large scale assessments (NLSAs) in 2004. Analyses of both datasets consistently pointed to low quality and high inequality as the main actionable issues in the education system. NLSA data also hinted at notable variation in performance across schools and education districts. Analyses for and of policy point to the need for multiple school performance measures to better inform site-based, formative action. Over the period, actionable knowledge appears to have had greater impact at school level, with evidence being used by some low-performing schools to improve. However, at the system level, the frequent non-use and misuse of actionable knowledge suggest the need to promote and strengthen structures and processes related to evidence-informed policy-making.
- Research Article
5
- 10.14507/epaa.31.7323
- Jun 13, 2023
- Education Policy Analysis Archives
Nationwide large-scale assessments (NLSA)—an example of cross-border policy mobility—manifest a proliferating means of governing formal schooling. In the Russian context, NLSA takes the form of a compulsory graduation examination called the Unified State Examinations (USE). In this article, we explore how a mobile policy instrument of the NLSA participates in the relational processes of time- and space-making in a particular federated context of Russia, and how this process intertwines with and is shaped by the presence of multiple time zones. We argue that NLSA is an instrument of time that attempts to achieve centralization of the complex federated structure of the Russian Federation. Yet, the work of the NLSA is not a smooth process in a country characterized by territorial vastness, a complex federated structure, and the existence of multiple time zones. Guided by the theory of logistical power and sociological perspectives on time, as well as empirical insights, we show how the time zones need to be tamed in order for the NLSA to exercise its centralizing role. Discursively, the time zone is introduced and publicly discussed to symbolically characterize Russia and justify political actions or their outcomes. Bureaucratically, the desire for simultaneity and synchronicity takes the form of a meticulous ordering of a sequence of actions through prescriptive documentation that regulates the NLSA. Technologically, synchronicity, simultaneity, and instantaneousness rely on and engender an expanding national infrastructure that mediates social relations and the processes of conducting the NLSA, cutting across the time zones and federal units. 
Based on this analysis, we propose that scholarship on policy mobility and education policy sociology at large could benefit from examining the relationship between time and education policy and governance in four intertwined ways: the time of policy, context as time, policy instruments as instruments of time, and time in policy instruments.
- Research Article
186
- 10.1080/00131911.2019.1522045
- Oct 5, 2018
- Educational Review
ABSTRACT The Global Education Reform Movement (GERM) is expanding internationally and reaching countries that seemed to be immune to this education reform approach until quite recently. Accordingly, more and more educational systems in the world are articulated around three main policy principles: accountability, standards and decentralisation. National large-scale assessments (NLSAs) are a core component of the GERM; these assessments are increasingly used for accountability purposes as well as to ensure that schools achieve and promote centrally defined and evaluable learning standards. In this paper, we explore these trends on the basis of a new and original database on NLSAs, as well as on data coming from the Programme for International Student Assessment (PISA) questionnaires. In the paper we also discuss how different theories on policy dissemination/globalisation explain the international spread of NLSAs and test-based accountability worldwide, and reflect on the potential of a political sociology approach to analyse this globalising phenomenon.
- Book Chapter
63
- 10.1007/978-94-007-4629-9_7
- Jul 28, 2012
Policy makers are mainly interested in large-scale assessments as indicators that monitor the functioning, productivity, and equity of educational systems, while researchers tend to perceive large-scale assessments as a kind of multigroup (i.e., multicountry) educational effectiveness study. Aside from describing strengths and challenges with regard to student performance and the conditions of teaching and schooling in participating countries, researchers also want to understand why students achieve certain levels of performance. But because large-scale assessments provide only observational data, it is exceedingly difficult to draw causal inferences, such as concluding that a particular educational policy or practice has a direct or indirect impact on student performance. A productive interplay between large-scale assessments and effectiveness research may be established in several ways by implementing enhancements to the assessment design. Two examples of such enhancements will be presented and discussed: (1) a national large-scale assessment on language competencies in Germany reassessed students one year after the first large-scale assessment, allowing researchers to study the impact of school-level factors on classroom instruction and student growth; and (2) a reassessment of Germany’s schools performed nine years after initial participation in PISA.
- Research Article
- 10.3389/feduc.2025.1565557
- May 15, 2025
- Frontiers in Education
In large-scale assessments, the collection of high-quality representative data depends in part on (i) securing high participation rates and (ii) ensuring that participants demonstrate a sufficient level of engagement. This article explores the challenges of promoting participation and engagement with reference to Ireland’s experiences throughout multiple cycles of international and national large-scale assessments. Some factors likely to have influenced participation and/or engagement in the Irish context include: (i) the rising profile of large-scale assessments due to their prominence within a national strategy to improve literacy and numeracy; (ii) the transition of studies from paper-based to digital administration; (iii) the publication of new data protection legislation (both European and national); and (iv) pressures resulting from the COVID-19 pandemic. Initiatives implemented by Ireland’s national study centre with the aim of promoting participation in and engagement with large-scale assessments can be broadly classified as relating to consultation, promotion, and support. Lessons from Ireland’s experiences are discussed in relation to future avenues of investigation that may increase our understanding of facilitators and barriers to engagement. Ireland’s experiences offer valuable lessons that could inform practices in other countries.
- Research Article
4
- 10.1080/00131911.2023.2256996
- Sep 19, 2023
- Educational Review
Large-scale assessments have become a basic national policy for educational improvement encouraging standards, decentralisation and school accountability. The current study focuses on the pedagogical dimension of large-scale assessments, examining its uses as a policy instrument for effecting pedagogical change. The paper presents and discusses the case of the Israeli NLSA (national large-scale assessment) regime – the Meitzav (Hebrew acronym for: Growth and Effectiveness Measures for Schools) tests. Although it aimed to design a low-stakes testing regime, its implementation was a top-down procedure which, in practice, restricts principals’ and teachers’ autonomy. Using qualitative and quantitative methods, the findings showed that the Meitzav test results are barely used as a means leading to pedagogical change to improve learning. Teachers considered these tests an unreliable assessment tool that in the main, does not reflect the school curriculum or student learning, while producing a high level of pressure on the teaching routine. In consequence of the Meitzav test results, the most common pedagogical change in practice chosen by teachers was: “teaching to the test”. Other pedagogical changes following the Meitzav were implemented to a minor extent. Policy implications are discussed.
- Research Article
9
- 10.1086/710767
- Nov 1, 2020
- Comparative Education Review
National large-scale assessments (NLSA) have spread rapidly since the 1990s, but contrary to arguments of diffusion theories, not every country with large-scale assessments develops test-based accou...
- Single Book
69
- 10.4324/9781410605115
- Dec 6, 2012
Contents: Preface. G. Tindal, Large-Scale Assessments for All Students: Issues and Options. Part I: Validity Issues. R.L. Linn, Validation of the Uses and Interpretations of Results of State Assessment and Accountability Systems. R. Gersten, S. Baker, The Relevance of Messick's Four Faces for Understanding the Validity of High-Stakes Assessments. J.M. Ryan, S. DeMark, Variation in Achievement Scores Related to Gender, Item Format, and Content Area Tested. T.M. Haladyna, Supporting Documentation: Assuring More Valid Test Score Interpretations and Uses. S.E. Phillips, Legal Issues Affecting Special Populations in Large-Scale Testing Programs. W.A. Mehrens, Consequences of Assessment: What Is the Evidence. Part II: Technical Issues. R. Tate, Test Dimensionality. M.C. Rodriguez, Choosing an Item Format. C.S. Taylor, Incorporating Classroom-Based Assessments Into Large-Scale Assessment Programs. G. Engelhard, Jr., Monitoring Raters in Performance Assessments. J.M. Ryan, Issues, Strategies, and Procedures for Applying Standards When Multiple Measures Are Employed. S.W. Choi, M. McCall, Linking Bilingual Mathematics Assessments: A Monolingual IRT Approach. Part III: Implementation Issues. P.J. Almond, C. Lehr, M.L. Thurlow, R. Quenemoen, Participation in Large-Scale State Assessment and Accountability Systems. R.P. Duran, C. Brown, M. McCall, Assessment of English-Language Learners in the Oregon Statewide Assessment System: National and State Perspectives. K. Hollenbeck, Determining When Test Alterations Are Valid Accommodations or Modifications for Large-Scale Assessment. R. Helwig, A Methodology for Creating an Alternative Assessment System Using Modified Measures. M.L. Thurlow, J. Bielinski, J. Minnema, J. Scott, Out-of-Level Testing Revisited: New Concerns in the Era of Standards-Based Reform. J. Ysseldyke, J.R. Nelson, Reporting Results of Student Performance on Large-Scale Assessments. Part IV: Epilogue. T.M. Haladyna, Research to Improve Large-Scale Testing.
- Research Article
23
- 10.1016/j.stueduc.2020.100847
- Feb 18, 2020
- Studies in Educational Evaluation
Measuring mathematics competence in international and national large scale assessments: Linking PISA and the national educational panel study in Germany
- Book Chapter
5
- 10.1075/swll.17.09sto
- Jul 16, 2018
As screen reading becomes the new standard, valid measures for capturing the defining features of reading ability as it moves from paper to screens must be developed. With the ongoing digitisation of many international and national large scale assessments, questions about the role of testing mode become especially pertinent. This chapter explores the question of how testing mode impacts the design of digital reading tests as well as children’s performance on them. We discuss how findings from empirical research on mode effects can inform the design of reading assessment and consider the pedagogical implications of a move to digital assessment.
- Research Article
131
- 10.2304/rcie.2013.8.3.387
- Jan 1, 2013
- Research in Comparative and International Education
The Annual Status of Education Report (ASER) is a national citizen-led rapid assessment of children's ability to read simple text and do basic arithmetic. ASER is designed and facilitated by the Indian nongovernment organisation Pratham, and has been conducted every year since 2005 by partner organisations in every rural district of India, reaching more than 600,000 children annually. The assessment differs from most other international and national large-scale assessments in several key respects, such as the use of household rather than school-based sampling and the focus on simple tools and indicators that are easy to administer and understand. All ASER metrics, measures and processes are intended to engage ordinary citizens in thinking about and acting to improve basic learning outcomes in India. By conducting a massive national survey each year, ASER has demonstrated that it is possible to use simple, reliable and scientific methods of sampling and assessment on a large scale for high impact at a very low cost. Key to this aspect of ASER has been its ability to mobilise over 25,000 volunteers each year. ASER has been responsible to a large extent for putting the issue of learning on the agenda in India. More recently, the model has been adapted for use in several African and Asian countries. Taken together, these initiatives reached more than a million children in 2012.
- Research Article
11
- 10.12738/estp.2016.6.0227
- Jan 1, 2016
- Educational Sciences: Theory & Practice
While education provides individuals an opportunity for cognitive, social and emotional development, it also maintains and creates social stratification (Montt, 2011). Socioeconomic factors, along with school- and community-related factors, cause inequalities in education. For instance, gender (being female) and wealth (being poor) are major obstacles to school enrollment and achievement (Filmer, 2005; Nguyen, 2006). In some countries, females still do not have the same opportunities as males. Females do not have equal access to education and have lower academic achievement than their male peers (Nguyen, 2006). The literature has pointed to socioeconomic-related educational inequalities in underdeveloped (Grimm, 2011) and developing countries (Martins & Veiga, 2010). Students who grow up in families of low socioeconomic status (SES), specifically, those whose parents have low levels of education, low incomes, or low-prestige occupations, generally show slower cognitive development than students whose parents have high SES (Gamboa & Waltenberg, 2012; Hertzman, 1994; Hertzman & Weins, 1996). This can be explained with Bourdieu's cultural capital reproduction theory (1986), which holds that socioeconomic inequalities in education persist because highly educated parents give their children a better understanding of the dominant culture and an ability to act within it (p. 1017, as cited in Martins & Veiga, 2010). Identifying the sources of inequalities in educational attainment and achievement and reducing their effects are major concerns of educational researchers and policymakers worldwide. School variables such as school culture, resources (e.g., books, teacher-student ratio), and the composition of a school can create learning situations that exacerbate imbalances in student achievement across schools (Baker, Goesling, & Letendre, 2002; Thrupp, Lauder, & Rabinson, 2002).
The Equality of Educational Opportunity Study (Coleman et al., 1966), also known as the Coleman Study, surprised many researchers and policymakers, not only in the United States but throughout the world; it showed that, rather than the school itself, it was the socioeconomic and ethnic background of families that constituted the source of variation in achievement. Since then, much research has been conducted to verify whether these findings hold true in other countries as well. Some studies conclude that the impact of schools on student achievement is small in wealthy countries but relatively strong in poorer countries (Buchmann, 2002; Fuller & Clarke, 1994; Heyneman & Loxley, 1983). Other findings, however, are consistent with those of the Coleman Study (Baker et al., 2002). Beyond the acknowledged effect of family factors on school participation, the influence of school- and community-related factors on student achievement remains essentially unexplored. Binder (1999) and numerous other researchers have drawn attention to this gap in the literature. Few studies have been conducted in Turkey to examine school and family effects on school outcomes such as achievement on national large-scale assessments and the Program for International Student Assessment (PISA) (Alacaci & Erbac, 2010; Dincer & Uysal, 2010; Guncer & Kose, 1993; Tomul & Savacci, 2010). Existing Turkish studies have their own methodological limitations. For instance, they used data collected by the Program for International Student Assessment (PISA), which raises concerns about the validity of test results, sample coverage and representativeness (Ferreira & Gignoux, 2011). The PISA sampling coverage rate is below 50% and no information about non-participant students is available (Carvalho, Gamboa, & Waltenberg, 2012). The purpose of this study is to explore student and school factors that contribute to inequalities in seventh- and eighth-grade student achievement in the Turkish context by using national achievement test scores.
…
- Research Article
1
- 10.1108/qae-04-2022-0098
- Dec 29, 2022
- Quality Assurance in Education
Purpose: Large-scale assessment has been used in many education systems as an instrument to evaluate educational performance nationally. This practice is based on the concept of epistemic governance which encourages school accountability. This study aims to explore teachers' perspectives regarding the value and uses of national large-scale assessment (NLSA), highlighting its relevance across contexts. Design/methodology/approach: Using qualitative data, this paper presents the case of the Israeli NLSA tests – the Meitzav, while examining the perceptions and actions in which teachers engage to follow-up on the test results, and the extent to which they implement pedagogical change in light of the test results. Findings: The findings showed that teachers tend to use the NLSA test results as a pedagogical tool to improve learning processes to a limited extent. They concede that most activity involving the tests at the school and class levels is dedicated to preparation and not to pedagogical change. Some explanations are suggested. Originality/value: This paper discusses the theoretical and practical implications of the NLSA testing regime for the school, curriculum and pedagogy.
- Research Article
42
- 10.1111/1467-8489.00052
- Sep 1, 1998
- Australian Journal of Agricultural and Resource Economics
In recent years reducing the amount of waste generated by households has become an important policy issue in industrialised economies. It is no longer acceptable to discard waste without concern for environmental and natural resource issues. In an effort to reduce household waste various policy instruments such as kerbside charges, deposit‐refund schemes, integrated sales tax exemptions and virgin material taxes, have been proposed and/or implemented. This article reviews the economics literature that has addressed household waste management. It is argued that a comprehensive modelling framework is necessary if the complex policy environment is to be accurately described.