Abstract

Increasing class size is one of the key levers that policy makers can use to control spending on education. The average class size at the lower secondary level is 23 students in OECD countries, but there are significant differences, ranging from over 32 in Japan and Korea to 19 or below in Estonia, Iceland, Luxembourg, Slovenia and the United Kingdom (OECD, 2012). Conversely, reducing class size to increase student achievement is an approach that has been tried, debated, and analysed for several decades. Between 2000 and 2009, many countries invested additional resources to decrease class size (OECD, 2012). Despite the important policy and practice implications of the topic, the research literature on the educational effects of class-size differences has not been clear. A large part of the research on the effects of class size has found that smaller classes improve student achievement (for example Finn & Achilles, 1999; Konstantopoulos, 2009; Molnar et al., 1999; Schanzenbach, 2007). The consensus among many education researchers that smaller classes are effective in improving student achievement has led to class size reduction policies in a number of U.S. states, the United Kingdom, and the Netherlands. This policy is disputed by those who argue that the effects of class size reduction are only modest and that other strategies for improving educational standards are more cost-effective (Hattie, 2005; Hedges, Laine, & Greenwald, 1994; Rivkin, Hanushek, & Kain, 2005). There is no consensus in the literature as to whether class size reduction can pass a cost-benefit test (Dustmann, Rajah & van Soest, 2003; Dynarski, Hyman & Schanzenbach, 2011; Finn, Gerber & Boyd-Zaharias, 2005; Muennig & Woolf, 2007). Because reducing class size is costly, it is important to consider which types of students might benefit most from smaller classes, as well as the timing, intensity, and duration of class size reduction. 
Low socioeconomic status is strongly associated with low school performance. Results from the Programme for International Student Assessment (PISA) show that most of the students who perform poorly in PISA come from socioeconomically disadvantaged backgrounds (OECD, 2010). Across OECD countries, a student from a more socioeconomically advantaged background outperforms a student from an average background by about one year's worth of education in reading, and the gap relative to students from low socioeconomic backgrounds is even larger. PISA results also show that some students with low socioeconomic status excel, demonstrating that overcoming socioeconomic barriers to academic achievement is indeed possible (OECD, 2010). Smaller classes have been shown to be more beneficial for students from socioeconomically disadvantaged backgrounds (Biddle & Berliner, 2002). Evidence from the Tennessee STAR randomised controlled trial showed that minority students, students living in poverty, and educationally disadvantaged students benefitted the most from reduced class size (Finn, 2002; Word et al., 1994). Further, evidence from a controlled, though not randomised, trial, Wisconsin's Student Achievement Guarantee in Education (SAGE) program, showed that students from minority and low-income families benefitted the most from reduced class size (Molnar et al., 1999). Thus, rather than implementing costly universal class size reduction policies, it may be more economically efficient to target class size reductions at schools with high concentrations of socioeconomically disadvantaged students. As for the timing of class size reduction, the question is: when does class size reduction have the largest effect? 
Ehrenberg, Brewer, Gamoran and Willms (2001) hypothesized that students educated in small classes during the early grades may be more likely to develop working habits and learning strategies that enable them to take better advantage of learning opportunities in later grades. According to Bascia and Fredua-Kwarteng (2008), researchers agree that class size reduction is most effective in the primary grades. Biddle and Berliner (2002) likewise conclude that empirical research shows class size reduction to be most effective in the early grades, and evidence from both STAR and SAGE supports this conclusion (Finn, Gerber, Achilles, & Boyd-Zaharias, 2001; Smith, Molnar, & Zahorik, 2003). Smaller classes may, of course, also be advantageous at later strategic points of transition, for example in the first year of secondary education, but research evidence on this possibility is still needed. As for intensity, the question is: how small does a class have to be in order to maximise the advantage? Large gains have been reported when class size is below 20 students (Biddle & Berliner, 2002; Finn, 2002), but gains have also been reported for classes of 20 or more students (Angrist & Lavy, 2000; Borland, Howsen & Trawick, 2005; Fredriksson, Öckert & Oosterbeek, 2013; Schanzenbach, 2007). It has been argued that the impact of class size reductions of different sizes and from different baseline class sizes is reasonably stable and more or less linear when measured per student (Angrist & Pischke, 2009, p. 267; Schanzenbach, 2007). Other researchers argue that the effect of class size is not only nonlinear but also non-monotonic, implying that an optimal class size exists (Borland, Howsen & Trawick, 2005). Whether the size of the reduction and the initial class size matter for the magnitude of the gain from small classes therefore remains an open question. 
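The contrast between a linear and a non-monotonic class size effect can be made concrete with two illustrative regression specifications, where A is student achievement and S is class size. These are a sketch for exposition only; they are not necessarily the specifications estimated by Angrist and Pischke (2009), Schanzenbach (2007) or Borland, Howsen and Trawick (2005).

```latex
% Linear specification: the gain per student removed is the same at any baseline S
A_i = \alpha + \beta S_i + \varepsilon_i, \qquad
\Delta A = -\beta \, \Delta S \quad \text{(gain from reducing class size by } \Delta S \text{ students)}

% Quadratic specification: the per-student effect depends on the baseline class size
A_i = \alpha + \beta_1 S_i + \beta_2 S_i^{2} + \varepsilon_i, \qquad
\frac{\partial A}{\partial S} = \beta_1 + 2\beta_2 S

% With \beta_1 > 0 and \beta_2 < 0, achievement has an interior maximum (optimal class size)
\frac{\partial A}{\partial S} = 0 \;\Longrightarrow\; S^{*} = -\frac{\beta_1}{2\beta_2}
```

Under the linear specification, reducing a class from 25 to 20 students yields the same gain as reducing it from 20 to 15. Under the quadratic specification it does not, and reductions below S* would lower achievement.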
Finally, researchers agree that the length of the intervention (the number of years spent in small classes) is linked to the sustainability of benefits (Biddle & Berliner, 2002; Finn, 2002; Grissmer, 1999; Nye, Hedges & Konstantopoulos, 1999), whereas the evidence on whether more years in small classes leads to larger gains in academic achievement is mixed (Biddle & Berliner, 2002; Egelson, Harman, Hood & Achilles, 2002; Finn, 2002; Krueger, 1999). How long a student should remain in a small class before returning to a class of regular size is an unanswered question. The intervention in this systematic review is a reduction in class size. What constitutes a reduced class size? This seemingly simple issue has confounded the interpretation of research outcomes and is one of the reasons for disagreement about whether class size reduction works (Graue, Hatch, Rao & Oen, 2007). Two terms are used to describe the intervention, class size and student-teacher ratio, and it is important to distinguish between them. The first, class size, focuses on reducing group size and is therefore operationalized as the number of students a teacher instructs in a classroom at a given point in time. Under this definition, fewer students are assigned to a class in the belief that teachers will then develop an in-depth understanding of students' learning needs through more focused interactions, better assessment, and fewer disciplinary problems. These mechanisms are based on the dynamics of a smaller group (Ehrenberg et al., 2001). The second term, student-teacher ratio, is often used as a proxy for class size and is defined as a school's total student enrollment divided by its number of full-time teachers. From this perspective, lowering the ratio of students to teachers provides enhanced opportunities for learning. 
The use of student-teacher ratios as a proxy for class size is based on a view of teachers as units of expertise and is less focused on the student-teacher relationship. Increasing the relative units of expertise available to students increases learning, but does not rely on particular teacher-student interactions (Graue et al., 2007). Although class size and student-teacher ratio are related, they involve different assumptions about how a reduction changes the opportunities for students and teachers. In addition, the discrepancy between the two can vary depending on teachers' roles and the amount of time teachers spend in the classroom during the school day. In this review, the intervention is class size reduction. Studies that only consider average class size measured as the student-teacher ratio at school level (or higher levels) will not be eligible. Nor will studies be eligible in which the intervention is the assignment of an extra teacher (or teaching assistants or other adults) to a class. Assigning additional teachers (or teaching assistants or other adults) to a classroom is not the same as reducing the size of the class, and this review focuses exclusively on the effects of class size in the sense of the number of students in a classroom. Smaller classes allow teachers to adapt their instruction more easily to the needs and development of individual students. The concept of adaptive education refers to instruction that is adapted to meet the individual needs and abilities of students (Houtveen, Booij, de Jong & van de Grift, 1999). With adaptive education, some students receive more time, instruction, or help from the teacher than other students. 
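The discrepancy between student-teacher ratio and class size noted above can be illustrated with simple arithmetic. The sketch below is a simplification that assumes, for illustration only, that every student is in a class all day and that each teacher teaches one class alone for a fixed share of the day (the hypothetical `teaching_share` field); real timetables are more complicated.

```python
from dataclasses import dataclass


@dataclass
class School:
    enrolment: int             # total number of students in the school
    full_time_teachers: float  # full-time-equivalent teachers
    teaching_share: float      # hypothetical: share of a teacher's day spent teaching a class


def student_teacher_ratio(school: School) -> float:
    """School enrolment divided by the number of full-time teachers."""
    return school.enrolment / school.full_time_teachers


def implied_average_class_size(school: School) -> float:
    """Average class size under the simplifying assumptions that every student is
    in a class all day and each teacher teaches one class alone when teaching."""
    if not 0 < school.teaching_share <= 1:
        raise ValueError("teaching_share must be in (0, 1]")
    classes_in_session = school.full_time_teachers * school.teaching_share
    return school.enrolment / classes_in_session
```

For a hypothetical school with 400 students and 20 full-time teachers who each teach for 75% of the day, the student-teacher ratio is 20, but the implied average class size is about 26.7. Conversely, adding a co-teacher to an existing class lowers the ratio without changing the number of students in the room.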
Research has shown that in smaller classes, teachers have more time and opportunity to give individual students the attention they need (Betts & Shkolnik, 1999; Blatchford & Mortimore, 1994; Bourke, 1986; Molnar et al., 1999; Molnar et al., 2000; Smith & Glass, 1980). Additionally, less pressure may be placed on the physical space and resources within the classroom. Both of these factors may be connected to lower levels of the pupil misbehaviour and disciplinary problems observed in larger classes (Wilson, 2002). In smaller classes, students with low levels of ability can receive more attention from the teacher, with the result that not all students necessarily profit equally. More generally, in smaller classes teachers are able to devote more of their time to educational content (the tasks students must complete) and less to classroom management (for example, maintaining order). More time spent on task contributes to higher academic achievement. It has often been pointed out, however, that teachers do not necessarily change the way they teach when faced with smaller classes and therefore do not take advantage of all of the benefits a smaller class offers. Research suggests that such situations do indeed exist in practice (e.g. Blatchford & Mortimore, 1994; Shapson, Wright, Eason & Fitzgerald, 1980). Anderson (2000) addressed the question of why reductions in class size should be expected to enhance student achievement, and part of his theory was tested by Annevelink, Bosker and Doolaard (2004). To explain the relationship between class size and achievement, Anderson developed a causal model that starts with reduced class size and ends with student achievement. Anderson noted that small classes would not, in and of themselves, solve all educational problems. The number of students in a classroom can have only an indirect effect on student achievement. 
As Zahorik (1999) states: “Class size, of course, cannot influence academic achievement directly. It must first influence what teachers and students do in the classroom before it can possibly affect student learning” (p. 50). In other words, what teachers do matters. Anderson's causal model of the effect of reduced class size on student achievement is depicted in Figure 1. The model predicts that a reduced class size will have direct positive effects on three variables: 1) fewer disciplinary problems, 2) greater knowledge of students, and 3) greater teacher satisfaction and enthusiasm. Each of these variables, in turn, begins a separate path. Fewer disciplinary problems are expected to lead to more instructional time, which, in combination with teacher knowledge of the external test, produces greater opportunity to learn. In combination with more appropriate, personalised instruction and greater teacher effort, more instructional time potentially produces greater student engagement in learning as well as more in-depth treatment of content. Greater knowledge of students is expected to provide more appropriate, personalised instruction, which, in combination with more instructional time and greater teacher effort, potentially produces greater student engagement in learning and more in-depth treatment of content. Greater teacher satisfaction and enthusiasm are expected to result in greater teacher effort, which, in combination with more instructional time and more appropriate, personalised instruction, produces greater student engagement in learning and more in-depth treatment of content. Finally, greater student achievement is the expected result of a combination of three variables: greater opportunity to learn, greater student engagement in learning, and more in-depth treatment of content. The path from greater knowledge of students through appropriate, personalised instruction and student engagement in learning to student achievement was tested by Annevelink et al. (2004) on Grade 1 students in 46 Dutch schools in the 1999-2000 school year. Personalised instruction is operationalised as the number of specific types of interactions. Teachers seeking to provide more personalised instruction are expected to have fewer organisational and personal interactions and more task-directed and praising interactions. These changes are expected to lead students to spend more time on task. The level of student engagement is operationalised as the amount of time a student spends on task, and students who spend more time on task are expected to achieve better learning results. Smaller classes were associated with more interactions of all kinds. As expected, more task-directed and praising interactions resulted in more time on task, which in turn was related to higher student achievement. Note that the greater number of organisational and personal interactions in smaller classes was contrary to expectations, whereas the greater number of task-directed and praising interactions was consistent with them (Annevelink et al., 2004).
Class size is one of the most researched educational interventions in social science, yet there is no clear consensus on the effectiveness of small classes for improving student achievement. While one strand of class size research points to small and insignificant effects, another points to positive and significant effects. The early meta-analysis by Glass and Smith (1979) analysed the outcomes of 77 studies comprising 725 comparisons of student achievement between smaller and larger classes. They concluded that class size reduction had a positive effect on student achievement. Hedges and Stock (1983) reanalysed Glass and Smith's data using different statistical methods but found very little difference in the average effect sizes produced by the two methods. 
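Meta-analyses such as Glass and Smith's summarise each small-versus-large class comparison as an effect size and then average across comparisons. The sketch below shows one common way to do this, a standardised mean difference with the Hedges small-sample correction followed by an inverse-variance (fixed-effect) average. It is illustrative only; it is not necessarily the metric or weighting used by Glass and Smith (1979) or by Hedges and Stock (1983).

```python
import math


def hedges_g(mean_small, mean_large, sd_small, sd_large, n_small, n_large):
    """Standardised mean difference (small minus large classes) with the Hedges
    small-sample correction. Returns (g, sampling variance of g)."""
    df = n_small + n_large - 2
    pooled_sd = math.sqrt(((n_small - 1) * sd_small**2 + (n_large - 1) * sd_large**2) / df)
    d = (mean_small - mean_large) / pooled_sd
    j = 1 - 3 / (4 * df - 1)  # correction factor J
    var_d = (n_small + n_large) / (n_small * n_large) + d**2 / (2 * (n_small + n_large))
    return j * d, j**2 * var_d


def inverse_variance_mean(estimates):
    """Fixed-effect average of (effect size, variance) pairs and its standard error."""
    weights = [1 / var for _, var in estimates]
    mean = sum(w * es for w, (es, _) in zip(weights, estimates)) / sum(weights)
    return mean, math.sqrt(1 / sum(weights))
```

With hypothetical scores of 52 versus 50 (SD 10, 50 students per group), `hedges_g` gives a d of 0.2 shrunk slightly by J to about 0.198.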
However, the updated literature reviews by Hanushek (1989; 1999; 2003) cast doubt on these meta-analytic findings. His reviews looked at 276 estimates based on pupil-teacher ratios as a proxy for class size, and most of these estimates pointed to insignificant effects. Based on a vote-counting method, Hanushek concluded that “there is no strong or consistent relationship between school resources and student performance” (Hanushek, 1987, p. 47). Krueger (2003), however, points out that Hanushek relies heavily on a few studies that each reported many estimates from small subsamples of the same dataset; such small-sample estimates are more likely to be statistically insignificant. The vote-counting method used in Hanushek's original literature review (Hanushek, 1989) is also criticised by Hedges et al. (1994), who reanalyse the data from Hanushek's reviews using more sophisticated synthesis methods. Hedges et al. (1994) used a combined significance test to test two null hypotheses: 1) no positive relation between the resource and the output, and 2) no negative relation between the resource and the output. The tests determine whether the data are consistent with the null hypothesis being true in all studies or whether it is false in at least some of them. Hedges et al. (1994) also reported the median standardized regression coefficient. They conclude that the analysis “shows systematic positive relations between resource inputs and school outcomes” (Hedges et al., 1994, p. 5). Hence, depending on which synthesis method is considered appropriate, the same evidence leads to quite different conclusions. Moreover, the divergent conclusions of these reviews rest on non-experimental evidence, combining estimates from primary studies with different specifications and assumptions. 
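Why vote counting and combined significance tests can disagree on the same evidence can be shown with a small sketch. Fisher's inverse chi-square method is used here as one standard combined significance test; we do not claim it is the exact procedure of Hedges et al. (1994). The sketch assumes independent one-sided p-values; the second null hypothesis (no negative relation) would be tested the same way using the opposite one-sided p-values, 1 - p.

```python
import math


def vote_count(p_values, alpha=0.05):
    """Number of studies that are individually significant at level alpha."""
    return sum(1 for p in p_values if p < alpha)


def fisher_combined_p(p_values):
    """Fisher's method: X = -2 * sum(ln p) follows a chi-square distribution with
    2k degrees of freedom when the null is true in every study. For even degrees
    of freedom the upper tail has the closed form
    P(X > x) = exp(-x/2) * sum_{i=0}^{k-1} (x/2)^i / i!."""
    k = len(p_values)
    half_x = -sum(math.log(p) for p in p_values)  # this is x / 2
    term, total = 1.0, 1.0                         # i = 0 term of the series
    for i in range(1, k):
        term *= half_x / i                         # (x/2)^i / i!
        total += term
    return math.exp(-half_x) * total
```

With ten hypothetical studies that each report a one-sided p-value of 0.10, `vote_count` finds no significant study, whereas `fisher_combined_p` returns roughly 0.0008. The independence assumption is exactly what Krueger's (2003) objection about many estimates drawn from the same dataset calls into question.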
According to Grissmer (1999), these differences in specifications and assumptions, as well as their appropriateness, account for the inconsistent results of the primary studies. The Tennessee STAR experiment provides rare evidence of the effect of class size from a randomised controlled trial (RCT). The STAR experiment was implemented in Tennessee in the 1980s and assigned kindergarten children to either regular-sized classes (22 to 25 students) or small classes (13 to 17 students). The study ran for four years, until the assigned children reached third grade, yet even with this kind of evidence researchers do not agree on the conclusion. According to Finn and Achilles (1990), Nye et al. (1999) and Krueger (1999), STAR results show that class size reduction increased student achievement. However, Hanushek (1999; 2003) questions these results because of attrition from the project, crossover between treatments, and selective test taking, which may have compromised the initial randomisation. While debate about what can be concluded from the same evidence is acceptable and meaningful within the research community, it is probably of less help in guiding decision-makers and practitioners. If research is to inform practice, there must be an attempt to reach some agreement about what the research does and does not tell us about the effectiveness of interventions, as well as what conclusions can reasonably be drawn from it. Researchers need a better understanding of questions such as: For whom does class size reduction have an effect? When does it have an effect? How small does a class have to be in order to be advantageous? The purpose of this review is to systematically uncover relevant studies in the literature that measure the effects of class size on academic achievement and to synthesize the effects in a transparent manner. 
The purpose of this review is to systematically uncover relevant studies in the literature that measure the effects of class size on academic achievement. We will synthesize the effects in a transparent manner and, where possible, investigate the extent to which the effects differ among groups of students, such as high/low performers, students from high/low income families, or members of minority/non-minority groups, and whether timing, intensity, and duration have an impact on the magnitude of the effect. The title for this systematic review was approved by the Campbell Collaboration on 9 October 2012.
Types of study designs
We will include study designs that use a well-defined control group. The main control or comparison condition is students in classes with more students than in the treatment classes. Non-randomised studies, where the reduction of class size has occurred in the course of usual decisions outside the researcher's control, must demonstrate pre-treatment group equivalence via matching, statistical controls, or evidence of equivalence on key risk variables and participant characteristics. These factors are outlined in the section ‘Assessment of risk of bias in included studies’ under the subheading ‘Confounding’, and the methodological appropriateness of the included studies will be assessed according to the risk of bias model outlined in that section. Different studies use different types of data. Some use test score data on individual students and actual class size data for each student. Others use individual student data but average class size data for students in that grade in each school. Still others use average scores for students in a grade level within a school and average class size for students in that school. We will only include studies that use measures of class size and measures of outcome data at the individual or class level. 
We will exclude studies that rely on measures of class size and outcomes aggregated to a level higher than the class (e.g., school or school district). Some studies do not have actual class size data and use the average student-teacher ratio within the school (or at higher levels, e.g. school districts). Studies only considering average class size measured as the student-teacher ratio within a school (or at higher levels) will not be eligible.
Types of participants
The review will include children in grades kindergarten to 12 (or the equivalent in European countries) in general education. Studies that meet the inclusion criteria will be accepted from all countries. We will exclude children in home-school, in pre-school programs, and in special education.
Types of interventions
The intervention in this review is a reduction in class size. The more precisely class size is measured, the more reliable a study's findings will be. Studies only considering the average class size measured as the student-teacher ratio within a school (or at higher levels) will not be eligible. Nor will studies be eligible in which the intervention is the assignment of an extra teacher (or teaching assistants or other adults) to a class. Assigning additional teachers (or teaching assistants or other adults) to a classroom is not the same as reducing the size of the class, and this review focuses exclusively on the effects of reducing class size. We acknowledge that class size may differ between subjects or vary during the day. The precision of the class size measure will therefore be recorded.
Types of outcome measures
The primary focus is on measures of academic achievement, in particular reading and mathematics. Outcome measures must be standardised measures of academic achievement. The primary outcome variables are standardised literacy tests (e.g. reading, spelling and writing) and standardised numeracy tests (e.g. mathematical problem-solving, arithmetic and numerical reasoning, grade level math). Some studies may report test results in other academic subjects and/or measures of global academic performance. The following effect sizes will also be coded as secondary outcomes when available: standardised tests in other academic subjects at primary school level (e.g. in science or a second language) and measures of global academic performance (e.g. Woodcock-Johnson III Tests of Achievement, Stanford Achievement Test (SAT), Grade Point Average). In addition, we will consider school completion rates as a secondary outcome. Studies will only be included if they consider one or more of the primary outcomes.
Duration of follow-up
Types of settings
The location of the intervention is classes in grades kindergarten to 12 (or the equivalent in European countries) in regular private, public or boarding schools. Home-schools will be excluded.
Electronic searches
Relevant studies will be identified through electronic searches of bibliographic databases, research networks, government policy databanks and internet search engines. The searches will include studies published from 1980 onward. The search dates are restricted because the results of very old studies may not be valid today, while a 1980 start date still captures the STAR experiment, which was implemented in Tennessee in the 1980s. No language limitation is applied in the searches.
Search terms
An example of the search strategy for ERIC searched on the EBSCO platform is listed below. The strategy will be modified for the different databases. Both subject headings and text words will be searched.
Grey literature
Additional searches will be made using Google and Google Scholar, and we will check the first 150 hits. OpenGrey (http://www.opengrey.eu/) will also be used to search for European grey literature. 
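The eligibility criteria set out under Types of study designs through Types of settings can be summarised as a screening checklist. The sketch below is a hypothetical coding form, not part of the protocol: the `StudyRecord` fields and the reason strings are invented for illustration, and actual screening decisions will follow the protocol text and the risk of bias assessment.

```python
from dataclasses import dataclass

# Hypothetical coding fields; the protocol does not prescribe a data format.
@dataclass
class StudyRecord:
    has_control_group: bool
    randomised: bool
    shows_pretreatment_equivalence: bool   # matching, statistical controls, etc.
    class_size_level: str                  # e.g. "individual", "class", "school", "district"
    outcome_level: str
    extra_adult_only: bool                 # intervention adds adults rather than removing students
    k12_general_education: bool            # False for home-school, pre-school, special education
    standardised_literacy_or_numeracy: bool

ACCEPTED_LEVELS = {"individual", "class"}


def exclusion_reasons(study: StudyRecord) -> list[str]:
    """Return the criteria a study fails; an empty list means it passes this screen."""
    reasons = []
    if not study.has_control_group:
        reasons.append("no well-defined control group")
    if not study.randomised and not study.shows_pretreatment_equivalence:
        reasons.append("non-randomised without evidence of pre-treatment equivalence")
    if study.class_size_level not in ACCEPTED_LEVELS:
        reasons.append("class size aggregated above class level")
    if study.outcome_level not in ACCEPTED_LEVELS:
        reasons.append("outcomes aggregated above class level")
    if study.extra_adult_only:
        reasons.append("intervention adds adults rather than reducing class size")
    if not study.k12_general_education:
        reasons.append("participants outside K-12 general education")
    if not study.standardised_literacy_or_numeracy:
        reasons.append("no standardised literacy or numeracy outcome")
    return reasons
```

A study that measures class size only as a school-level student-teacher ratio would be coded with `class_size_level="school"` and excluded for that single reason, even if it otherwise meets every criterion.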
Copies of relevant documents from Internet-based sources will be made, and we will record the exact URL and date of access for each.
Hand searching
The two journals most represented in the database search will be hand searched.
Snowballing
Reference lists of included studies and relevant reviews will be searched for potential new literature.
Personal contacts
Personal contacts with national and international researchers will be considered in order to identify unpublished reports and ongoing studies.
We expect that a certain number of studies will have been conducted without randomisation of participants, since there is no firm tradition of RCTs in educational research. This stems, among other things, from some scepticism towards randomisation due to ethical concerns about the random allocation of services. The Tennessee STAR experiment is an exception and provides rare evidence of the effect of class size from a randomised controlled trial. The STAR experiment was implemented in Tennessee in the 1980s. A cohort of students and teachers in kindergarten through third grade was assigned at random to three types of class within the same school: a small class (13 to 17 students), a regular class (22 to 25 students), and a regular class with a teacher aide. In fourth grade, the students returned to regular classes and the experiment ended. All districts in the state were invited to participate. The sample included 128 small classes, 101 regular classes and 99 regular classes with an aide. A team based in the state originally conducted an evaluation (Word et al., 1990), and several other researchers have since analysed the data as longitudinal outcome data for students in the original demonstration have been collected (for example Nye et al., 1999 and Hanushek, 1999). 
An example of a controlled, though not randomised, trial is Wisconsin's Student Achievement Guarantee in Education (SAGE) program. It was designed as a five-year pilot project that began in the 1996-97 school year. The program requires participating schools to implement four different interventions, one of which is to reduce the pupil-teacher ratio within a classroom to 15 students per teacher, beginning with kindergarten and first grade in the 1996-97 school year (second grade was added in 1997-98 and third grade in 1998-99). The SAGE evaluation is based on comparisons of achievement between the 30 schools that entered the program in the autumn of 1996 and a group of 14-17 preselected comparison schools with similar student and school characteristics. Achievement tests were administered in the SAGE and comparison schools at the beginning and end of first grade (Molnar et al., 1999). A widely used approach to estimating the causal effect of class size follows the methodological development in Angrist and Lavy (2000). This method estimates the class size effect from cut-off rules in grade enrolment with a regression discontinuity design. When enrolment in a particular grade exceeds the maximum class size, government regulations stipulate that schools create an additional class. If, for example, the class size maximum is 40, then an enrolment of 40 students will result in one class, while an enrolment of 41 students will result in two classes with an average size of 20.5. By comparing student outcomes in small and large classes in schools with beginning-of-year enrolment near 40 students, Angrist and Lavy identify the effects of class size reductions. We will take into account the unit of analysis of the studies to determine whether individuals were randomised in groups (i.e. cluster randomised trials), whether individuals may have undergone multiple interventions, whether there were multiple treatment groups, and whether several studies are based on the same data source.
Cluster randomised trials
Cluster randomised trials included in this review will be checked for consistency between the unit of allocation and the unit of analysis, as statistical analysis errors can occur when they differ. When appropriate analytic methods have been used, we will meta-analyse effect estimates and their standard errors (Higgins & Green, 2011). Where study investigators have not applied appropriate analysis methods that control for clustering effects, we will estimate the intra-cluster correlation (Donner, Piaggio, & Villar, 2001) and correct the standard errors.
Multiple intervention groups and multiple interventions per individual
Studies with multiple intervention groups with different individuals will be included in this review. To avoid problems with dependence between effect sizes, we will apply robust standard errors (Hedges, Tipton, & Johnson, 2010). However, simulation studies show that this method needs around 20-40 studies included in the data synthesis (Hedges et al., 2010). If this number cannot be reached, we will use a synthetic effect size (the average) in order to avoid dependence between effect sizes. This method provides an unbiased estimate of the mean effect size parameter but overestimates the sta
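The maximum class size rule used in the regression discontinuity approach of Angrist and Lavy (2000) can be written as a function of grade enrolment. The sketch below assumes a single cap applied uniformly (40 by default, as in the example in the text) and uses the e / (floor((e - 1) / cap) + 1) formulation commonly associated with Angrist and Lavy's "Maimonides' rule". In their approach this predicted size serves as an instrument for actual class size rather than as the class size itself.

```python
def predicted_class_size(enrolment: int, cap: int = 40) -> float:
    """Average class size implied by a maximum class size rule.

    A grade with `enrolment` students is split into the smallest number of
    classes that keeps every class at or below `cap` students.
    """
    if enrolment < 1 or cap < 1:
        raise ValueError("enrolment and cap must be positive")
    n_classes = (enrolment - 1) // cap + 1  # integer ceil(enrolment / cap)
    return enrolment / n_classes
```

`predicted_class_size(40)` returns 40.0 and `predicted_class_size(41)` returns 20.5, matching the example above. The same sharp drop recurs at every multiple of the cap (80 students give 40.0, 81 give 27.0), which is what creates the discontinuities the design exploits.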
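For cluster randomised trials whose analysis ignored clustering, one standard approximate correction, described in the Cochrane Handbook (Higgins & Green, 2011), inflates the variance by the design effect 1 + (m - 1) × ICC, where m is the average cluster (class) size and ICC is the intra-cluster correlation. The sketch below assumes roughly equal-sized clusters and an externally supplied ICC; it is illustrative, not the review's analysis code.

```python
import math


def design_effect(mean_cluster_size: float, icc: float) -> float:
    """Variance inflation from analysing clustered data as if observations were independent."""
    return 1.0 + (mean_cluster_size - 1.0) * icc


def cluster_corrected_se(naive_se: float, mean_cluster_size: float, icc: float) -> float:
    """Inflate a standard error that ignored clustering by the square root of the design effect."""
    return naive_se * math.sqrt(design_effect(mean_cluster_size, icc))


def effective_sample_size(n: int, mean_cluster_size: float, icc: float) -> float:
    """Number of independent observations the clustered sample is approximately worth."""
    return n / design_effect(mean_cluster_size, icc)
```

With classes of 20 students and a hypothetical ICC of 0.2, the design effect is 4.8: a naive standard error is multiplied by about 2.19, and 400 students carry roughly the information of 83 independent observations.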
