Schools are often faced with a choice between two kinds of assessments: large-scale standardized or small-scale classroom. Classroom assessments are the ones that teachers create--quizzes, tests, or assignments--to investigate and document student progress in the classroom. Large-scale summative assessments are the ones that policy makers (legislators, school board members, and district administrators) mandate for the purposes of accountability. Most often, these are norm- or criterion-referenced tests used to evaluate student achievement or program effectiveness across schools and districts.

It is a common assumption that large-scale assessments produced by testing companies are more trustworthy than classroom assessments. However, despite substantial progress in test design, standard administrations of norm- or criterion-referenced tests given at the end of a year or unit provide little guidance for classroom instruction. So, in order to make policy decisions, states, districts, and schools are offered one of two unsatisfactory choices: use standardized assessments that are trusted by the public but of little use for making decisions about individual student progress, or use classroom assessments that are informative for instructional decisions but for which the teacher has little formal evidence of validity or reliability. Even though classroom and large-scale assessments need not be mutually exclusive tools for monitoring learning, it is extremely rare to find valid and reliable tests that can be linked directly to classroom practices and instructional activities.

The BEAR Essentials

At the University of California, Berkeley, we're helping to change that. Over the last 15 years, our research team at the Berkeley Evaluation and Assessment Research Center (we call it the BEAR Center) in the Graduate School of Education has been developing assessment systems that are both psychometrically sound and instructionally relevant.
Because of these two attributes, BEAR-designed assessments can be used within and across classrooms and even across schools. To say that they're instructionally relevant means they're embedded in the curriculum whose student progress they've been designed to monitor: assessments of student progress and performance are integrated into instructional materials and are virtually indistinguishable from day-to-day classroom activities. This offers the potential for great ecological validity because the assessments are grounded in everyday teacher practices. We can also assess a wider range of skills than institutional assessments can. Furthermore, we avoid one-shot testing situations and focus instead on the process of learning and an individual's progress.

The BEAR Assessment System grew out of two early assessment projects our team worked on: 1) California's voluntary Golden State Exam, which debuted in 1985 and met its demise in 2003 with the implementation of the No Child Left Behind legislation, and 2) SEPUP (Science Education for Public Understanding Program), an innovative science curriculum for grades 6-12 developed at UC Berkeley's Lawrence Hall of Science in 1987. My Graduate School of Education colleague Kathryn Sloane and I reviewed the projects and their curricula to identify their core principles. Eventually, we formally described them in a journal article, "From Principles to Practice: An Embedded Assessment System," published in 2000 (also see Wilson 2005).
The Assessment Triangle

The National Research Council (2001) suggests that good assessment needs to address the three inextricably linked parts of the assessment triangle: cognition, observation, and interpretation. To address these components, the BEAR system employs four principles similar to those outlined by the National Research Council: 1) a developmental perspective on learning; 2) a tight link between instruction and assessment; 3) management by instructors to allow appropriate feedback, feed forward, and follow-up; and 4) the generation of quality evidence to make inferences (NRC 2001; Wilson 2005). …