Abstract

We use statewide data from Massachusetts to investigate teacher performance evaluations as a measure of teaching effectiveness. Schools tend to classify most of their teachers as proficient, but we document substantial variation across schools in the extent to which ratings differentiate teachers. Using event study and teacher fixed effects designs, we verify that these patterns are driven by differences in the application of standards rather than by differences in the distribution of teacher quality. When we use an event study design to examine teachers who move from schools with greater differentiation to schools with less differentiation in their evaluations, we find that the probability of receiving the highest performance rating drops by about 5 percentage points and, at least in the first year, the probability of receiving one of the lowest two ratings drops by 5 percentage points. As a result, even after regression adjustment, teacher evaluation ratings generally provide unreliable predictions of future teacher evaluations after teachers switch schools. These findings suggest that policymakers and researchers should use caution in relying on performance evaluation ratings to compare teachers in different contexts.
