Abstract

Standardized observation systems seek to reliably measure a specific conceptualization of teaching quality, managing rater error through mechanisms such as certification, calibration, validation, and double-scoring. These mechanisms both support high-quality scoring and generate the empirical evidence used to support the scoring inference (i.e., that scores represent the intended construct). Past efforts to support this inference assume that rater error can be accurately estimated from a few scoring occasions. We empirically test this assumption using two datasets from the Measures of Effective Teaching project. Results show that rater error is highly complex and difficult to measure precisely from a few scoring occasions. Typically designed rater monitoring and control mechanisms likely cannot measure rater error precisely enough to show that raters can distinguish between levels of teaching quality within the range typically observed. We discuss the implications for supporting the scoring inference, including recommended changes to rater monitoring and control mechanisms.
