The highly praised and influential study “Getting Tough?” was published in 2001. Briefly, while controlling for a host of student, school, state, and educator background variables, the study regressed student-level achievement score gains from 1988 to 1992 onto a dummy variable for the presence (or absence) of a high school graduation test at the student’s school. The gain scores were the 1992-minus-1988 differences in scores on the cognitive test embedded in a U.S. Department of Education longitudinal survey.

The study was praised for its methodology, controlling for multiple baseline variables that previous researchers allegedly had not, and by some opponents of high-stakes standardized testing for its finding of no achievement gains. Indeed, some characterized the work as so far superior in method that it justified dismissing all previous work on the topic. The article was also timely: it appeared in print while Congress was considering the No Child Left Behind Act (2001), with its new federal mandate requiring annual testing in seven grades and three subjects in all U.S. public schools. The article likewise served as the foundation for a string of ensuing studies nominally showing that graduation exams bore few benefits and outsized costs (e.g., in dropout rates). Graduation exam opponents would employ these critical studies as evidence to political effect: from a peak of more than thirty states around the turn of the millennium, graduation tests are now administered in only seven or eight.

The multivariate analysis in “Getting Tough?” should have had the advantage of authenticity: an analysis of a phenomenon studied in its actual context. But that advantage requires that the context be understood and specified in the analysis, not ignored as if it could not matter. And it could have been understood and specified.
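The regression design described above can be sketched in miniature. The snippet below is purely illustrative, not the study’s actual data, covariates, or estimates: it simulates gain scores, includes a graduation-exam dummy whose true effect is set to zero, adds stand-in baseline controls, and fits an ordinary least-squares regression.

```python
import numpy as np

# Hypothetical sketch of a gain-score regression with an exam dummy.
# All variable names and values are invented for illustration.
rng = np.random.default_rng(0)
n = 500
exam = rng.integers(0, 2, n)       # 1 = student's school has a graduation test
ses = rng.normal(0.0, 1.0, n)      # stand-in for background controls
base = rng.normal(50.0, 10.0, n)   # stand-in 1988 baseline score
# True exam effect set to 0, mimicking the "no achievement gains" finding.
gain = 2.0 + 0.0 * exam + 1.5 * ses + 0.1 * base + rng.normal(0.0, 3.0, n)

# OLS: gain = b0 + b1*exam + b2*ses + b3*base + error
X = np.column_stack([np.ones(n), exam, ses, base])
beta, *_ = np.linalg.lstsq(X, gain, rcond=None)
print(beta[1])  # estimated exam coefficient, close to zero by construction
```

The abstract’s critique still applies to a setup like this: however carefully the controls are chosen, the dummy variable treats all graduation tests as interchangeable, ignoring differences in how testing programs are organized and administered.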
Most of the relevant information left out of “Getting Tough?” — specific values for other factors that tend to affect test performance or student achievement — was available from the three contemporary surveys, and the rest could have been obtained through a more detailed evidence-gathering effort. The study could have been more insightful had it been done differently, perhaps with less emphasis on “more sophisticated” and “more rigorous” mathematical analysis and more emphasis on understanding and specifying the context: how testing programs are organized, how tests are administered, the effective differences among the many types and formats of tests and how students respond to each, the legal context of testing in the late 1980s and early 1990s, and so on.