Standard setting (the process of establishing minimum passing scores on high-stakes exams) is a highly evaluative and policy-driven process. It is commonly believed that standard-setting panels should be diverse and representative. There is concern, however, that panelists with varying characteristics may differentially influence the results of the standard-setting process. The purpose of this study is to empirically examine whether the judgments of standard-setting panelists are related to select personal characteristics (gender and race/ethnicity) and educational context (geographic region and socioeconomic status) for two high-stakes examinations in one southeastern state. Results suggest that personal characteristics are not systematically related to the level of recommended cut scores. Educational context, however, is an influential factor.

Keywords: standard setting, standardized testing, Rasch measurement theory, education policy, high stakes

Review of Literature

Since the late 1980s, accountability and high standards have been commonly used catchphrases within the standards-based education movement. Holding states, districts, schools, principals, teachers, families, and students accountable for student achievement is a high priority for many stakeholders (Brown, 2012; Thurlow & Ysseldyke, 2001). The No Child Left Behind Act (NCLB), for example, set requirements for public schools across the nation to assess students regularly and systematically (NCLB, 2002). Due to this federal mandate, as well as time restrictions and financial limitations, many states have implemented high-stakes standardized examinations. Many states with extensive accountability systems have also increased the minimum passing requirements for promotion and graduation. In essence, the NCLB legislation holds states more accountable for raising standards for school-aged youth.

Furthermore, the Common Core standards signify a new wave of increased accountability for educators and students.
Currently, 45 states have adopted these higher-level content standards for English Language Arts and Mathematics (http://www.corestandards.org). As a result of these new standards, new assessments are being designed to accurately measure the content and skills students are expected to master. By the year 2014, students will be required to meet these new levels of performance (operationalized as cut scores). But how are performance levels such as Below Basic, Basic, Proficient, and Advanced defined? Who decides 'how good is good enough?' These questions are answered through a process called standard setting.

Numerous standard-setting studies investigate the methods and procedures used to establish these performance levels and cut scores. The research literature indicates that the particular procedure used can substantially, and even capriciously, influence the final cut score (ACT, 1993; Cizek, 2012; Hein & Skaggs, 2008; Reckase, 2006). The literature also suggests that the characteristics of standard-setting panelists can make a difference in final cut-score recommendations (Caines, 2008; Hambleton & Pitoniak, 2006; Kane, 2001; Livingston & Zieky, 1982; Plake, 2008).

In spite of this, few empirical studies examine panelist characteristics and potential variations in their final cut scores. One seminal study, Jaeger (1982), examined differential judgments according to various panelist characteristics for the North Carolina Competency Tests in Reading and Mathematics. The standard-setting panel was demographically and racially representative of North Carolina's adult population. Jaeger found that North Carolina registered voters recommended substantially higher cut scores (by 15.5 items) than high school teachers for the state Reading test. Race was also a significant factor in final recommended cut scores: African American panelists systematically gave lower mean cut-score ratings for the Reading and Math tests (91. …