Abstract

To introduce the topic of this chapter, let us consider a test developed with the goal of measuring writing proficiency. Ultimately, the test will be used to produce test scores reflecting each examinee’s level of writing proficiency, and to do so the test must elicit examinee responses that generate evidence of writing proficiency. To this end, the test developer is faced with a decision concerning the type of items or tasks to be used to elicit examinee responses. The test developer may opt to use multiple-choice (MC) items. Although cost-effective and efficient, MC items are indirect measures of writing proficiency and may not support inferences about an examinee’s writing skills that are as valid as those offered by more authentic writing tasks. To overcome the limitations of MC items, the test developer may choose to employ constructed-response (CR) items consisting of a series of prompts used to elicit written responses from examinees, and have human raters score the written responses using scoring rubrics. The resulting test may be composed entirely of CR items or of a combination of CR and MC items. While the CR item format offers a much more authentic assessment context, human-rater scoring suffers from several drawbacks, including inconsistency (e.g., due to rater severity/leniency or fatigue) and high cost due to the time and resources required to train raters and conduct the scoring process (see Zhang, 2013). The test developer can avoid the drawbacks of human-rater scoring by using CR items that are scored by a computer-automated scoring engine (referred to as automated scoring hereafter), which applies a set of predefined decision rules to assign a score to a CR item based on particular features of the examinee’s response. Automated scoring has the advantageous properties of being perfectly consistent across examinees and highly efficient from a resource perspective, but automated scores may be biased estimates of the scores that human raters would assign.
