Despite the increasing popularity of peer assessment in tertiary-level interpreter education, little research has examined the quality of peer ratings on language interpretation. Whereas previous research on the quality of peer ratings, particularly rating accuracy, has relied mainly on correlation and analysis of variance, latent trait modelling has emerged as a useful approach to investigating rating accuracy in rater-mediated performance assessment. The present study demonstrates the use of multifaceted Rasch partial credit modelling to explore the accuracy of peer ratings on English-Chinese consecutive interpretation. The analysis revealed a relatively wide spread of rater accuracy estimates and statistically significant differences in rating accuracy between peer raters. Additionally, peer raters found it easier to assess some students accurately than others, to rate target language quality accurately than the other rating domains, and to rate English-to-Chinese interpretation accurately than interpretation in the reverse direction. These findings demonstrate the capability of latent trait modelling to produce individual-level indices, measure rater accuracy directly, and accommodate sparse rating designs. It is therefore hoped that substantive inquiries into peer assessment of language interpretation will utilise latent trait modelling to move this line of research forward.