Abstract

Descriptor-based analytic rating scales have been increasingly used to assess interpretation quality. However, little empirical evidence is available to unequivocally support the effectiveness of such rating scales or the reliability of raters. This longitudinal study therefore attempts to shed light on scale utility and rater behavior in English/Chinese interpretation performance assessment, using multifaceted Rasch measurement. Specifically, the study focuses on criterion/scale difficulty, scale effectiveness, rater severity/leniency, and rater self-consistency across the two interpreting directions and over three time points. The results are discussed with reference to the utility of analytic rating scales and the variability of rater behavior in interpretation assessment, and carry implications for developing reliable, valid, and practical instruments to assess interpretation quality.
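For readers unfamiliar with the method, the many-facet Rasch model that underlies this kind of analysis is standardly written as below. This is the general formulation (following Linacre's many-facet Rasch measurement), not the study's own notation, which is not reproduced here:

```latex
\log \frac{P_{nijk}}{P_{nij(k-1)}} = B_n - D_i - C_j - F_k
```

where \(P_{nijk}\) is the probability of examinee \(n\) receiving score category \(k\) on criterion \(i\) from rater \(j\), \(B_n\) is examinee ability, \(D_i\) is criterion difficulty, \(C_j\) is rater severity, and \(F_k\) is the threshold at which category \(k\) becomes more probable than category \(k-1\). Rater severity/leniency and self-consistency, as examined in the study, correspond to estimates of \(C_j\) and the associated rater fit statistics.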

Highlights

  • Interpreting quality constitutes a crucial part of maintaining professional interpreting standards (Diriker, 2015), a major pedagogical concern (Sawyer, 2004) and a timeless topic for researchers (Pöchhacker, 2001)

  • A similar scoring practice is adopted in the China Accreditation Test for Translators and Interpreters (CATTI) and the United States Federal Court Interpreter Certification Examination (FCICE)

  • Three rating scales were used to assess three aspects of interpreting: information completeness (InfoCom), i.e., the extent to which source-text propositional content is interpreted; fluency of delivery (FluDel), i.e., the extent to which disfluencies such as unfilled/filled pauses, long silences and fillers are present in target-language (TL) renditions; and TL quality (TLQual), i.e., the extent to which TL expressions are idiomatic and grammatically correct



Introduction

Interpreting quality constitutes a crucial part of maintaining professional interpreting standards (Diriker, 2015), a major pedagogical concern (Sawyer, 2004) and a timeless topic for researchers (Pöchhacker, 2001). Barik (1971) and Gerver (1969/2002), for instance, are among the earliest researchers to apply the atomistic method to evaluating spoken-language renditions. This time-honoured method prevails today in high-stakes settings such as professional certification testing. Australia's National Accreditation Authority for Translators and Interpreters (NAATI), for example, has used error deduction/analysis rating, a specific form of the atomistic method, to assess translation for more than 30 years (Turner, Lai, & Huang, 2010).

Three days before each assessment, the students were briefed on the topics and themes to be interpreted and given a source-language (SL) word list for preparation (e.g., finding TL equivalents). A total of 228 recordings (i.e., 38 students × 6 tasks) were generated on each occasion.

