Abstract

Bilingual raters play an important role in assessing spoken-language interpreting between two languages, X and Y. Raters whose dominant language (DL) is X and whose less dominant language is Y may differ in their rating processes from raters whose DL is Y and whose less dominant language is X, whether they are assessing X-to-Y or Y-to-X interpreting. Raters’ language background and its interaction with interpreting directionality may therefore influence assessment outcomes. However, this complex interaction and its effects on assessment have not been investigated. We therefore conducted an experiment to explore how raters’ language background and interpreting directionality affect the assessment of two-way English-Chinese interpreting. Our analyses of the quantitative data indicate that, when assessing interpreting into their mother tongue or DL, raters displayed greater self-confidence and self-consistency but rated performance more harshly. These statistically significant group-level disparities led to different assessment outcomes: pass and fail rates varied depending on the rater group. These quantitative findings, together with the raters’ qualitative comments, may have implications for the selection and training of bilingual raters for interpreting assessment.