Abstract

Objective: To determine whether rater agreement is randomly distributed or varies importantly with test-taker characteristics, test words, or rater experience with the dichotic words test (DWT).

Design: The DWT was administered to 34 children in 1st–4th grade, and responses were scored by two raters. The proportion of rater agreement was calculated for each child and for each word, and correlates of inter-rater agreement were explored.

Study sample: Two raters judged 6686 total responses from 34 children.

Results: Overall agreement between the two raters was 0.97. Test-taker scores ranged from 35% to 91% (mean = 81%). Agreement was associated with score but not with test-taker age or sex. Test words spanned the full range of difficulty (pass proportion 0.06–1.00). Rater agreement was not randomly distributed among the words. Inter-rater agreement for test words ranged from 0.82 to 1.00 and was associated with pass proportion (Spearman's ρ = 0.28; p < 0.0001). However, there were words at all pass proportions with perfect or near-perfect agreement. Rater agreement improved from 0.94 on the first day of data collection to 0.98 on the fifth day (p = 0.026).

Conclusions: Inter-rater reliability should be considered along with test item difficulty when developing speech audiometry materials, scoring protocols, and rater training.
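The per-word analysis described above can be illustrated with a minimal sketch in Python. It assumes a hypothetical long-format table with columns word, rater1, and rater2 (1 = response judged correct, 0 = incorrect); the column names, data layout, and use of rater 1's pass rate as the difficulty proxy are illustrative assumptions, not details taken from the study protocol.

    # Sketch only: hypothetical column names (word, rater1, rater2) and scoring
    # convention (1 = correct, 0 = incorrect); not the authors' actual pipeline.
    import pandas as pd
    from scipy.stats import spearmanr

    def per_word_stats(df: pd.DataFrame) -> pd.DataFrame:
        """Return per-word inter-rater agreement and pass proportion."""
        rows = []
        for word, grp in df.groupby("word"):
            agreement = (grp["rater1"] == grp["rater2"]).mean()  # proportion of matching judgements
            pass_prop = grp["rater1"].mean()  # difficulty proxy: rater 1's pass rate for this word
            rows.append({"word": word, "agreement": agreement, "pass_prop": pass_prop})
        return pd.DataFrame(rows)

    def agreement_vs_difficulty(df: pd.DataFrame):
        """Spearman correlation between per-word agreement and pass proportion."""
        stats = per_word_stats(df)
        rho, p = spearmanr(stats["agreement"], stats["pass_prop"])
        return rho, p

Applied to a response-level table of the kind assumed here, agreement_vs_difficulty would return the rank correlation between how often the two raters agree on a word and how often that word is passed, the relationship reported as Spearman's ρ in the Results.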
