Interrater Reliability of a Performance Criterion for a Very Homogeneous Group of Managers

Warren S Blumenfeld,Sidney Q Janus

doi:10.2466/pr0.1974.35.3.1076

Abstract

After the decision has been made to conduct a validation study, the development of a criterion is the single most important consideration in the conduct of the study. This is because inherent in the definition of validity is the demonstration of a non-zero relationship between the selection procedure being evaluated and that criterion. If that criterion is trivial and/or fallible, the quality of the evidence for validity cannot be better. The most certain way to discredit a validation study is to discredit the criterion. The criterion sets the upper limit for the quality of the study, the evidence. The four criterion characteristics to which attention ought to be paid, in order of importance, are relevance, reliability, freedom from bias, and acceptability to management (cf. 1, 2, 3, 4, 5). The purpose of this study was to examine the second most important characteristic of any criterion, i.e., reliability. The purpose was to examine the (interrater) reliability of a performance criterion developed for a validation study in an extremely homogeneous group of managers. Ratees were 65 branch managers employed by a service organization in five geographic regions (arbitrarily defined by the organization). They had been in the organization for a minimum of 15 yr. For each of the five subsets of managers (16, 16, 15, 14, and 4, respectively), there were four raters; three were constant (President, Senior Vice President, and Personnel Manager), and the fourth was the cognizant regional Vice President. The rating technique was pair comparison. The stimulus direction was, "For each pair of names below, circle the name of that one branch manager who in your judgment has the greatest over-all promotion potential." Interrater reliability of each of the five (4 × k) rating configurations was obtained by analysis of variance (6, pp. 126-127). The five interrater reliability coefficients were .90, .93, .81, .93, and .95, respectively (p < .01). 
Even out of context, these values must be considered satisfactory, i.e., range, .81 to .95, median, .93. In context, however, they are even more surprising and satisfactory. As the validation project developed, and as expected, individual differences among the managers were extremely slight. That is, the homogeneity of the group resulted in extremely small dispersion measures on almost all experimental predictors subsequently investigated. The primary explanation for this is presumed to be self- and institutional selection, which had produced individuals who were very much alike. This extreme homogeneity being the case, the high interrater reliabilities of the perceptions are all the more striking. It appears then that pair comparison can be recommended when a rating criterion is to be used, and it is known in advance that group homogeneity might lead to the absence of systematic perception of individual differences in performance.
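
The abstract cites the analysis-of-variance procedure only by reference (6, pp. 126-127) and does not give the formula. The sketch below is therefore an assumption, not the authors' computation: it scores a complete pair-comparison task as the number of times each manager was chosen, then applies a common ANOVA estimate of the reliability of the mean of the raters, (MS_ratees - MS_residual) / MS_ratees, to a ratees-by-raters table. The function names and the small data set in the usage lines are hypothetical and are not taken from the study.

```python
from itertools import combinations


def pair_count(n):
    """Number of distinct pairs a rater judges for n ratees."""
    return n * (n - 1) // 2


def choice_scores(names, choices):
    """Score one rater: times each name was circled.

    `choices` maps frozenset({a, b}) to the name chosen from that pair
    (a hypothetical encoding of the completed rating form).
    """
    scores = {name: 0 for name in names}
    for a, b in combinations(names, 2):
        scores[choices[frozenset((a, b))]] += 1
    return [scores[name] for name in names]


def interrater_reliability(ratings):
    """Reliability of the mean of k raters from an n x k table.

    Assumed estimate: (MS_ratees - MS_residual) / MS_ratees, with
    between-rater mean differences removed from the error term.
    `ratings` holds one row per ratee and one column per rater.
    """
    n, k = len(ratings), len(ratings[0])
    grand = sum(sum(row) for row in ratings) / (n * k)
    row_means = [sum(row) / k for row in ratings]
    col_means = [sum(row[j] for row in ratings) / n for j in range(k)]

    ss_total = sum((x - grand) ** 2 for row in ratings for x in row)
    ss_ratees = k * sum((m - grand) ** 2 for m in row_means)
    ss_raters = n * sum((m - grand) ** 2 for m in col_means)
    ss_resid = ss_total - ss_ratees - ss_raters

    ms_ratees = ss_ratees / (n - 1)
    ms_resid = ss_resid / ((n - 1) * (k - 1))
    return (ms_ratees - ms_resid) / ms_ratees


# Pairs each rater would judge in the five regional subsets.
print([pair_count(n) for n in (16, 16, 15, 14, 4)])

# Hypothetical choice scores: 4 managers (rows) by 4 raters (columns).
example = [
    [3, 3, 3, 2],
    [2, 2, 1, 3],
    [1, 1, 2, 1],
    [0, 0, 0, 0],
]
print(round(interrater_reliability(example), 2))
```

In a complete pair-comparison task every rater's scores sum to n(n - 1)/2, so the rater means are identical and the between-rater sum of squares is zero; under that condition, removing rater mean differences only changes the error degrees of freedom.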
