SUMMARY

The adoption of a particular personnel device depends on how consistently its use leads to positive results. A selection device, for example, should consistently pick workers with higher production records than those of the workers rejected. When repeated tests of the expected relationship yield inconsistent results, the personnel device is frequently rejected without further trial. This study indicates, however, that inconsistent relationships between tests and the ratings used as measures of productivity may be due to inconsistency of the ratings rather than to any deficiency of the personnel device.

In an attempt to evaluate a battery of aptitude tests for hiring purposes, several ratings were obtained from foremen to serve as yardsticks of efficiency.¹ The tests initially showed inconsistent relationships with the ratings. Relationships which were consistent and sufficiently high to warrant adoption of the tests were obtained only after eliminating from some sets of ratings the influence of a non-relevant factor: length of service.

Although the foremen had been instructed to rank the workers on two different traits, Personality and Ability, results showed that the two ratings covered essentially the same factors. Apparently, in spite of the logical analysis in the personnel office, the foreman's ratings of personality and his ratings of ability measure much the same thing: job effectiveness. In fact, the tests designed to predict the ability aspect of effectiveness were more closely related to the Personality ratings than to the Ability ratings. This is probably because both ratings reflected what the personnel analyst would define as “ability”, but the Personality ratings were less contaminated by a spurious relationship to mere length of service.
If this finding can be generalized, it may serve to account for the many job failures ascribed to personality difficulty.

A further implication is that ratings on Personality may in reality measure an aspect of ability which is relatively independent of length of service, and which is therefore more predictable by ability tests than are Ability ratings.

One measure of the effectiveness of a rating is the extent to which it can be predicted by the selection tests. Contrary to initial expectation, the Over-all rating was not uniformly superior to the part ratings on this basis and was definitely inferior to the sum of the part ratings. Thus the foremen, in combining the various factors leading to their appraisal of over-all effectiveness, did not assign the best weights; in fact, their judgmental combination was not as good as simple clerical addition without statistical adjustment.

The initial results showed marked inconsistencies from rater to rater in the relation between the initial ratings and the selection tests. This inconsistency, which appeared to reflect adversely on the usefulness of the tests, turned out to result in part from differences among raters in the meaning each attached to the basis for his rating. Thus, correcting for the spurious length-of-service bias by using a second technique for securing ability ratings transformed the apparent inconsistency into consistency. Similar consistent and substantial relations between tests and ratings were obtained by limiting the study to those cases where length of service was more nearly equivalent, by using the Personality rating (less contaminated with seniority), and by statistical correction for seniority.
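
The summary does not state which statistical correction for seniority was applied. One common way to remove the influence of a third variable from a correlation is the first-order partial correlation, and the minimal sketch below assumes that technique. The function names and the six-worker data are hypothetical, constructed only to show the effect; they are not taken from the study.

```python
from math import sqrt


def pearson(x, y):
    """Pearson product-moment correlation of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def partial_corr(test, rating, service):
    """Correlation of test and rating with length of service held constant."""
    r_tr = pearson(test, rating)
    r_ts = pearson(test, service)
    r_rs = pearson(rating, service)
    return (r_tr - r_ts * r_rs) / sqrt((1 - r_ts ** 2) * (1 - r_rs ** 2))


# Hypothetical data: each rating is the test score plus a seniority bonus,
# so the rating is deliberately contaminated by length of service.
test_scores = [1, 2, 3, 4, 5, 6]
years_of_service = [3, 1, 6, 2, 5, 4]
ratings = [t + 2 * s for t, s in zip(test_scores, years_of_service)]

raw = pearson(test_scores, ratings)
corrected = partial_corr(test_scores, ratings, years_of_service)
print(f"raw r = {raw:.3f}, r with service held constant = {corrected:.3f}")
```

In this contrived case the rating depends on nothing but the test score and seniority, so holding seniority constant recovers the full test-rating relationship. Real ratings would contain other sources of variation, and the corrected coefficient would be correspondingly lower.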