Nathan R. Kuncel, John P. Campbell, and Deniz S. Ones
University of Minnesota, Twin Cities Campus

Although Sternberg and Williams (June 1997) addressed an important topic, they purposely did not use what are now widely accepted procedures for estimating predictive validities. Their failure to use appropriate parameter estimation techniques produced inaccurate results and misleading conclusions. Their key errors are as follows.

Failure to Take Range Restriction Into Account

The objective of Sternberg and Williams's (1997) study was to estimate the criterion-related validity of the Graduate Record Examination (GRE) for predicting subsequent graduate school performance for the applicant population. It is axiomatic in this context that estimation should be as efficient and unbiased as possible. Also, because Sternberg and Williams argued that their results are likely to generalize to all psychology programs, we infer that the population under consideration is composed of all applicants to psychology PhD programs.

When a sample is selected, either directly or indirectly, on the basis of the predictor variable (GRE scores), and there is a nonzero correlation between the GRE and performance in the population, range restriction will occur and will attenuate the correlation between the two variables in the sample (Thorndike, 1949). Sternberg and Williams (1997) dismissed the need for range restriction corrections by pointing to the positive correlations between GRE scores and several of their criterion variables. This argument reflects an apparent misunderstanding of the reason for making such corrections, which is to obtain a less biased estimate of the population parameter. The mere presence of variability in a sample and a moderately large value for the correlation are not sufficient conditions for dismissing the biasing effects of range restriction.
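The size of this bias can be illustrated with Thorndike's (1949) Case II correction for direct range restriction. The sketch below is illustrative only; the correlation and SD ratio shown are hypothetical values, not estimates from Sternberg and Williams's data.

```python
import math

def case2_correction(r_restricted, u):
    """Thorndike's Case II correction for direct range restriction.

    r_restricted: correlation observed in the selected (restricted) sample.
    u: ratio of the applicant-pool (unrestricted) predictor SD to the
       selected-sample (restricted) predictor SD; u > 1 under restriction.
    """
    r = r_restricted
    return (r * u) / math.sqrt(1 + r * r * (u * u - 1))

# Hypothetical values: with no restriction (u = 1) the observed correlation
# is returned unchanged; as the selected sample's SD shrinks relative to the
# applicant pool's, the corrected estimate rises above the observed value.
print(round(case2_correction(0.30, 1.0), 3))
print(round(case2_correction(0.30, 1.5), 3))
```

Note that when u = 1 (no restriction) the formula reduces to the observed correlation, which is why the ratio of sample to population variance, not the mere presence of variability, is the relevant diagnostic.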
The accuracy of the predictor in the population could be much greater than the moderately large sample value would imply. It is the ratio of the sample variance to the population variance that is the indicator of range restriction. Given linearity and homoscedasticity across the full range of scores in the population, the smaller the ratio, the more the sample correlation will underestimate the population parameter. It is the responsibility of investigators to make the most accurate parameter estimates that their data will allow.

Sternberg and Williams (1997) reported standard deviations of GRE subscales for their sample; however, the situation is complicated by what seems to be an inappropriate treatment of outliers. For example, the sample included some number of GRE scores that could have been artifactually low because of international student applicants who were nonnative English speakers. Including such scores would inflate the standard deviation of all scales (but most severely, the GRE Verbal subscale).

In general, when specific conditions can produce invalid scores (nonnative speaker, illness, or severe test anxiety), these cases should be omitted from the analyses. Including them has two effects. First, it introduces additional sources of measurement error, resulting in attenuated correlations. Second, it artifactually inflates the sample standard deviation, leading to an underrepresentation of range restriction and a biased estimate of the population value.

Failure to Take Criterion Unreliability Into Account

Both the predictor and the criterion are imperfect measures of their respective constructs. However, for applied purposes, the population parameter of greatest interest is the size of the correlation between the imperfectly reliable test (which is what must be used to make selection decisions) and the true score on the latent variable: graduate school performance.
Criterion unreliability attenuates the correlation between the observable predictor scores and the true scores on performance. However, higher reliability for the criterion could be obtained if more resources were devoted to its measurement (e.g., making the criterion measure much longer or using many more raters). Validity estimates should not be biased by whatever degree of measurement error is reflected in a specific criterion. Consequently, appropriate corrections for criterion unreliability are necessary.

However, Sternberg and Williams (1997) argued that obtaining substantial correlations indicates that unreliability is not an issue. This argument again strays from the principal objective, which is to estimate the population parameter as accurately as possible. Although less than perfectly reliable criterion measures can yield nonzero estimates of the validity coefficient, the observed correlations will underestimate the true validity of the predictor. To go a step further and argue that the observed (and attenuated) correlations are too small and therefore the predictor (GRE) is neither valid nor very useful is inappropriate. Sternberg and Williams made exactly this kind of argument.

Reanalysis

On the basis of both previous literature and a reanalysis of Sternberg and Williams's (1997) data, we would argue that the operational validity of the GRE is higher than what Sternberg and Williams claimed. Both the GRE Technical Manual (Briel, O'Neill, & Scheuneman, 1993) and the meta-analysis by Goldberg and Alliger (1992) provide observed correlations between the GRE and graduate student performance. Although both of these sources also failed to use appropriate corrections, they provide considerably more information about the observed sample relationships between GRE scores and graduate student performance than can be obtained from a single small-sample study.
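The disattenuation logic described above can be sketched as follows. The reliability value in the example is hypothetical, chosen only to show the size of the effect, not an estimate for any actual criterion measure.

```python
import math

def correct_for_criterion_unreliability(r_observed, r_yy):
    """Correct an observed validity coefficient for criterion unreliability.

    Only the criterion side is corrected (divide by the square root of its
    reliability, r_yy), because selection decisions must be made with the
    fallible test scores themselves; the predictor is left uncorrected.
    """
    return r_observed / math.sqrt(r_yy)

# Hypothetical illustration: an observed validity of .18 against a rating
# criterion with reliability .50 implies a corrected validity of about .25.
print(round(correct_for_criterion_unreliability(0.18, 0.50), 2))
```

The correction grows as criterion reliability falls, which is exactly why an observed correlation against a short, single-rater criterion understates the predictor's true validity.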
The GRE Technical Manual (Briel et al., 1993) reports median criterion-related validity coefficients for the GRE Verbal subscale (r = .18), the GRE Quantitative subscale (r = .19), and the GRE Advanced Psychology Test (r = .32) for psychology graduate school success across a number of studies and criterion measures. In a small-scale meta-analysis, Goldberg and Alliger (1992) reported similar results, with an average observed correlation of .18.

Sternberg and Williams were kind enough to provide us with a copy of their data for reanalysis. Unfortunately, we were unable to identify the students for whom English was not their first language, which prevented the computation of appropriate corrections for range restriction. Alternatively, we corrected a subset of the validity estimates published in Sternberg and Williams's (1997) article by using other information. These reestimates are shown in Table 1. The observed correlations between the GRE Verbal, Quantitative, Analytical, and Psychology subscale scores with the ratings of research performance were .12, .07, .12, and .14, respectively. Next, we obtained GRE information on the PhD program applicants selected by the University of Minnesota's Department of Psychology for 1997-1998 and eliminated the nonnative English speakers. Standard deviations were then computed for the Verbal and Quantitative scores by using the University of Minnesota data. The estimates were 75.1 and 66.6, respectively, as compared with the Educational Testing Service's (1996) computed applicant pool standard deviations of

May 1998 • American Psychologist 567
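One common way to combine the two corrections discussed in this comment is to disattenuate the observed correlation for criterion unreliability first (using a reliability estimated in the restricted sample) and then apply the Case II range restriction correction. The sketch below uses the observed r = .12 reported above, but the reliability and SD ratio are hypothetical placeholders, not estimates from any of the cited sources.

```python
import math

def operational_validity(r_obs, r_yy, u):
    """Apply both corrections in sequence.

    r_obs: observed sample validity coefficient.
    r_yy:  criterion reliability estimated in the restricted sample.
    u:     applicant-pool SD divided by selected-sample SD on the predictor.
    """
    # Step 1: disattenuate for criterion unreliability.
    r_true = r_obs / math.sqrt(r_yy)
    # Step 2: Thorndike's Case II correction for direct range restriction.
    return (r_true * u) / math.sqrt(1 + r_true ** 2 * (u ** 2 - 1))

# Hypothetical values only: r_obs = .12 (observed GRE Verbal correlation
# reported above), with an assumed r_yy = .52 and u = 1.5.
print(round(operational_validity(0.12, 0.52, 1.5), 3))
```

Even these modest assumed values roughly double the observed coefficient, which illustrates why uncorrected small-sample correlations understate operational validity.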