Using a large database, this study examined three refinements of validity generalization procedures: (a) a more accurate procedure for correcting the residual SD for range restriction to estimate SDρ, (b) use of r̄ (the mean observed r) instead of study-observed rs in the formula for sampling error variance, and (c) removal of non-Pearson rs. The first procedure does not affect the amount of variance accounted for by artifacts. The addition of the second and third procedures increased the mean percentage of validity variance accounted for by artifacts from 70% to 82%, a 17% increase. The cumulative addition of all three procedures decreased the mean SDρ estimate from .150 to .106, a 29% decrease. Six additional variance-producing artifacts were identified that could not be corrected for. In light of these additional artifacts, we concluded that the obtained estimates of mean SDρ and mean validity variance accounted for were consistent with the hypothesis that the true mean SDρ value is close to zero. These findings provide further evidence against the situational specificity hypothesis.

The first published validity generalization study (Schmidt & Hunter, 1977) hypothesized that if all sources of artifactual variance in cognitive test validities could be controlled methodologically through study design (e.g., construct validity of tests and criterion measures, computational errors) or corrected for (e.g., sampling error, measurement error), there might be no remaining variance in validities across settings. That is, not only would validity be generalizable based on 90% credibility values in the estimated true validity distributions, but all observed variance in validities would be shown to be artifactual, and the situational specificity hypothesis would be shown to be false even in its limited form. However, subsequent validity generalization research (e.g., Pearlman, Schmidt, & Hunter, 1980; Schmidt, Gast-Rosenberg, & Hunter, 1980; Schmidt, Hunter, Pearlman, & Shane, 1979) was based on data drawn from the general published and unpublished research literature, and therefore it was not possible to control or correct for the sources of artifactual variance that can generally be controlled for only through study design and execution (e.g., computational and typographical errors, study differences in criterion contamination). Not unexpectedly, many of these meta-analyses accounted for less than 100% of observed validity variance, and the average across studies was also less than 100% (e.g., see Pearlman et al., 1980; Schmidt et al., 1979). The conclusion that the validity of cognitive ability tests in employment is generalizable is now widely accepted (e.g., see