Robert Luskin's article in this issue provides a useful service by appropriately qualifying several points I made in my 1986 American Journal of Political Science article. Whereas I focused on how to avoid common mistakes in quantitative political science, Luskin clarifies ways to extract some useful information from usually problematic statistics: correlation coefficients, standardized coefficients, and especially RZ. Since these three statistics are very closely related (and indeed deterministic functions of one another in some cases), I focus in this discussion primarily on R2, the most widely used and abused. Luskin also widens the discussion to various kinds of specification tests, a general issue I also address. In fact, as Beck (1991) reports, a large number of formal specification tests are just functions of R2, with differences among them primarily due to how much each statistic penalizes one for including extra parameters and fewer observations. Reasons for Concern about Model Specification Quantitative political scientists often worry about model selection and specification, asking questions about parameter identification, autocorrelated or heteroscedastic disturbances, parameter constancy, variable choice, measurement error, endogeneity, functional forms, stochastic assumptions, and selection bias, among numerous others. These model specification questions are all important, but we may have forgotten why we pose them. Political scientists commonly give three reasons: (I) finding the true model, or the full explanation; (2) prediction; and (3) estimating specific causal effects. I argue here that (1) is used the most but useful the least; (2) is very useful but not usually in political science where forecasting is not often a central concern; and (3) correctly represents the goals of political scientists and should form the basis of most of our quantitative empirical work.
Read full abstract