Abstract

In statistical inference proper, the model is never questioned. . . . The methods of mathematical statistics do not provide us with a means of specifying the model.¹
Few would question the profundity of the above quotes. Yet in practice the methodological dictum contained therein is invariably violated. We ordinarily face data with an imperfectly specified model and learn of our imperfection from the data themselves. As one example, a second-round estimation procedure to clean up residual serial correlation is often employed after a Durbin-Watson statistic turns out to be insufficiently close to two. As another, we often read of a choice of algebraic form based on computation of R²s. And few of us can deny deciding whether to drop a variable based on the outcome of a t-test. Such procedures seem sensible from an intuitive point of view: data may not speak for themselves, but they can raise a voice hard for a sensible man to ignore when a model is badly misspecified.
The opposite extreme, complete empiricism, is just as unattractive, perhaps more so. Theory is worth something. Meaningful research does not come from exhaustively searching through lists of potential explanatory variables. Presumably it is safe to ignore sunspot data and the length of women's skirts in studying the demand for food. The reader of a research report is unconvinced by even nice results if he is made aware that a large number of trials with alternative specifications led sequentially to those results.
Most would agree with the broad statements made so far. The purpose of this paper is to define more rigorously the balance between two untenable methodological attitudes: that, on the one hand, data should not be allowed to interfere with specification and, on the other, that specification must be achieved mainly by experimentation.
The vehicle used in the sequel to investigate the costs and returns of sequential investigation is a linear regression model relating a regressand to two potential regressors. The assumption is that the researcher's interest lies mainly in the effect of the first regressor, and the question is whether to include or exclude the second independent variable in the regression. Thus, the specific problem considered here is so limited that it seldom arises in practice in exactly this form. But the simple problem shares features with more realistic problems of model construction, and the same considerations arise in more complex situations. The results that follow are rigorous but the mode of presentation is nonrigorous. Those interested in proofs of results are referred to appropriate sources.
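
The two-regressor setup described above can be illustrated with a small simulation. What follows is a minimal sketch under stated assumptions, not the paper's own procedure: the function names (`ols`, `pretest_estimate`, `simulate_mse`), the absence of an intercept, the fixed critical value of 1.96, the correlation `rho` between the regressors, and the unit error variance are all hypothetical choices made for illustration. It compares the mean squared error of the first coefficient under three strategies: always including the second regressor, always excluding it, and including it only when its t-statistic is large.

```python
import numpy as np

TRUE_BETA1 = 1.0  # assumed true effect of the first regressor (illustrative)


def ols(X, y):
    """Return OLS coefficients, their standard errors, and residual degrees of freedom."""
    n, k = X.shape
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    dof = n - k
    sigma2 = resid @ resid / dof            # unbiased error-variance estimate
    cov = sigma2 * np.linalg.inv(X.T @ X)   # classical OLS covariance matrix
    return beta, np.sqrt(np.diag(cov)), dof


def pretest_estimate(x1, x2, y, t_crit=1.96):
    """Estimate the coefficient on x1, keeping x2 only if its |t| exceeds t_crit.

    Returns (estimate, kept), where kept says whether x2 stayed in the model.
    """
    beta_full, se_full, _ = ols(np.column_stack([x1, x2]), y)
    t2 = beta_full[1] / se_full[1]
    if abs(t2) > t_crit:
        return beta_full[0], True
    beta_restricted, _, _ = ols(x1[:, None], y)
    return beta_restricted[0], False


def simulate_mse(beta2, n=50, reps=2000, rho=0.6, seed=0):
    """Monte Carlo MSE of the x1 coefficient under the three strategies."""
    rng = np.random.default_rng(seed)
    sq_err = {"full": 0.0, "restricted": 0.0, "pretest": 0.0}
    for _ in range(reps):
        x1 = rng.standard_normal(n)
        # x2 is built to have correlation rho with x1 (an assumption of this sketch)
        x2 = rho * x1 + np.sqrt(1.0 - rho**2) * rng.standard_normal(n)
        y = TRUE_BETA1 * x1 + beta2 * x2 + rng.standard_normal(n)
        full, _, _ = ols(np.column_stack([x1, x2]), y)
        restricted, _, _ = ols(x1[:, None], y)
        pre, _ = pretest_estimate(x1, x2, y)
        estimates = (("full", full[0]), ("restricted", restricted[0]), ("pretest", pre))
        for name, est in estimates:
            sq_err[name] += (est - TRUE_BETA1) ** 2
    return {name: total / reps for name, total in sq_err.items()}


if __name__ == "__main__":
    for beta2 in (0.0, 0.3, 1.0):
        print(beta2, simulate_mse(beta2))
```

Running the script for several values of `beta2` gives a rough sense of when the test-then-decide strategy helps or hurts relative to the two fixed specifications. The numbers depend entirely on the assumed sample size, correlation, and critical value, so they should be read as an illustration of the trade-off rather than as results from the paper.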
