tistical models that allow for more complex relationships than can be inferred using only cross-sectional data. Panel (i.e., longitudinal) data, in which the same units are observed repeatedly at different points in time, can often provide the richer data needed for such models (e.g., Chamberlain (1984), Hsiao (1986), Baltagi (1995), Arellano and Honoré (forthcoming)). Missing data problems, however, can be more severe in panels, because even those units that respond in initial waves of the panel may drop out of the sample in subsequent waves (e.g., Hausman and Wise (1979), Robins and West (1986), Ridder (1990), Verbeek and Nijman (1992), Abowd, Crepon, Kramarz, and Trognon (1995), Fitzgerald, Gottschalk, and Moffitt (1998), and Vella (1998)). Sometimes, in the hope of mitigating the effects of such attrition, panel data sets are augmented by replacing the units that have dropped out with new units randomly sampled from the original population. Following Ridder (1992), who used such replacement units to test alternative models for attrition, we call such additional samples refreshment samples. Here we explore the benefits of refreshment samples for inference in the presence of attrition.

Two general approaches are often used to deal with attrition in panel data sets when refreshment samples are not available. One model, based on the missing at random assumption (MAR; Rubin (1976), Little and Rubin (1987)), allows the probability of attrition to depend on lagged but not on contemporaneous variables that have missing values. The other model (denoted by HW in the remainder of the paper, given the similarity to a model developed by Hausman and Wise (1979)) allows the probability of attrition to depend on such contemporaneous, but not on lagged, variables. Both sets of models have some theoretical plausibility, but they rely on fundamentally different restrictions on the