Conditional Log-Linear Models for Analyzing Categorical Panel Data

Zvi Gilula, Shelby J. Haberman

doi:10.2307/2290867

Abstract

Conditional log-linear models are developed for panel data and used to predict sequences of categorical responses. The class of models considered includes conventional Markov models and independence models as well as distance models in which all previous responses and present and past values of covariates are used to predict the current response. The approach taken in this article has some advantages over the marginal modeling approach that has become popular for longitudinal studies. Quality of prediction is measured by using a logarithmic penalty function. Given a model, conditional probabilities of responses consistent with the model are selected to provide the smallest expected penalty. This minimum expected penalty provides a measure of the predictive power of a model. Models are compared through their predictive power, as measured by the proportional reduction in expected penalty. Ways of incorporating the number of parameters of the competing models are discussed. This emphasis on predictive power contrasts with the conventional emphasis on goodness-of-fit tests. In the case of random sampling, estimates are provided for optimal prediction functions consistent with the model and for measures of predictive power. Large-sample approximations are provided for assessing the accuracy of parameter estimates and of estimated measures of quality of prediction. For measures of quality of prediction, assessments are provided for the bias of estimates. To illustrate techniques, analyses are performed on data from the National Longitudinal Study of Youth on attitudes toward a military career. Because these data are available for the same subject for each of seven years, and because demographic data are available on individual subjects, these data provide a nontrivial application. Because more than 8,000 observations are available, statistical models of practical interest do not fit the data according to conventional criteria, but they still have value in predicting subject responses. Analysis of the data shows that subjects' responses are linked much more closely to their previous responses than to demographic variables. A common Markov model for subject responses is shown to be inferior to other models in terms of predictive power. Methods considered are shown to apply to cases in which censoring or nonresponse problems exist.
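
The logarithmic penalty and the proportional reduction in expected penalty described in the abstract can be illustrated with a small sketch. It is not the authors' estimation procedure: the toy panel, the function names (`log_penalty`, `fit_independence`, `fit_markov`), the use of plain empirical frequencies in place of fitted log-linear models, the choice of the independence model as the baseline, and the in-sample evaluation are all assumptions made for illustration.

```python
import math
from collections import Counter, defaultdict

# Hypothetical toy panel: each row is one subject's binary responses over five waves.
panel = [
    [0, 0, 0, 1, 1],
    [1, 1, 1, 1, 0],
    [0, 0, 1, 1, 1],
    [1, 1, 0, 0, 0],
    [0, 0, 0, 0, 0],
    [1, 1, 1, 1, 1],
]

def log_penalty(sequences, predict):
    """Average of -log p(observed response | history) over all waves after the first."""
    total, n = 0.0, 0
    for seq in sequences:
        for t in range(1, len(seq)):
            total += -math.log(predict(seq[:t])[seq[t]])
            n += 1
    return total / n

def fit_independence(sequences):
    """Predict every response from the pooled marginal frequencies, ignoring history."""
    counts = Counter(seq[t] for seq in sequences for t in range(1, len(seq)))
    n = sum(counts.values())
    probs = {k: c / n for k, c in counts.items()}
    return lambda history: probs

def fit_markov(sequences):
    """Predict each response from the previous response only (first-order Markov)."""
    trans = defaultdict(Counter)
    for seq in sequences:
        for prev, cur in zip(seq, seq[1:]):
            trans[prev][cur] += 1
    probs = {
        prev: {k: c / sum(cnt.values()) for k, c in cnt.items()}
        for prev, cnt in trans.items()
    }
    return lambda history: probs[history[-1]]

# In-sample penalties (an assumption of this sketch; no bias correction is applied).
base_penalty = log_penalty(panel, fit_independence(panel))
markov_penalty = log_penalty(panel, fit_markov(panel))

# Proportional reduction in penalty of the Markov model relative to independence.
reduction = (base_penalty - markov_penalty) / base_penalty
print(f"independence: {base_penalty:.3f}  markov: {markov_penalty:.3f}  "
      f"reduction: {reduction:.3f}")
```

A larger reduction means the previous response carries more predictive information than the marginal distribution alone. Comparing richer models in the same way would require adding their fitted conditional probabilities as further `predict` functions.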
