Analysis of Covariance with Qualitative Data

Gary Chamberlain

doi:10.2307/2297110

Abstract

This paper deals with data that have a group structure. A simple example in the context of a linear regression model is $E(y_{it} \mid x, \beta, \alpha) = \beta' x_{it} + \alpha_i$ ($i = 1, \dots, N$; $t = 1, \dots, T$), where there are $T$ observations within each of $N$ groups. The $\alpha_i$ are group-specific parameters. Our primary concern is with the estimation of $\beta$, a parameter vector common to all groups. The role of the $\alpha_i$ is to control for group-specific effects, i.e. for omitted variables that are constant within a group. The regression function that does not condition on the group will not in general identify $\beta$: $E(y_{it} \mid x, \beta) \neq \beta' x_{it}$. In this case there is an omitted variable bias.

An important application is generated by longitudinal or panel data, in which there are two or more observations on each individual. Then the group is the individual, and the $\alpha_i$ capture individual differences. If these person effects are correlated with $x$, then a regression function that fails to control for them will not identify $\beta$. In another important application the group is a family, with observations on two or more siblings within the family. Then the $\alpha_i$ capture omitted variables that are family specific, and they give a concrete representation to family background.

We shall assume that observations from different groups are independent. Then the $\alpha_i$ are incidental parameters (Neyman and Scott (1948)), and $\beta$, which is common to the independent sampling units, is a vector of structural parameters. In the application to sibling data, $T$ is small, typically $T = 2$, whereas there may be a large number of families. Small $T$ and large $N$ are also characteristic of many of the currently available longitudinal data sets. So a basic statistical issue is to develop an estimator for $\beta$ that has good properties in this case. In particular, the estimator ought to be consistent as $N \to \infty$ for fixed $T$. It is well known that analysis of covariance in the linear regression model does have this consistency property.
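The consistency of analysis of covariance, and the omitted variable bias of the regression that ignores the groups, can be illustrated with a small simulation. This is a sketch under assumptions that are not in the paper: a scalar regressor, a hypothetical data-generating process in which $x_{it} = \alpha_i + u_{it}$ with $\alpha_i$, $u_{it}$ and the regression error all standard normal, and hypothetical helper names `simulate_panel`, `pooled_ols_slope` and `within_slope`. Under this design the pooled slope converges to $\beta + 1/2$ (because $\mathrm{cov}(x_{it}, \alpha_i)/\mathrm{var}(x_{it}) = 1/2$), while the within-group estimator, which removes each group mean before regressing, converges to $\beta$ as $N$ grows with $T = 2$ held fixed.

```python
import numpy as np


def simulate_panel(n_groups, t_periods, beta, rng):
    """Hypothetical DGP: x_it = alpha_i + u_it, y_it = beta * x_it + alpha_i + e_it."""
    alpha = rng.normal(size=n_groups)  # group effects alpha_i
    # x is correlated with alpha_i, which is what creates the omitted variable bias
    x = alpha[:, None] + rng.normal(size=(n_groups, t_periods))
    y = beta * x + alpha[:, None] + rng.normal(size=(n_groups, t_periods))
    return x, y


def pooled_ols_slope(x, y):
    """Slope of y on x with a common intercept, ignoring the group structure."""
    xc = x - x.mean()
    yc = y - y.mean()
    return float((xc * yc).sum() / (xc ** 2).sum())


def within_slope(x, y):
    """Analysis of covariance: subtract each group's mean, then regress."""
    xd = x - x.mean(axis=1, keepdims=True)  # alpha_i cancels out here
    yd = y - y.mean(axis=1, keepdims=True)
    return float((xd * yd).sum() / (xd ** 2).sum())


rng = np.random.default_rng(0)
# Many groups (large N), two observations per group (T = 2), true beta = 1
x, y = simulate_panel(n_groups=200_000, t_periods=2, beta=1.0, rng=rng)
b_pooled = pooled_ols_slope(x, y)
b_within = within_slope(x, y)
print(f"pooled: {b_pooled:.3f}  within: {b_within:.3f}")
```

Subtracting group means is numerically equivalent to including one dummy variable per group, so the $\alpha_i$ are controlled for without being estimated explicitly; the within estimator stays consistent even though the number of $\alpha_i$ grows with $N$.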
The problem of finding consistent estimators in other models is non-trivial, however, since the number of incidental parameters is increasing with sample size. We shall work with the following probability model: $y_{it}$ is a binary variable with
