Abstract

Increasing interest has been expressed, particularly by social scientists, in the analysis of unbalanced data occurring in models not of full rank, i.e., linear models where the numbers of observations in the subclasses are unequal. The various discussions of this problem that have appeared in the psychological and educational literature differ depending upon the intent of the author(s). For example, some writers (Gourlay, 1955; Steinhorst & Miller, 1969; Tsao, 1942, 1946; Williams, 1972) have examined and at times compared different types of solutions, including approximate solutions, for the unbalanced case. Other writers (Overall & Spiegel, 1969; Rawlings, 1972) have focused upon the difficulty which arises in interpreting the results of unbalanced data analyses because the estimable functions involved in tests of hypotheses are not orthogonal. Although not specifically addressed to the unbalanced case, the work of Bottenberg and Ward (1960), Cohen (1968), Hurst (1970), and Jennings (1967) is important in this regard: it demonstrates the equivalence of linear regression and the fixed-effects analysis of variance by the use of regression on dummy variables. This diversity in purpose, combined with the relative narrowness of the individual efforts, has resulted in a fragmented treatment of the problem of unbalanced data and, in some cases, in confusion and controversy regarding methodology, e.g., the role of constraints in obtaining a solution for models not of full rank or the calculation of reductions in sums of squares (Overall & Spiegel, 1973; Rawlings, 1973; Smith, 1973). In response, at least one writer (Joe, 1971) has been prompted to call for a greater understanding and more detailed presentation of the models employed. Some of the more important issues encountered in the analysis of unbalanced data will be examined here within a single framework.
These issues are broadly defined as (1) solutions to the normal equations for linear models where the incidence or design matrix is not of full column rank, (2) estimable functions of the parameters and tests of hypotheses, and (3) reductions in sums of squares and interpretation of results. Because of the increased availability of computer hardware and sophisticated least squares routines, the various approximate solutions for analyzing unbalanced data are not considered here.
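
The three issues can be illustrated with a minimal numerical sketch. It is not drawn from the paper: the data, the group sizes, and all variable names are hypothetical, and the one-way model y = mu + alpha_i + e is assumed only for illustration. The sketch builds a design matrix that is not of full column rank, obtains two different solutions to the normal equations (one from a generalized inverse, one from the constraint mu = 0), and checks that estimable functions and the reduction in sum of squares do not depend on which solution is used.

```python
import numpy as np

# Hypothetical unbalanced one-way layout: 3 groups with 2, 3, and 1 observations.
groups = np.array([0, 0, 1, 1, 1, 2])
y = np.array([4.0, 6.0, 7.0, 9.0, 8.0, 3.0])

# Overparameterized design matrix: intercept plus one indicator per group.
# The indicator columns sum to the intercept column, so the rank is 3, not 4.
X = np.column_stack([np.ones(len(y)), np.eye(3)[groups]])
rank = np.linalg.matrix_rank(X)

# Normal equations: (X'X) b = X'y
XtX, Xty = X.T @ X, X.T @ y

# Solution 1: Moore-Penrose generalized inverse (minimum-norm solution).
b_pinv = np.linalg.pinv(XtX) @ Xty

# Solution 2: impose the constraint mu = 0, which reduces to fitting cell means.
b_con = np.zeros(4)
b_con[1:] = np.linalg.solve(XtX[1:, 1:], Xty[1:])

# Two estimable functions of the parameters (mu, alpha_1, alpha_2, alpha_3).
k_mean = np.array([1.0, 1.0, 0.0, 0.0])   # mu + alpha_1
k_diff = np.array([0.0, 1.0, -1.0, 0.0])  # alpha_1 - alpha_2

# Reduction in sum of squares due to fitting the model: R(mu, alpha) = b'X'y.
R_pinv = b_pinv @ Xty
R_con = b_con @ Xty

print("rank of X:", rank)
print("solutions differ:", not np.allclose(b_pinv, b_con))
print("mu + alpha_1:", k_mean @ b_pinv, k_mean @ b_con)
print("alpha_1 - alpha_2:", k_diff @ b_pinv, k_diff @ b_con)
print("R(mu, alpha):", R_pinv, R_con)
```

The individual parameter values differ between the two solutions, which is why they are not interpretable on their own, while the estimable functions and the reduction in sum of squares agree.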
