Abstract

SUMMARY

Three methods of carrying out a regression analysis of data collected by means of a survey of complex design are investigated. Least squares methods which ignore population structure such as clustering or stratification can give seriously misleading results. Probability weighted methods are much better and give reasonable inferences for equal probability designs. However, for designs with widely differing selection probabilities the inferences can be poor. The best results were obtained for an estimator derived from maximum likelihood theory. This estimator requires that values of a design variable be known for all units in the population. Some aspects of the robustness of this procedure are studied.

Regression analysis is widely used in the analysis of data derived from a sample survey of complex design. McKennell (1970) describes a survey of residents around Heathrow Airport in which a stratified design with unequal sampling fractions was employed. A regression equation was fitted to the survey data which related the respondents' subjective attitudes to noise to various measures of physical exposure. A simplified version of this equation, the Noise and Number Index, now features prominently in discussions on the siting of future airports. In a subsequent survey (HMSO, 1971) a stratified cluster sample was employed and similar regression analyses were carried out.

DeMets and Halperin (1977) employ regression analysis on data from a purposive sample of patients in the Framingham Heart Study. They were interested in the effect of dietary cholesterol on serum cholesterol, both measured concurrently, and based the sample on patients with the highest and lowest values of initial serum cholesterol level.

This paper is concerned with the question of what advice to give clients who wish to estimate regression parameters and to calculate the variances of their estimates using data obtained from a survey of complex design.
We assume that the appropriateness of the regression model is not in question and that in practice diagnostic plots and checks would be made to confirm this. We assume further that the statistician and his client agree that the appropriate model for study is a single equation fitted to all the data rather than separate equations fitted to subsets of the data. If separate models are fitted to subsets of the data in such a way that the data may be divided into mutually exclusive groups then the analysis of each group would still fall within the scope of this paper.
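
The contrast drawn in the summary between least squares that ignores the design and probability weighted least squares can be illustrated with a small simulation. The following is a minimal sketch under stated assumptions, not the procedure used in the paper: the population model, the design variable `z`, the two sampling fractions, the use of Poisson sampling in place of a fixed-size stratified draw, and the function names are all hypothetical choices made for illustration. The maximum likelihood estimator discussed in the summary is not reproduced here.

```python
import numpy as np


def ols(X, y):
    """Unweighted least squares; ignores the sample design."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta


def probability_weighted_ls(X, y, pi):
    """Least squares with each sampled unit weighted by 1 / pi."""
    w = 1.0 / pi
    XtW = (X * w[:, None]).T  # X' W, with W = diag(w)
    return np.linalg.solve(XtW @ X, XtW @ y)


def compare_estimators(seed=0, N=50_000):
    """Fit y on x in a census and in an unequal-probability sample."""
    rng = np.random.default_rng(seed)

    # Hypothetical finite population: y = 1 + 2x + e.
    x = rng.normal(size=N)
    y = 1.0 + 2.0 * x + rng.normal(size=N)

    # Hypothetical design variable, correlated with y.
    z = y + rng.normal(size=N)

    # Two strata split at the median of z, with assumed sampling
    # fractions of 0.02 (low stratum) and 0.20 (high stratum).
    pi = np.where(z > np.median(z), 0.20, 0.02)

    # Poisson sampling: each unit is included independently with
    # probability pi (a simplification of stratified sampling).
    sampled = rng.random(N) < pi

    X = np.column_stack([np.ones(N), x])
    return {
        "census": ols(X, y),
        "unweighted": ols(X[sampled], y[sampled]),
        "weighted": probability_weighted_ls(
            X[sampled], y[sampled], pi[sampled]
        ),
    }


if __name__ == "__main__":
    for name, beta in compare_estimators().items():
        print(f"{name:>10}: intercept={beta[0]:.3f}, slope={beta[1]:.3f}")
```

In this setup selection depends on `z`, which carries information about `y` beyond `x`, so the unweighted fit on the sample should drift away from the census coefficients while the weighted fit should stay closer to them. The weights here range over a factor of ten, which is the kind of widely differing selection probability the summary warns can make weighted inferences imprecise.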

