Increasingly, social work researchers are performing secondary analyses using data from large-scale surveys such as the National Health Interview Survey (NHIS) (National Center for Health Statistics, 1997), the National Education Longitudinal Study of 1988 (NELS:88), and the Longitudinal Study of Aging (LSOA). Generally, these surveys use complex probability sampling procedures such as stratification, multiple stages of selection, and oversampling to obtain a representative sample of the target population (Cochran, 1977; La Vange, Stearns, Lafata, Koch, & Shah, 1996). Because of these sampling procedures, analyzing these data with traditional statistical software such as SPSS or SAS, which use ordinary and generalized least squares estimation techniques, results in underestimated standard errors (Silbersiepe & Hardy, 1997), inappropriate confidence intervals, and misleading tests of significance (Carlson, Johnson, & Cohen, 1993). These packages also assume simple random sampling (Silbersiepe & Hardy) and do not control for features of the sampling design that affect observed outcomes (La Vange et al.). For these reasons, traditional statistical software packages are considered inappropriate for analyzing complex survey data (Brogan, Daniels, Rolka, Marsteller, & Chattopadhay, 1998; La Vange et al.). Analyzing complex survey data requires specialized software. SUDAAN (Shah, Barnwell, & Bider, 1997) and PC Carp (Fuller, Kennedy, Schnell, Sullivan, & Park, 1986) are the most widely used packages (La Vange et al., 1996). Other software includes WesVarPC (Brick, Broene, James, & Severynse, 1996) and Stata (StataCorp, 1996). Many social work researchers, however, are not using these specialized programs to analyze complex survey data, perhaps because they are unaware of the need to do so.
This article uses data from the 1994 AIDS Knowledge and Attitudes Supplement to the National Health Interview Survey (NHIS) to illustrate that biased point estimates, inappropriate standard errors, and misleading tests of significance can result from using traditional software packages for complex survey analysis. We also illustrate how results are affected by assuming a simple random sample (with and without the weighting option), by using the sampling weights without controlling for complex survey design effects, and by controlling for complex survey design effects and using national estimate weights.

METHODOLOGICAL ISSUES

When analyzing data from complex surveys, researchers need to account for the sampling design, use the sampling weights, and use appropriate statistical software.

Accounting for the Sampling Design

Because the complex probability procedures used in large-scale surveys to obtain a representative sample of the target population do not result in a simple random sample, the consequences of these procedures for statistical inferences must be taken into account (Johnson & Elliott, 1998). To control for design effects, researchers need to adjust the variance estimates by considering issues such as stratification, clustering, other components of nonindependence of observations, and oversampling or undersampling of subgroups of the population. The design effect is, simply defined, the ratio of the estimated population variance taking the complex design features into account to the estimated population variance based on the same sample under the assumption of a simple random sample. A design effect of one indicates that the variance estimate is the same regardless of whether the complex survey design is taken into account. When the design effect is greater than one, the standard error (SE) calculated controlling for the complex survey design is larger than the SE calculated assuming a simple random sample.
Therefore, the confidence interval around the point estimate is larger. …
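The design-effect ratio described above can be made concrete with a small numerical sketch. The code below uses entirely hypothetical data (a one-stage cluster sample with equal cluster sizes, not NHIS data) to estimate a proportion, compute the variance two ways, and form the design effect; all cluster counts and sizes are illustrative assumptions.

```python
import math

# Hypothetical one-stage cluster sample: 10 clusters of 20 respondents each.
# Counts of "yes" answers per cluster (illustrative numbers, not NHIS data).
cluster_yes = [12, 15, 8, 14, 11, 16, 9, 13, 10, 12]
m = 20                  # respondents per cluster
k = len(cluster_yes)    # number of clusters
n = k * m               # total sample size

p_hat = sum(cluster_yes) / n  # overall estimated proportion

# Variance assuming simple random sampling: p(1 - p) / n
var_srs = p_hat * (1 - p_hat) / n

# Design-based variance under one-stage cluster sampling with equal
# cluster sizes: sample variance of the cluster proportions, divided
# by the number of clusters.
cluster_props = [y / m for y in cluster_yes]
mean_cp = sum(cluster_props) / k
var_between = sum((cp - mean_cp) ** 2 for cp in cluster_props) / (k - 1)
var_design = var_between / k

# Design effect: design-based variance over SRS variance.
deff = var_design / var_srs

se_srs = math.sqrt(var_srs)
se_design = math.sqrt(var_design)
print(f"p_hat = {p_hat:.3f}")
print(f"SE assuming SRS       = {se_srs:.4f}")
print(f"SE with design control = {se_design:.4f}")
print(f"design effect          = {deff:.2f}")
```

With these made-up data the design effect exceeds one, so the design-based SE is larger than the SRS-based SE, which is exactly the situation the text describes: ignoring the clustering would understate the standard error and narrow the confidence interval inappropriately.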