Abstract

Secondary analyses of survey data collected from large probability samples of persons or establishments further scientific progress in many fields. The complex design features of these samples improve data collection efficiency, but also require analysts to account for these features when conducting analysis. Unfortunately, many secondary analysts from fields outside of statistics, biostatistics, and survey methodology do not have adequate training in this area, and as a result may apply incorrect statistical methods when analyzing these survey data sets. This in turn could lead to the publication of incorrect inferences based on the survey data that effectively negate the resources dedicated to these surveys. In this article, we build on the results of a preliminary meta-analysis of 100 peer-reviewed journal articles presenting analyses of data from a variety of national health surveys, which suggested that analytic errors may be extremely prevalent in these types of investigations. We first perform a meta-analysis of a stratified random sample of 145 additional research products analyzing survey data from the Scientists and Engineers Statistical Data System (SESTAT), which describes features of the U.S. Science and Engineering workforce, and examine trends in the prevalence of analytic error across the decades used to stratify the sample. We once again find that analytic errors appear to be quite prevalent in these studies. Next, we present several example analyses of real SESTAT data, and demonstrate that a failure to perform these analyses correctly can result in substantially biased estimates with standard errors that do not adequately reflect complex sample design features. Collectively, the results of this investigation suggest that reviewers of this type of research need to pay much closer attention to the analytic methods employed by researchers attempting to publish or present secondary analyses of survey data.

Highlights

  • Secondary analyses of survey data sets collected from large probability samples of persons or establishments further scientific progress in many academic fields, including education, sociology, and public health

  • We found that the odds of describing results with respect to the larger target population were nearly six times higher in the Survey of Doctorate Recipients (SDR) when compared to the National Survey of Recent College Graduates (NSRCG) and more than five times higher when compared to the National Survey of College Graduates (NSCG), consistent with the results in Fig 2 and Table 3

  • We highlight six key findings in this study. First, the sampled research products rarely accounted for the complex design features of the samples underlying the SESTAT survey data, and these prevalence rates did not vary across the three SESTAT surveys: only 55% of the products incorporated the publicly available sampling weights into the analyses, only 8% of the products accounted for the complex sampling features when estimating variances, and only 11% of the products presenting design-based analyses performed appropriate subpopulation analyses accounting for the complex sampling [2]


Summary

Introduction

Secondary analyses of survey data sets collected from large probability samples of persons or establishments further scientific progress in many academic fields, including (but not limited to) education, sociology, and public health. The samples underlying these surveys, while enabling inferences about population characteristics or relationships between variables of interest in a finite population of interest, are often “complex” in nature, employing sampling strategies such as stratification of the population and cluster sampling [1,2]. These complex sample design features improve the cost efficiency of survey data collection, but they also require secondary analysts to employ approaches that account for the effects of the complex sampling statistically [3]. The application of standard statistical methods to these data sets can lead to incorrect population inferences, which effectively negates the resources dedicated to the survey data collection. This potential analytic error on the part of secondary analysts defines an important part of the widely-researched Total Survey Error (TSE) framework [4,5,6,7,8]. This important component of TSE has received almost no research attention relative to the other important sources of survey error that define this framework.
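The difference between a standard and a design-based analysis can be illustrated with a small simulation. The sketch below is not the authors' analysis and does not use SESTAT data: the two strata, their sizes, and the outcome values are synthetic assumptions, and the function names are hypothetical. It covers only stratified simple random sampling with unequal sampling fractions (no clustering), using the textbook stratified mean and its variance with a finite population correction.

```python
import math
import random
import statistics


def stratified_estimate(samples, pop_sizes):
    """Design-based mean and standard error under stratified simple random sampling.

    samples:   dict mapping stratum label -> list of sampled values
    pop_sizes: dict mapping stratum label -> population count N_h
    """
    total = sum(pop_sizes.values())
    mean = 0.0
    variance = 0.0
    for h, ys in samples.items():
        n_h = len(ys)
        share = pop_sizes[h] / total      # W_h = N_h / N
        fraction = n_h / pop_sizes[h]     # sampling fraction f_h = n_h / N_h
        mean += share * statistics.fmean(ys)
        # Var(ybar_st) = sum_h W_h^2 * (1 - f_h) * s_h^2 / n_h
        variance += share ** 2 * (1 - fraction) * statistics.variance(ys) / n_h
    return mean, math.sqrt(variance)


def naive_estimate(samples):
    """Unweighted mean and SE that ignore the design (treats data as one simple random sample)."""
    pooled = [y for ys in samples.values() for y in ys]
    return statistics.fmean(pooled), statistics.stdev(pooled) / math.sqrt(len(pooled))


# Synthetic population (an assumption for illustration): stratum B is small
# but has much higher values, and it is heavily oversampled.
rng = random.Random(0)
pop_sizes = {"A": 9000, "B": 1000}
population = {
    "A": [rng.gauss(50, 5) for _ in range(pop_sizes["A"])],
    "B": [rng.gauss(90, 5) for _ in range(pop_sizes["B"])],
}
true_mean = statistics.fmean(population["A"] + population["B"])

# Equal sample sizes per stratum, so selection probabilities differ (100/9000 vs 100/1000).
samples = {h: rng.sample(values, 100) for h, values in population.items()}

design_mean, design_se = stratified_estimate(samples, pop_sizes)
naive_mean, naive_se = naive_estimate(samples)

print(f"true mean:     {true_mean:.2f}")
print(f"naive:         {naive_mean:.2f} (SE {naive_se:.2f})")
print(f"design-based:  {design_mean:.2f} (SE {design_se:.2f})")
```

In this synthetic setup the unweighted mean is pulled toward the oversampled stratum, while the design-based estimate stays close to the population mean, and the two standard errors differ because only one of them reflects the stratification. Real survey designs typically add clustering and weight adjustments, which call for dedicated survey software rather than a hand-rolled formula like this one.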

