Abstract
Background: Health surveys provide a rich array of information but on relatively small numbers of individuals and evidence suggests that they are becoming less representative as response levels fall. Routinely collected administrative data offer more extensive population coverage but typically comprise fewer health topics. We explore whether data combination and multiple imputation of health variables from survey data is a simple and robust way of generating these variables in the general population.
Methods: We use the UK Integrated Household Survey and the English 2011 population census both of which included self-rated general health. Setting aside the census self-rated health data we multiply imputed self-rated health responses for the census using the survey data and compared these with the actual census results in 576 unique groups defined by age, sex, housing tenure and geographic region.
Results: Compared with original census data across the groups, multiply imputed proportions of bad or very bad self-rated health were not a markedly better fit than those simply derived from the survey proportions.
Conclusion: While multiple imputation may have the potential to augment population data with information from surveys, further testing and refinement is required.
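As a rough illustration of the general idea of multiply imputing a binary health outcome for census individuals from survey data, the sketch below draws each group's prevalence from a Beta posterior and imputes one response per census individual, repeating this several times and averaging. It is not the authors' imputation model, which is not described in this abstract. The function name `impute_group_proportions`, the uniform prior, the group labels, and all counts are assumptions made purely for illustration.

```python
import random


def impute_group_proportions(survey, census_counts, m=5, seed=0):
    """Pooled imputed proportion of bad/very bad health per group.

    survey:        {group: (n_bad, n_total)} observed in the survey
    census_counts: {group: N} number of census individuals to impute
    m:             number of imputed datasets

    Hypothetical helper for illustration only; not the paper's model.
    """
    rng = random.Random(seed)
    pooled = {}
    for group, n_census in census_counts.items():
        n_bad, n_total = survey[group]
        estimates = []
        for _ in range(m):
            # Draw the group's prevalence from a Beta posterior
            # (uniform Beta(1, 1) prior; an assumption of this sketch).
            p = rng.betavariate(n_bad + 1, n_total - n_bad + 1)
            # Impute one binary response per census individual.
            imputed_bad = sum(rng.random() < p for _ in range(n_census))
            estimates.append(imputed_bad / n_census)
        # Pool the point estimates by averaging across imputations.
        pooled[group] = sum(estimates) / m
    return pooled


# Made-up counts for two hypothetical groups.
example_survey = {"F 45-54 owner": (12, 150), "M 65-74 renter": (30, 120)}
example_census = {"F 45-54 owner": 4000, "M 65-74 renter": 2500}
example_result = impute_group_proportions(example_survey, example_census)
```

The pooled proportions in each group could then be compared with the observed census proportions, alongside the raw survey proportions, to judge which is the closer fit.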
Highlights
Health surveys provide a rich array of information but on relatively small numbers of individuals, and evidence suggests that they are becoming less representative as response levels fall.
Distributions were broadly similar in the two datasets, with survey respondents slightly older, more educated, and more likely to be female, own their home, and be married than those from the census.
Survey respondents were less positive about their health, with 78% rating it as good or very good compared with 83% of census respondents.
Summary
Health surveys provide a rich array of information but on relatively small numbers of individuals, and evidence suggests that they are becoming less representative as response levels fall. More elaborate bespoke techniques have been used to some effect to obtain superior population and sub-population level estimates from surveys [3, 4, 5]. For example, multilevel models estimate variables of interest in terms of respondents’ characteristics and those of the area in which respondents live, and the results from these models are then weighted by the frequency of the modelled characteristics in the target population [4, 6]. While this type of approach improves on traditional weighting by including more variables common to the survey and the target population, its use is still limited, in particular because it produces group-level estimates rather than the individual data required for further statistical analysis.
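The weighting step described above, in which model-based estimates are weighted by how often each combination of characteristics occurs in the target population, can be sketched as a simple weighted average. The function name `poststratify`, the cell labels, and the numbers are hypothetical, and the cell estimates are assumed to come from some previously fitted model that is not shown here.

```python
def poststratify(cell_estimates, population_counts):
    """Population-level estimate from per-cell model estimates.

    cell_estimates:    {cell: estimated proportion from a fitted model}
    population_counts: {cell: number of people in that cell in the
                        target population}
    """
    total = sum(population_counts.values())
    # Weight each cell's estimate by its share of the target population.
    weighted_sum = sum(
        cell_estimates[cell] * count
        for cell, count in population_counts.items()
    )
    return weighted_sum / total


# Made-up example: two cells with different prevalence and population size.
estimate = poststratify(
    {"younger": 0.05, "older": 0.20},
    {"younger": 600, "older": 400},
)
```

Note that the output is a single group-level figure, which illustrates the limitation raised above: the approach yields aggregate estimates, not individual-level records.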