Error analysis for hybrid estimates of proportions using big data

Siu-Ming Tam, Dennis Trewin, Lyndon Ang

doi:10.3233/sji-210924

Abstract

Big data, including administrative data, is seen as a new data source for official statistics, especially given the increasing difficulty of getting acceptable response rates in sample surveys. It might be used directly, or with models that adjust for its shortcomings. Hybrid estimates, which combine big data with complementary survey data, are another technique for overcoming these shortcomings. To decide how big data might be used, we need to understand the nature of the errors in the big data source. The paper describes an Error Framework for the analysis of errors in big data and hybrid estimates. It also describes the circumstances under which hybrid estimates will be more accurate than either big data or survey data in isolation. A case study illustrates the application of hybrid estimates in practice. A potential application of hybrid estimation is also described to address the upward biases that often exist in epidemiological modelling.

Full Text