Abstract

National statistical institutes are under increasing pressure to reduce administrative costs and response burden in the production of official statistics. At the same time, data users increasingly expect this statistical information to become available in a more timely manner, with increased frequency and at a more detailed level. This could potentially be accomplished by using large data sets that are generated as a by-product of processes not directly related to statistical production purposes, so-called big data. Two different research lines are identified on how big data sources can be used in the production of official statistics. The first approach is to combine big data sources with sample data in a model-based inference approach. This implies that big data sources are used as covariates in models for small area estimation and in time series models, where cross-sectional and temporal correlations are used to improve the precision and timeliness of sample statistics. The second approach is to use big data sources as a primary data source for the compilation of official statistics. This can be considered if a big data source covers the intended target population and does not suffer too much from under- and over-coverage, e.g., the use of satellite and aerial images for deriving statistical information on land use. In most cases, however, adjustments for selection bias are required. This chapter summarizes the potentials and risks of the use of new data sources like big data in the production of official statistics.
