Data Integration by Combining Big Data and Survey Sample Data for Finite Population Inference

Jae‐Kwang Kim, Siu‐Ming Tam

doi:10.1111/insr.12434

Open Access

https://doi.org/10.1111/insr.12434

Abstract

Summary: The statistical challenges of using big data to make valid inferences about a finite population are well documented in the literature. These challenges arise primarily from statistical bias due to under‐coverage of the population of interest by the big data source and from measurement errors in the variables available in the data set. By stratifying the population into a big data stratum and a missing data stratum, we can estimate the missing data stratum using a fully responding probability sample, and hence estimate the population as a whole using a data integration estimator. By expressing the data integration estimator as a regression estimator, we can handle measurement errors in the variables in the big data and also in the probability sample. We also propose a fully nonparametric classification method for identifying the overlapping units and develop a bias‐corrected data integration estimator under misclassification errors. Finally, we develop a two‐step regression data integration estimator to deal with measurement errors in the probability sample. An advantage of the approach advocated in this paper is that it does not require unrealistic missing‐at‐random assumptions. The proposed method is applied to a real data example using 2015–2016 Australian Agricultural Census data.
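The stratification idea in the summary can be illustrated with a minimal Python sketch. It is not the authors' implementation. It assumes the simplest setting: no measurement error, overlap between the sample and the big data identified without error, and a simple random sample with equal design weights. The function names `data_integration_total` and `simulate`, and all simulation settings, are hypothetical and chosen only for illustration. The population total is estimated as the observed total in the big data stratum plus a design-weighted total of the sampled units that are not in the big data.

```python
import random


def data_integration_total(big_data_y, sample):
    """Sketch of a data integration estimate of a population total.

    big_data_y: y values for every unit in the big data stratum.
    sample: iterable of (y, design_weight, in_big_data) tuples from a
            fully responding probability sample.
    """
    # The big data stratum is observed completely, so its total is known.
    big_total = sum(big_data_y)
    # The missing data stratum is estimated from sampled units outside the big data.
    missing_total = sum(w * y for y, w, in_big in sample if not in_big)
    return big_total + missing_total


def simulate(seed=0, N=10000, n=2000):
    """Hypothetical simulation where big data coverage depends on y itself."""
    rng = random.Random(seed)
    y = [rng.gauss(50, 10) for _ in range(N)]
    # Units with large y are more likely to be covered (assumed, not missing at random).
    in_big = [rng.random() < (0.8 if yi > 50 else 0.3) for yi in y]
    big_y = [yi for yi, covered in zip(y, in_big) if covered]

    # Simple random sample without replacement; each unit has weight N / n.
    weight = N / n
    sample = [(y[i], weight, in_big[i]) for i in rng.sample(range(N), n)]

    true_total = sum(y)
    integrated = data_integration_total(big_y, sample)
    # Naive alternative: scale the big data mean up to the population size.
    naive = N * sum(big_y) / len(big_y)
    return true_total, integrated, naive


if __name__ == "__main__":
    true_total, integrated, naive = simulate()
    print(f"true total:       {true_total:,.0f}")
    print(f"data integration: {integrated:,.0f}")
    print(f"naive big data:   {naive:,.0f}")
```

In this sketch the naive estimate inherits the coverage bias of the big data, whereas the integrated estimate uses the probability sample only to fill in the stratum that the big data misses. No model for how units enter the big data is needed.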

Journal: International Statistical Review
Publication Date: Dec 1, 2020
Citations: 22
License type: CC BY-NC 4.0
