A SEMIPARAMETRIC MULTIPLE IMPUTATION APPROACH TO FULLY SYNTHETIC DATA FOR COMPLEX SURVEYS.

Mandi Yu, Yulei He, Trivellore E. Raghunathan

doi:10.1093/jssam/smac016

Abstract

Data synthesis is an effective statistical approach for reducing data disclosure risk. Generating fully synthetic data might minimize such risk, but the modeling and application can be difficult for data from large, complex surveys. This article extends two-stage imputation to simultaneously impute item missing values and generate fully synthetic data, and develops a new combining rule for making inferences from data generated in this manner. Two semiparametric missing-data imputation models were adapted to generate fully synthetic data for skewed continuous variables and sparse binary variables, respectively. The proposed approach was evaluated using simulated data and real longitudinal data from the Health and Retirement Study. Using the real data, the proposed approach was also compared with two existing synthesis approaches: (1) parametric regression models as implemented in IVEware; and (2) nonparametric Classification and Regression Trees as implemented in the synthpop package for R. The results show that the proposed strategy maintains high data utility for a wide variety of descriptive and model-based statistics, and that it performs better than existing methods for sophisticated analyses such as factor analysis.
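
The abstract names a new combining rule for the two-stage approach but does not state it. For orientation only, the sketch below implements the earlier one-stage combining rule for fully synthetic data (Raghunathan, Reiter, and Rubin, 2003); it is not the new rule developed in this article. Given m synthetic datasets with point estimates q_l and variance estimates u_l, the rule uses the mean estimate q̄_m, the between-dataset variance b_m (divisor m - 1), the mean within-dataset variance ū_m, and the total variance T_f = (1 + 1/m) b_m - ū_m. The function name `combine_fully_synthetic` is hypothetical.

```python
from statistics import mean, variance


def combine_fully_synthetic(estimates, variances):
    """Combine estimates from m fully synthetic datasets (one-stage rule).

    Hypothetical helper for illustration; this is NOT the two-stage
    combining rule proposed in the article.

    estimates: point estimates q_l, one per synthetic dataset
    variances: within-dataset variance estimates u_l, in the same order
    Returns (q_bar, b_m, u_bar, t_f).
    """
    m = len(estimates)
    if m < 2 or len(variances) != m:
        raise ValueError("need m >= 2 estimates with matching variances")

    q_bar = mean(estimates)          # combined point estimate
    b_m = variance(estimates)        # between-dataset variance, divisor m - 1
    u_bar = mean(variances)          # average within-dataset variance
    t_f = (1 + 1 / m) * b_m - u_bar  # total variance; can be negative
    return q_bar, b_m, u_bar, t_f
```

For example, estimates of 1.0, 2.0, and 3.0 with variances 0.1, 0.2, and 0.3 give q̄_m = 2.0, b_m = 1.0, ū_m = 0.2, and T_f ≈ 1.133. Because T_f is a difference of two terms it can come out negative in practice; the sketch returns it unadjusted so that such cases remain visible to the analyst.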

Journal: Journal of Survey Statistics and Methodology
Publication Date: May 25, 2022
Citations: 2

Similar Papers

Comparative performance of regression tree and parametric classification of savannah woody cover on SPOT 6 NAOMI imagery
C Munyati
Remote Sensing Applications: Society and Environment | VOL. 13
29 Oct 2018

Molecular abnormalities in the major psychiatric illnesses: Classification and Regression Tree (CRT) analysis of post-mortem prefrontal markers.
M B Knable ... E F Torrey
Molecular Psychiatry | VOL. 7
01 Apr 2002

Reduced breathing variability as a predictor of unsuccessful patient separation from mechanical ventilation
Marc Wysocki ... Alain Mercat
Critical Care Medicine | VOL. 34
01 Aug 2006

Analysis of driver injury severity in truck-involved accidents using a non-parametric classification tree model
Li-Yen Chang ... Jui-Tseng Chien
Safety Science | VOL. 51
21 Jul 2012
