Incorporating Economic Conditions in Synthetic Microdata for Business Programs

Katherine Thompson, Hang Joon Kim

doi:10.1093/jssam/smab054

Abstract

Many agencies are currently investigating whether releasing synthetic microdata could be a viable dissemination strategy for highly sensitive data, such as business data, for which disclosure avoidance regulations would otherwise prohibit the release of public use microdata. The U.S. Census Bureau has identified the Economic Census as a candidate program and has been developing synthetic data generators. The synthetic data should account for skewed and irregular distributions, satisfy predetermined edit constraints, and preserve selected privacy features. Previous research on these generators was confined to businesses that were in operation for the full year, ignoring the special features of births and deaths in the models. These generators preserve multivariate relationships and yield marginal totals that closely correspond to the published official statistics. However, these synthetic data consequently do not reflect the state of economic expansion or contraction. This missing information is a severe deficiency for the targeted data users comprising economists, policymakers, and methodologists, especially since the global pandemic of 2020. This paper introduces an approach that addresses this deficiency, producing partially synthetic data with high utility and privacy protection. We provide preliminary results using selected industry data from the 2012 Economic Census.
