Differential privacy in the 2020 US census: what will it do? Quantifying the accuracy/privacy tradeoff

Samantha Petti, Abraham Flaxman

doi:10.12688/gatesopenres.13089.2

Abstract

Background: The 2020 US Census will use a novel approach to disclosure avoidance, called TopDown, to protect respondents’ data. The TopDown algorithm was applied to the 2018 end-to-end (E2E) test of the decennial census. The computer code used for this test, along with accompanying exposition, has recently been released publicly by the Census Bureau. Methods: We used the available code and data to better understand the error introduced by the E2E disclosure avoidance system when the Census Bureau applied it to 1940 census data. We also developed an empirical measure of privacy loss to compare the error and privacy of the new approach to those of a (non-differentially private) simple-random-sampling approach to protecting privacy. Results: We found that the empirical privacy loss of TopDown is substantially smaller than the theoretical guarantee for all privacy loss budgets we examined. When run on the 1940 census data, TopDown with a privacy budget of 1.0 was similar in error and privacy loss to a simple random sample of 50% of the US population. When run with a privacy budget of 4.0, it was similar in error and privacy loss to a 90% sample. Conclusions: This work contributes to the early stages of a discussion on how best to balance privacy and accuracy in decennial census data collection, and there is a need for continued discussion.
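The simple-random-sampling baseline mentioned in the abstract can be illustrated with a minimal sketch. It assumes that counts from a sample of a given fraction are scaled up by the inverse of that fraction; the function names, the input layout (one area label per person), and the toy population are hypothetical and are not taken from the paper or the Census Bureau code.

```python
import random
from collections import Counter


def srs_counts(areas, fraction, seed=0):
    """Estimate per-area counts from a simple random sample without replacement.

    areas: list with one area label per person (hypothetical input layout).
    fraction: share of the population to sample, e.g. 0.5 for a 50% sample.
    """
    rng = random.Random(seed)
    k = round(fraction * len(areas))
    sample = rng.sample(areas, k)
    # Scale sampled counts by 1/fraction to estimate full-population counts.
    return {area: n / fraction for area, n in Counter(sample).items()}


def mean_abs_error(estimates, truth):
    """Mean absolute difference between estimated and precise counts."""
    return sum(abs(estimates.get(a, 0.0) - n) for a, n in truth.items()) / len(truth)


# Toy population, invented for illustration only.
population = ["A"] * 600 + ["B"] * 300 + ["C"] * 100
truth = Counter(population)

for fraction in (0.5, 0.9):
    estimates = srs_counts(population, fraction)
    print(fraction, round(mean_abs_error(estimates, truth), 2))
```

A larger sampling fraction generally gives a smaller error but exposes more individual records, which is the tradeoff the abstract uses as a yardstick for TopDown.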

Highlights

In the United States, the Decennial Census is an important part of democratic governance.
Although the new approach allows a more precise accounting of the variation introduced by the process, it risks reducing the utility of census data. It may produce counts that are substantially less accurate than those of the previous disclosure avoidance system, which was based on redacting the values of table cells below a certain size and on a technique called swapping, in which pairs of households with similar structures but different locations had their location information exchanged in a way that required the details of the swapping procedure to be kept secret[6].
To quantify the scale of the bias introduced by optimization, we constructed a simple homogeneity index for each geographic area by counting the cells of the detailed histogram that contained a precise count of zero. We then examined the bias, defined as the mean of the differentially private (DP) count minus the precise count, for these areas when stratified by homogeneity index.
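The homogeneity index and the stratified bias described above can be sketched as follows. This is an illustration under stated assumptions, not the authors' code: the tuple layout, the function names, and the toy numbers are hypothetical, and the count being compared is assumed to be a single total per area.

```python
from collections import defaultdict


def homogeneity_index(histogram):
    """Number of cells in an area's detailed histogram with a precise count of zero."""
    return sum(1 for count in histogram if count == 0)


def bias_by_homogeneity(areas):
    """Mean of (DP count - precise count), stratified by homogeneity index.

    areas: iterable of (histogram, precise_count, dp_count) tuples
    (a hypothetical layout chosen for this sketch).
    """
    diffs = defaultdict(list)
    for histogram, precise_count, dp_count in areas:
        diffs[homogeneity_index(histogram)].append(dp_count - precise_count)
    return {index: sum(d) / len(d) for index, d in sorted(diffs.items())}


# Toy areas, invented for illustration only.
toy_areas = [
    ([5, 0, 0, 3], 8, 9),
    ([2, 0, 0, 1], 3, 5),
    ([4, 1, 2, 0], 7, 6),
]
print(bias_by_homogeneity(toy_areas))
```

Grouping by the index makes it possible to see whether areas with many empty histogram cells show a systematically different bias than more heterogeneous areas.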

Summary

Introduction

In the United States, the Decennial Census is an important part of democratic governance. The confidentiality of information in the decennial census is required by law, and the 2020 US Census will use a novel approach to “disclosure avoidance” to protect respondents’ data[4]. This approach builds on Differential Privacy, a mathematical definition of privacy that has been developed over the last decade and a half in the theoretical computer science and cryptography communities[5]. Methods: We used the available code and data to better understand the error introduced by the E2E disclosure avoidance system when the Census Bureau applied it to 1940 census data. We also developed an empirical measure of privacy loss to compare the error and privacy of the new approach to those of a (non-differentially private) simple-random-sampling approach to protecting privacy. Conclusions: This work contributes to the early stages of a discussion on how best to balance privacy and accuracy in decennial census data collection, and there is a need for continued discussion.
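For reference, the standard definition of ε-differential privacy can be written as below. The notation (mechanism M, neighboring datasets D and D', output set S) is the usual textbook notation and is an assumption here, since this summary does not state the definition; ε plays the role of the privacy loss budget discussed in the abstract.

```latex
% A randomized mechanism M is \varepsilon-differentially private if,
% for all datasets D and D' differing in one individual's record
% and for every set of outputs S,
\Pr[\, M(D) \in S \,] \;\le\; e^{\varepsilon} \, \Pr[\, M(D') \in S \,]
```

A smaller ε forces the output distributions on neighboring datasets to be closer, which means stronger privacy and, typically, more noise in the published counts.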

Objectives

Methods

Results

Conclusion

Journal: Gates Open Research	Publication Date: Apr 6, 2020
Citations: 6	License type: CC BY 4.0
