Re-identification of home addresses from spatial locations anonymized by Gaussian skew

Christopher A Cassa, Kenneth D Mandl, Shannon C Wieland

doi:10.1186/1476-072x-7-45

Abstract

Background

Knowledge of the geographical locations of individuals is fundamental to the practice of spatial epidemiology. One approach to preserving the privacy of individual-level addresses in a data set is to de-identify the data using a non-deterministic blurring algorithm that shifts the geocoded values. We investigate a vulnerability in this approach which enables an adversary to re-identify individuals using multiple anonymized versions of the original data set. If several such versions are available, each can be used to incrementally refine estimates of the original geocoded location.

Results

We produce multiple anonymized data sets from a single set of addresses and then progressively average the anonymized results related to each address, characterizing the steep decline in distance from the re-identified point to the original location (and the corresponding reduction in privacy). With ten anonymized copies of an original data set, the average distance between the estimated, re-identified address and the original address decreases substantially, from 0.7 km to 0.2 km. With fifty anonymized copies, the average distance decreases from 0.7 km to 0.1 km.

Conclusion

We demonstrate that multiple versions of the same data, each anonymized by non-deterministic Gaussian skew, can be used to ascertain original geographic locations. We explore solutions to this problem, including infrastructure to support the safe disclosure of anonymized medical data so as to prevent inference or re-identification of original address data, and the use of a Markov-process-based algorithm to mitigate this risk.
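A minimal simulation sketch of the averaging attack described above. It rests on a simplified model that is an assumption, not the paper's stated setup: points live in a flat plane measured in kilometres, every anonymized copy shifts each point by independent zero-mean Gaussian noise with per-axis standard deviation `sigma_km`, and the adversary can link the same record across copies (here, records simply keep the same order). The default `sigma_km=0.56` is a hypothetical value chosen so that a single copy yields a mean displacement near the 0.7 km baseline; all function names are illustrative.

```python
import math
import random


def gaussian_skew(points, sigma_km, rng):
    """Return one anonymized copy: each (x, y) point, in km, is shifted by
    independent zero-mean Gaussian noise (std sigma_km) on each axis."""
    return [(x + rng.gauss(0.0, sigma_km), y + rng.gauss(0.0, sigma_km))
            for x, y in points]


def averaging_attack(copies):
    """Estimate each original point as the mean of its anonymized versions.
    Assumes every copy lists the same records in the same order."""
    k = len(copies)
    n = len(copies[0])
    return [(sum(c[i][0] for c in copies) / k,
             sum(c[i][1] for c in copies) / k)
            for i in range(n)]


def mean_error_km(estimates, originals):
    """Mean Euclidean distance between estimated and original points."""
    return sum(math.dist(e, o) for e, o in zip(estimates, originals)) / len(originals)


def simulate(n_points=2000, sigma_km=0.56, k_values=(1, 10, 50), seed=1):
    """Map each number of copies k to the mean re-identification error in km."""
    rng = random.Random(seed)
    # Hypothetical original locations scattered over a 20 km x 20 km area.
    originals = [(rng.uniform(0, 20), rng.uniform(0, 20)) for _ in range(n_points)]
    results = {}
    for k in k_values:
        copies = [gaussian_skew(originals, sigma_km, rng) for _ in range(k)]
        results[k] = mean_error_km(averaging_attack(copies), originals)
    return results


for k, err in simulate().items():
    print(f"{k:>3} copies: mean error {err:.2f} km")
```

Under these assumptions the mean error should shrink roughly in proportion to 1/√k, so theory predicts values near 0.70, 0.22 and 0.10 km for 1, 10 and 50 copies. The key design point is that each copy draws fresh noise: averaging cancels independent offsets, which is exactly what a non-deterministic skew provides.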

Highlights

Knowledge of the geographical locations of individuals is fundamental to the practice of spatial epidemiology
We explore whether de-identification algorithms that use spatial blurring – a non-deterministic process – may be susceptible to weakening when an adversary can access multiple anonymized versions of the same original data set [10]
When each point was estimated as the average of its fifty Gaussian-skew-anonymized versions, the mean distance between the estimate and the original point in the data set was reduced to 0.1 km
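The reported figures are consistent with a simple 1/√k law. A short derivation, assuming (an assumption, not the paper's stated model) that each anonymized copy adds independent, zero-mean, isotropic Gaussian noise with per-axis standard deviation σ to the true location p, and that the adversary averages k linked copies:

```latex
\begin{aligned}
\hat{\mathbf{p}}_k &= \frac{1}{k}\sum_{j=1}^{k}\left(\mathbf{p} + \boldsymbol{\varepsilon}_j\right),
  \qquad \boldsymbol{\varepsilon}_j \sim \mathcal{N}\!\left(\mathbf{0}, \sigma^2 I_2\right) \\
\hat{\mathbf{p}}_k - \mathbf{p} &= \frac{1}{k}\sum_{j=1}^{k}\boldsymbol{\varepsilon}_j
  \sim \mathcal{N}\!\left(\mathbf{0}, \tfrac{\sigma^2}{k} I_2\right) \\
d_k = \mathbb{E}\,\lVert \hat{\mathbf{p}}_k - \mathbf{p} \rVert
  &= \frac{\sigma}{\sqrt{k}}\sqrt{\frac{\pi}{2}} = \frac{d_1}{\sqrt{k}},
  \qquad d_1 = \sigma\sqrt{\frac{\pi}{2}}
\end{aligned}
```

The last line uses the fact that the length of a two-dimensional isotropic Gaussian vector is Rayleigh-distributed with mean σ√(π/2). Taking d_1 = 0.7 km gives d_10 ≈ 0.7/√10 ≈ 0.22 km and d_50 ≈ 0.7/√50 ≈ 0.099 km, in line with the reported 0.2 km and 0.1 km.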

Summary

Introduction

Anonymization of patient address data by reassignment of geographic coordinates allows privacy preservation while sharing data for disease surveillance or biomedical research [5]. Geographical information is identifying: we have demonstrated that it is possible to correctly identify most home addresses even from the low-resolution point maps commonly published in journal articles [9].
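A minimal sketch of coordinate reassignment by Gaussian skew on latitude/longitude, plus one hypothetical form of the disclosure infrastructure the abstract mentions. Everything here is an illustrative assumption: the flat-earth conversion (about 111.32 km per degree of latitude, longitude scaled by cos(latitude), inaccurate near the poles), the `sigma_km` parameter, and the `ReleaseRegistry` class, which simply stores the first skewed location drawn for each record and returns it on every later request so that repeated exports cannot be averaged. It is not the authors' design and not their Markov-process-based algorithm.

```python
import math
import random

KM_PER_DEG_LAT = 111.32  # approximate length of one degree of latitude, in km


def skew_latlon(lat, lon, sigma_km, rng):
    """Shift a geocoded point by zero-mean Gaussian offsets (std sigma_km per
    axis), converting km to degrees with a local flat-earth approximation."""
    dy_km = rng.gauss(0.0, sigma_km)  # north-south offset
    dx_km = rng.gauss(0.0, sigma_km)  # east-west offset
    new_lat = lat + dy_km / KM_PER_DEG_LAT
    new_lon = lon + dx_km / (KM_PER_DEG_LAT * math.cos(math.radians(lat)))
    return new_lat, new_lon


class ReleaseRegistry:
    """Hypothetical disclosure service: a record's skewed location is drawn
    once, on first release, and reused for every later release."""

    def __init__(self, sigma_km, seed=None):
        self.sigma_km = sigma_km
        self._rng = random.Random(seed)
        self._released = {}  # record_id -> (lat, lon) already disclosed

    def release(self, record_id, lat, lon):
        if record_id not in self._released:
            self._released[record_id] = skew_latlon(lat, lon, self.sigma_km, self._rng)
        return self._released[record_id]


registry = ReleaseRegistry(sigma_km=0.56, seed=7)
first = registry.release("patient-001", 42.3370, -71.1050)
second = registry.release("patient-001", 42.3370, -71.1050)
print(first == second)  # True: the second export repeats the first offset
```

Such a registry only helps if every release of the data passes through it; copies anonymized independently elsewhere would still be open to the averaging attack.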

Journal: International Journal of Health Geographics	Publication Date: Jan 1, 2008
Citations: 47	License type: cc-by
