Abstract

Background: A linear programming (LP) model was previously proposed to create de-identified data sets that retain as much spatial detail as possible (e.g., geocodes such as ZIP or postal codes, census blocks, and locations on maps) while complying with the HIPAA Privacy Rule’s Expert Determination method, i.e., ensuring that the risk of re-identification is very small. The LP model determines the transition probability from a patient’s original location to a new randomized location. However, the model is limited in areas with small populations (e.g., a median of 10 people per ZIP code).

Methods: We extend the previous LP model to accommodate locations with smaller populations, while still creating de-identified patient spatial data sets in which the risk of re-identification is very small.

Results: Our LP model was applied to a data set of 11,740 postal codes in the City of Ottawa, Canada. On this data set we demonstrated the limitations of the previous LP model, in that it produces improbable results, and showed how our extensions for small areas allow the de-identification of the whole data set.

Conclusions: The LP model described in this study can be used to de-identify geospatial information for areas with small populations with minimal distortion to postal codes. Our LP model can be extended to include other information, such as age and gender.

Highlights

  • Geographical identifiers are fundamental to the practice of spatial epidemiology [5] and are key components of the public health professional’s toolbox [6]

Summary

Introduction

A linear programming (LP) model was previously proposed to create de-identified data sets that retain as much spatial detail as possible (e.g., geocodes such as ZIP or postal codes, census blocks, and locations on maps) while complying with the HIPAA Privacy Rule’s Expert Determination method, i.e., ensuring that the risk of re-identification is very small. The LP model determines the transition probability from a patient’s original location to a new randomized location. The model is limited in areas with small populations (e.g., a median of 10 people per ZIP code).

Patients’ geographical identifiers (e.g., geocodes such as postal/ZIP codes, street addresses, and locations on maps) are useful for health research and public health purposes [1,2,3,4]. Laws similar to the HIPAA Privacy Rule exist in Canada, where de-identification is required for the disclosure of health information without consent [12,13].
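
To make the idea of a transition probability concrete, the following is a minimal sketch of how such probabilities could be applied once they are known. It does not reproduce the LP model itself: the location labels, the matrix values, and the helper names (`check_row_stochastic`, `randomize_location`, `randomize_dataset`) are all hypothetical and chosen only for illustration. In the study, the probabilities would come from solving the LP.

```python
import random

# Hypothetical location labels and transition matrix (illustrative values only).
# P[i][j] is the probability that a patient whose original location is
# LOCATIONS[i] is released with the randomized location LOCATIONS[j].
LOCATIONS = ["A", "B", "C"]
P = [
    [0.8, 0.1, 0.1],
    [0.2, 0.7, 0.1],
    [0.1, 0.2, 0.7],
]


def check_row_stochastic(matrix, tol=1e-9):
    """Return True if every row is a valid probability distribution."""
    return all(
        all(p >= 0 for p in row) and abs(sum(row) - 1.0) <= tol
        for row in matrix
    )


def randomize_location(original, rng):
    """Draw one randomized location from the row of P for `original`."""
    i = LOCATIONS.index(original)
    return rng.choices(LOCATIONS, weights=P[i], k=1)[0]


def randomize_dataset(originals, seed=0):
    """Randomize every patient location independently, with a fixed seed."""
    rng = random.Random(seed)
    return [randomize_location(loc, rng) for loc in originals]


if __name__ == "__main__":
    assert check_row_stochastic(P)
    print(randomize_dataset(["A", "A", "B", "C"], seed=42))
```

A matrix with large diagonal entries, as assumed here, keeps most patients at their original location, which corresponds to low spatial distortion. How large those entries can be while keeping re-identification risk very small is what the LP would have to decide.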
