Abstract

Data containing diagnosis codes are often derived from electronic health records and shared to enable large-scale, low-cost medical studies. However, sharing such data may lead to the disclosure of patients’ identities, which must be prevented to address privacy concerns and to comply with legislation worldwide. To preserve both data privacy and utility, a utility-constrained anonymization approach can be applied. This approach transforms a given dataset so that the probability of identity disclosure based on diagnosis codes is limited, while the data remain useful for intended studies. In this chapter, we provide a detailed discussion of the utility-constrained anonymization approach. Specifically, we explain how utility constraints, which model the requirements of intended studies, can be formulated and satisfied through data generalization or disassociation. Furthermore, we review two recently proposed algorithms that follow the utility-constrained approach and represent the current state of the art in preserving data utility. We conclude the chapter by discussing several promising directions for future research.
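
To make the idea of satisfying utility constraints through generalization concrete, the following is a minimal sketch and not one of the algorithms reviewed in the chapter. It assumes that each utility constraint is a set of diagnosis codes that an intended study treats as interchangeable, that the constraints are disjoint, and that a generalized item is acceptable only if all of its codes come from a single constraint. The function names, the sample records, and the ICD-9 codes are illustrative assumptions.

```python
from typing import Dict, FrozenSet, List, Set

Code = str
GeneralizedItem = FrozenSet[Code]


def is_utility_preserving(item: GeneralizedItem,
                          constraints: List[FrozenSet[Code]]) -> bool:
    """A generalized item is acceptable if it lies within one utility constraint."""
    return any(item <= constraint for constraint in constraints)


def generalize(records: List[Set[Code]],
               constraints: List[FrozenSet[Code]]) -> List[Set[GeneralizedItem]]:
    """Replace each code by the utility constraint that contains it.

    Assumption: constraints are disjoint. Codes outside every constraint
    are kept as singleton items (i.e., left ungeneralized).
    """
    code_to_group: Dict[Code, GeneralizedItem] = {}
    for group in constraints:
        for code in group:
            code_to_group[code] = group
    return [
        {code_to_group.get(code, frozenset([code])) for code in record}
        for record in records
    ]


def support(generalized_records: List[Set[GeneralizedItem]],
            item: GeneralizedItem) -> int:
    """Number of records that contain the generalized item."""
    return sum(1 for record in generalized_records if item in record)


# Illustrative data: each record is one patient's set of ICD-9 codes.
records = [{"250.00", "401.0"}, {"250.01", "401.1"}, {"250.00"}]

# Hypothetical utility constraints: a study counting diabetes patients and
# a study counting hypertension patients, regardless of the specific code.
diabetes = frozenset({"250.00", "250.01"})
hypertension = frozenset({"401.0", "401.1"})
constraints = [diabetes, hypertension]

generalized = generalize(records, constraints)
```

In this sketch, `support(generalized, diabetes)` counts the patients with any diabetes code, so the count needed by the hypothetical study survives generalization, while an individual code such as `250.01` no longer singles out one record. An item that mixes codes from both constraints, such as `frozenset({"250.00", "401.0"})`, would be rejected by `is_utility_preserving`.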
