Abstract

Public availability of electronic health records (EHRs) raises major privacy concerns, because such data contain confidential personal information about individuals. Publishing such data must therefore be accompanied by appropriate privacy-preserving techniques to avoid, or at least minimize, privacy breaches. Privacy preservation becomes more challenging when the data have multiple sensitive attributes (SAs). The risk increases further when an individual has multiple records (1:M) in a dataset, which is typical of EHRs. To address these issues, the research community has proposed the 1:M generalization and l-anatomy models. However, these models fail to provide optimal privacy protection and data utility, and they remain vulnerable to certain types of attacks, such as gender-specific SA attacks. In this paper, we propose a generic 1:M data privacy model, called G-model, which provides guaranteed data privacy with high data utility and no information loss. G-model maintains separate groups and caches of male and female SAs, thus protecting against gender-specific SA attacks. Because G-model avoids generalization, the published data retain their original values, which is the source of its high utility. Experiments on three real-world datasets (Adult, Informs, and YouTube) show that the proposed model is more efficient and provides better privacy protection than existing models from the literature.
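The two structural ideas named above, the 1:M setting and gender-separated SA groups, can be illustrated with a minimal Python sketch. This is not the G-model algorithm, whose details the abstract does not give. The record layout (`pid`, `gender`, `disease`), the toy data, and the helper names are all hypothetical. The sketch also assumes that a gender-specific SA attack means an adversary uses a target's gender to rule out sensitive values that occur only in the other gender.

```python
from collections import defaultdict


def merge_one_to_many(records):
    """Collapse 1:M records into one entry per individual.

    Each individual keeps the set of all sensitive values seen
    across their records (hypothetical field names).
    """
    merged = {}
    for r in records:
        entry = merged.setdefault(
            r["pid"], {"gender": r["gender"], "diseases": set()}
        )
        entry["diseases"].add(r["disease"])
    return merged


def partition_by_gender(merged):
    """Place individuals into separate groups keyed by gender."""
    groups = defaultdict(dict)
    for pid, entry in merged.items():
        groups[entry["gender"]][pid] = entry["diseases"]
    return dict(groups)


def distinct_sa_count(group):
    """Number of distinct sensitive values present in one group."""
    return len(set().union(*group.values()))


# Toy data, invented for illustration only.
records = [
    {"pid": 1, "gender": "F", "disease": "breast cancer"},
    {"pid": 1, "gender": "F", "disease": "flu"},
    {"pid": 2, "gender": "M", "disease": "prostate cancer"},
    {"pid": 3, "gender": "F", "disease": "flu"},
    {"pid": 4, "gender": "M", "disease": "diabetes"},
]

merged = merge_one_to_many(records)
groups = partition_by_gender(merged)
```

Under that assumption, keeping male and female individuals in separate groups means every sensitive value in a group is plausible for every member, so knowing the target's gender eliminates no candidates. A diversity check such as `distinct_sa_count(groups["F"])` would then be applied per gender group rather than to a mixed group.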
