Improved Generalization for Secure Data Publishing

Saba Yaseen,Adeel Anjum,Naveed Ahmad,Syed M Ali Abbas,Tanzila Saba,Basit Shahzad,Saif Ur Rehman Malik,Ali Kashif Bashir,Abid Khan

doi:10.1109/access.2018.2828398

Abstract

In data publishing, privacy and utility are essential for data owners and users respectively, which cannot coexist well. This incompatibility puts the data privacy researchers under an obligation to find newer and reliable privacy preserving tradeoff-techniques. Data providers like many public and private organizations (e.g. hospitals and banks) publish microdata of individuals for various research purposes. Publishing microdata may compromise the privacy of individuals. To prevent the privacy of individuals, data must be published after removing personal identifiers like name and social security numbers. Removal of the personal identifiers appears as not enough to protect the privacy of individuals. $K$ -anonymity model is used to publish microdata by preserving the individual’s privacy through generalization. There exist many state-of-the-arts generalization-based techniques, which deal with pre-defined attacks like background knowledge attack, similarity attack, probability attack and so on. However, existing generalization-based techniques compromise the data utility while ensuring privacy. It is an open question to find an efficient technique that is able to set a trade-off between privacy and utility. In this paper, we discussed existing generalization hierarchies and their limitations in detail. We have also proposed three new generalization techniques including conventional generalization hierarchies, divisors based generalization hierarchies and cardinality-based generalization hierarchies. Extensive experiments on the real-world dataset acknowledge that our technique outperforms among the existing techniques in terms of better utility.

Full Text