Abstract

Preserving privacy in the presence of an adversary's background knowledge is crucial in data publishing. The k-anonymity model protects identity but not against attribute disclosure. β-likeness, one of the strong refinements of k-anonymity, does not protect against identity disclosure, and neither model withstands attacks based on background knowledge. This research proposes two approaches for generating k-anonymous β-likeness datasets that protect against both identity and attribute disclosure and resist attacks that exploit correlations between quasi-identifiers (QIs) and sensitive attribute values as the adversary's background knowledge. In particular, two hierarchical anonymization algorithms are proposed. In their first stage, both algorithms apply agglomerative clustering to group records whose probability distributions, as derived from the background knowledge, are similar. In the next phase, k-anonymity and β-likeness are enforced to prevent identity and attribute disclosure. Our extensive experiments demonstrate that the proposed algorithms outperform other state-of-the-art anonymization algorithms in terms of both privacy and data utility, leaving fewer records unpublished than the alternatives. Because well-known information loss metrics fail to measure precisely the data inaccuracy stemming from the removal of records that cannot be published in any equivalence class, this research also introduces an extension of the Global Certainty Penalty metric that accounts for unpublished records.
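To make the two privacy conditions concrete, the following is a minimal sketch (not the authors' implementation) of how one might check k-anonymity and β-likeness over a partition of records into equivalence classes. It assumes classes are represented simply as lists of sensitive-attribute values, and uses one common formulation of β-likeness in which the relative increase (q − p) / p of a sensitive value's in-class frequency q over its table-wide frequency p must not exceed β.

```python
from collections import Counter

def is_k_anonymous(classes, k):
    """Each equivalence class must contain at least k records."""
    return all(len(c) >= k for c in classes)

def is_beta_like(classes, sensitive_values, beta):
    """Check β-likeness: for every sensitive value with overall frequency p,
    its frequency q inside each class may exceed p by at most a relative
    factor of beta, i.e. (q - p) / p <= beta. Only increases matter, since
    only they give the adversary information beyond the published table."""
    n = len(sensitive_values)
    overall = {v: cnt / n for v, cnt in Counter(sensitive_values).items()}
    for c in classes:
        local = Counter(c)
        for value, p in overall.items():
            q = local.get(value, 0) / len(c)
            if (q - p) / p > beta:
                return False
    return True

# Toy example: two classes with the same sensitive-value mix as the table.
sensitive = ["flu", "flu", "flu", "flu", "hiv", "hiv"]
balanced = [["flu", "hiv", "flu"], ["flu", "hiv", "flu"]]
skewed = [["flu", "flu", "flu"], ["hiv", "hiv", "flu"]]
print(is_k_anonymous(balanced, 3))              # True
print(is_beta_like(balanced, sensitive, 0.1))   # True
print(is_beta_like(skewed, sensitive, 0.5))     # False
```

In the skewed partition, "hiv" occurs in one class at frequency 2/3 against a table-wide 1/3, a relative increase of 1.0, which violates β = 0.5; the balanced partition matches the overall distribution exactly and passes for any β ≥ 0.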
