Practical anonymization for data streams: z-anonymity and relation with k-anonymity

Nikhil Jha, Luca Vassio, Martino Trevisan, Emilio Leonardi, Marco Mellia

doi:10.1016/j.peva.2022.102329

Open Access

Abstract

With the advent of big data and the emergence of data markets, preserving individuals’ privacy has become of utmost importance. The classical response to this need is anonymization, i.e., sanitizing the information that, directly or indirectly, could allow users to be re-identified. Among the various approaches, k-anonymity provides simple and easy-to-understand protection. However, k-anonymity is challenging to achieve in a continuous stream of data and scales poorly when the number of attributes becomes high.

In this paper, we study a novel anonymization property called z-anonymity, which we explicitly design to deal with data streams, i.e., settings where the decision to publish a given attribute (atomic information) is made in real time. The core idea of z-anonymity is to release an attribute about a user only if at least z−1 other users have exposed the same attribute in a past time window. Depending on the value of z, the output stream is k-anonymized with a certain probability. To this end, we present a probabilistic model that maps the z-anonymity property into the k-anonymity property. The model is not only helpful in studying z-anonymity, but is also general enough to evaluate the probability of achieving k-anonymity in data streams, making it a contribution of general interest.
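The release rule described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the class name `ZAnonymizer`, the method `observe`, the sample stream, and the choice to track only each user's most recent exposure per attribute are all assumptions made for illustration. The sketch also assumes that observations arrive in non-decreasing timestamp order and that an exposure counts as long as it is at most `window` time units old.

```python
from collections import defaultdict


class ZAnonymizer:
    """Hypothetical sketch of a z-anonymity style release rule."""

    def __init__(self, z, window):
        self.z = z              # minimum number of distinct users required
        self.window = window    # length of the past time window
        # attribute -> {user: timestamp of that user's latest exposure}
        self._last_seen = defaultdict(dict)

    def observe(self, t, user, attribute):
        """Record an exposure and return True if it may be released."""
        users = self._last_seen[attribute]
        users[user] = t
        # Drop users whose latest exposure fell out of the window (assumed rule).
        expired = [u for u, ts in users.items() if t - ts > self.window]
        for u in expired:
            del users[u]
        # The current user plus at least z-1 others must be in the window.
        return len(users) >= self.z


# Illustrative stream of (timestamp, user, attribute) observations.
stream = [
    (0, "alice", "site-a"),
    (1, "bob", "site-a"),
    (2, "carol", "site-a"),
    (3, "alice", "site-b"),
    (20, "dave", "site-a"),
]

anonymizer = ZAnonymizer(z=3, window=10)
released = [(t, u, a) for t, u, a in stream if anonymizer.observe(t, u, a)]
print(released)
```

In this sketch the decision is made once, when the observation arrives: the first two exposures of `site-a` are withheld and are not released retroactively when the third user appears. Scanning all users of an attribute on every observation keeps the code short; a real stream processor would likely need a cheaper eviction strategy.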

Full Text