Abstract

K-Anonymity is a property for the measurement, management, and governance of the data anonymization. Many implementations of k-anonymity have been described in state of the art, but most of them are not practically usable over a large number of attributes in a “Big” dataset, i.e., a dataset drawing from Big Data. To address this significant shortcoming, we introduce and evaluate KGen, an approach to K-anonymity featuring meta-heuristics, specifically, Genetic Algorithms to compute a permutation of the dataset which is both K-anonymized and still usable for further processing, e.g., for private-by-design analytics. KGen promotes such a meta-heuristic approach since it can solve the problem by finding a pseudo-optimal solution in a reasonable time over a considerable load of input. KGen allows the data manager to guarantee a high anonymity level while preserving the usability and preventing loss of information entropy over the data. Differently from other approaches that provide optimal global solutions compatible with smaller datasets, KGen works properly also over Big datasets while still providing a good-enough K-anonymized but still processable dataset. Evaluation results show how our approach can still work efficiently on a real world dataset, provided by Dutch Tax Authority, with 47 attributes (i.e., the columns of the dataset to be anonymized) and over 1.5K+ observations (i.e., the rows of that dataset), as well as on a dataset with 97 attributes and over 3942 observations.

Full Text
Published version (Free)

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call