Data Anonymization through Collaborative Multi-view Microaggregation

Sarah Zouinina, Abdelouahid Lyhyaoui, Nicoleta Rogovschi, Younès Bennani

doi:10.1515/jisys-2020-0026

Abstract

Interest in data anonymization is growing rapidly, motivated by the will of governments to open their data. The main challenge of data anonymization is to find a balance between data utility and the amount of disclosure risk. One of the best-known frameworks for data anonymization is k-anonymity, which assumes that a dataset is anonymous if and only if, for each element of the dataset, there exist at least k − 1 other elements identical to it. In this paper, we propose two techniques to achieve k-anonymity through microaggregation: k-CMVM and Constrained-CMVM. Both use topological collaborative clustering to obtain k-anonymous data. The first determines the k levels automatically and the second defines them by exploration. We also improve the results of these two approaches by using pLVQ2 as a weighted vector quantization method. The four proposed methods were shown to be efficient using two data utility measures: the separability utility and the structural utility. The experimental results show very promising performance.
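The k-anonymity condition stated in the abstract can be illustrated with a small check. This is only a sketch and not code from the paper: the function name `is_k_anonymous`, the sample records, and the choice to compare records on a named subset of columns (the usual quasi-identifier formulation) are assumptions made for illustration.

```python
from collections import Counter


def is_k_anonymous(records, quasi_identifiers, k):
    """Return True if every combination of quasi-identifier values
    appears in at least k records (hypothetical helper)."""
    counts = Counter(
        tuple(record[q] for q in quasi_identifiers) for record in records
    )
    return all(count >= k for count in counts.values())


# Illustrative, made-up records: two groups of two identical
# quasi-identifier combinations each.
records = [
    {"age": "30-39", "zip": "750**", "diagnosis": "flu"},
    {"age": "30-39", "zip": "750**", "diagnosis": "asthma"},
    {"age": "40-49", "zip": "690**", "diagnosis": "flu"},
    {"age": "40-49", "zip": "690**", "diagnosis": "diabetes"},
]

print(is_k_anonymous(records, ["age", "zip"], k=2))  # each group has 2 members
print(is_k_anonymous(records, ["age", "zip"], k=3))  # no group has 3 members
```

With these sample records the first call succeeds and the second fails, because each quasi-identifier combination is shared by exactly two records.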

Highlights

Data is now used in every aspect of human life
We propose two techniques to achieve k-anonymity through microaggregation: k-CMVM and Constrained-CMVM
The first uses the prototypes of the Best Matching Unit (BMU) (k-CMVM) and the second uses a linear mixture of models (Constrained-CMVM)
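To make the general idea of microaggregation concrete, here is a minimal sketch of a plain fixed-size, single-attribute variant: values are sorted, grouped into blocks of at least k, and each value is replaced by its group mean. It is not the k-CMVM or Constrained-CMVM method, which rely on topological collaborative clustering; the function name `microaggregate` and the sample ages are assumptions for illustration only.

```python
def microaggregate(values, k):
    """Replace each value by the mean of its group, where groups are
    consecutive blocks of at least k values in sorted order.
    Simple univariate sketch, not the method proposed in the paper."""
    if k < 1 or len(values) < k:
        raise ValueError("need k >= 1 and at least k values")

    # Indices of the values in ascending order of value.
    order = sorted(range(len(values)), key=lambda i: values[i])
    n_groups = len(values) // k
    result = [0.0] * len(values)

    for g in range(n_groups):
        start = g * k
        # The last group absorbs the remainder so no group is smaller than k.
        end = len(values) if g == n_groups - 1 else start + k
        group = order[start:end]
        mean = sum(values[i] for i in group) / len(group)
        for i in group:
            result[i] = mean
    return result


ages = [23, 25, 31, 34, 52, 58, 61]  # made-up sample data
print(microaggregate(ages, k=3))
```

Every output value is shared by at least k records, which is how microaggregation yields k-anonymous data on the aggregated attribute.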

Summary

Introduction

Data is used in every aspect of human life. It is collected by sensors, social networks, mobile applications and connected objects so that it can be processed, explored, transformed and learned from. Anonymization approaches were mainly based on the randomization method, which consists of adding noise to the data [1]. Because randomization still carries a risk of privacy breach, it was overtaken by the k-anonymization method [38]. This group-based anonymization method outputs a dataset in which each record is identical to at least k − 1 others. Anonymization is achieved by first removing the key identifiers, such as the name and the address, and then generalizing and/or suppressing the pseudo-identifiers, for example the date of birth, the ZIP code, the gender and the age. At the end of the topological learning, "similar" data are collected in clusters that correspond to sets of similar patterns.
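The two steps described above (removing key identifiers, then generalizing pseudo-identifiers) might look as follows on a single record. This is a sketch under stated assumptions: the field names, the ten-year age bands, the three-digit ZIP prefix and the year-only birth date are illustrative choices, not values taken from the paper.

```python
# Hypothetical field names, chosen for illustration only.
KEY_IDENTIFIERS = {"name", "address"}


def anonymize_record(record):
    """Drop key identifiers and generalize pseudo-identifiers."""
    # Step 1: remove the key identifiers.
    out = {key: value for key, value in record.items()
           if key not in KEY_IDENTIFIERS}

    # Step 2: generalize the pseudo-identifiers.
    low = (out["age"] // 10) * 10
    out["age"] = f"{low}-{low + 9}"                    # 34 -> "30-39"
    zip_code = out["zip"]
    out["zip"] = zip_code[:3] + "*" * (len(zip_code) - 3)  # keep 3-digit prefix
    out["birth_date"] = out["birth_date"][:4]          # keep the year only
    return out


record = {
    "name": "Jane Doe",            # made-up example data
    "address": "1 Example Street",
    "age": 34,
    "zip": "75013",
    "gender": "F",
    "birth_date": "1986-04-12",
}
print(anonymize_record(record))
```

Generalization alone does not guarantee k-anonymity; the resulting groups still have to be checked for a minimum size of k.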

Fundamental background of the proposed approaches

Multi-view Collaborative Learning

Proposed Anonymization Approaches

Pre-Anonymization Step

Constrained CMVM

Fine tuning

Incorporating Discriminative Power

Datasets

Utility Measures and Statistical Analysis

Davies Bouldin Index

Silhouette Index

Calinski Harabasz Index

Structural Utility using the Earth Mover’s Distance

Preserving combined utility

Conclusion

Journal: Journal of Intelligent Systems
Publication Date: Oct 2, 2020
Citations: 1
License type: CC BY 4.0
