Utility-driven assessment of anonymized data via clustering

Maria Eugénia Ferrão, Paula Prata, Paulo Fazendeiro

doi:10.1038/s41597-022-01561-6

Abstract

In this study, clustering is conceived as an auxiliary tool to identify groups of special interest. The approach was applied to a real dataset covering an entire Portuguese cohort of higher education Law students. Several anonymized clustering scenarios were compared against the original cluster solution. Clustering techniques were explored as data utility models in the context of data anonymization, with k-anonymity and (ε, δ)-differential privacy as the privacy models. The purpose was to assess the utility of anonymized data through standard metrics, through the characteristics of the groups obtained, and through the relative risk (a relevant metric in social sciences research). For the sake of self-containment, we present an overview of anonymization and clustering methods. We used a partitional clustering algorithm and analyzed several clustering validity indices to determine to what extent the data structure is preserved after anonymization. The results suggest that, for datasets of low dimensionality or cardinality, the anonymization procedure easily jeopardizes the clustering endeavor. In addition, there is evidence that relevant field-of-study estimates obtained from anonymized data are biased.
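The abstract names three quantities without giving formulas: a k-anonymity guarantee, agreement between an anonymized and the original cluster solution, and the relative risk. The Python sketch below, using only the standard library, shows one plausible way to compute each. It is an illustration under stated assumptions, not the authors' code: the function names and toy records are hypothetical, and the adjusted Rand index (ARI) is used here only as one common external index for comparing two partitions; the abstract does not say which validity indices the study actually used.

```python
from collections import Counter
from math import comb


def is_k_anonymous(records, quasi_identifiers, k):
    """True if every equivalence class (records sharing the same
    quasi-identifier values) contains at least k records."""
    classes = Counter(tuple(r[q] for q in quasi_identifiers) for r in records)
    return all(size >= k for size in classes.values())


def adjusted_rand_index(labels_a, labels_b):
    """Adjusted Rand index between two partitions of the same items.
    1.0 means identical partitions; values near 0.0 mean chance-level agreement.
    (ARI is an illustrative choice, assumed here, not taken from the study.)"""
    if len(labels_a) != len(labels_b):
        raise ValueError("partitions must cover the same items")
    n = len(labels_a)
    if n < 2:
        return 1.0  # too few items to form any pair
    contingency = Counter(zip(labels_a, labels_b))
    sum_cells = sum(comb(c, 2) for c in contingency.values())
    sum_a = sum(comb(c, 2) for c in Counter(labels_a).values())
    sum_b = sum(comb(c, 2) for c in Counter(labels_b).values())
    expected = sum_a * sum_b / comb(n, 2)
    max_index = (sum_a + sum_b) / 2
    if max_index == expected:
        # Only when both partitions are a single cluster or all singletons.
        return 1.0
    return (sum_cells - expected) / (max_index - expected)


def relative_risk(events_exposed, total_exposed, events_unexposed, total_unexposed):
    """Risk ratio: P(event | exposed) / P(event | unexposed)."""
    return (events_exposed / total_exposed) / (events_unexposed / total_unexposed)


# Hypothetical toy records; "age" and "region" act as quasi-identifiers.
records = [
    {"age": "20-29", "region": "North", "field": "Law"},
    {"age": "20-29", "region": "North", "field": "Law"},
    {"age": "30-39", "region": "South", "field": "Law"},
    {"age": "30-39", "region": "South", "field": "Law"},
]

print(is_k_anonymous(records, ["age", "region"], k=2))  # True
# Same grouping with swapped label names still counts as perfect agreement.
print(adjusted_rand_index([0, 0, 1, 1], [1, 1, 0, 0]))  # 1.0
print(relative_risk(30, 100, 15, 100))  # 2.0
```

In a utility assessment along these lines, one would cluster the original data and each anonymized version with the same partitional algorithm, then compare the resulting label vectors with `adjusted_rand_index`; a relative risk computed on original and on anonymized data can be compared in the same way to expose the kind of bias the abstract reports.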

Journal: Scientific Data
Publication Date: Jul 30, 2022
Citations: 7
License type: open-access

Similar Papers

Better Safe than Sorry - Implementing Reliable Health Data Anonymization.
Raffael Bild et al.
Studies in Health Technology and Informatics, Vol. 270
17 Jun 2020

Database Anonymization: Privacy Models, Data Utility, and Microaggregation-based Inter-model Connections
Josep Domingo-Ferrer ... David Sánchez
Synthesis Lectures on Information Security, Privacy, and Trust, Vol. 8
11 Jan 2016

Privacy-preserving trajectory data publishing by local suppression
Rui Chen ... Ke Wang
Information Sciences, Vol. 231
28 Jul 2011

An Evaluation of Anonymized Models and Ensemble Classifiers
Peerapong Vanichayavisalsakul ... Krerk Piromsopa
24 Oct 2018