Limiting sensitive values in an anonymized table while reducing information loss via p‐proportion

Richard Dosselmann, Howard J Hamilton

doi:10.1002/spy2.202

Abstract

The p‐proportion model bounds the proportion of sensitive values of a sensitive attribute in each equivalence class of an anonymized database table in order to limit the ability of a user to link an individual or entity to a sensitive value in that table. Nonsensitive values are not subject to any such constraints, which reduces the amount of anonymization needed to meet the requirements of this model. This leads to less information loss in an anonymized table. Anonymization is performed using an extension of the Mondrian algorithm that incorporates categorical attributes. Known as the adapted Mondrian algorithm, it generalizes a value of a categorical attribute to a set. Existing algorithms, by comparison, replace one value of a predefined hierarchy by another. The p‐proportion model is compared against the ()‐anonymity model using both the progressive local recoding and (adapted) Mondrian algorithms. Experiments demonstrate the advantage of p‐proportion and Mondrian over ()‐anonymity and progressive local recoding in terms of reduced information loss, measured using the normalized certainty penalty, discernibility metric, and classification metric.
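The constraint described above can be illustrated with a minimal sketch. It assumes that an equivalence class is the group of rows sharing the same (generalized) quasi-identifier values, and that the model requires the fraction of rows in each class holding a sensitive value to be at most p. The function name, parameter names, and sample table are hypothetical and are not taken from the paper, whose formal definition may differ in detail.

```python
from collections import defaultdict


def satisfies_p_proportion(rows, quasi_identifiers, sensitive_attr,
                           sensitive_values, p):
    """Return True if, in every equivalence class, the proportion of rows
    whose sensitive attribute holds a sensitive value is at most p.

    Assumption: an equivalence class is the set of rows that agree on all
    quasi-identifier attributes after generalization.
    """
    classes = defaultdict(list)
    for row in rows:
        key = tuple(row[q] for q in quasi_identifiers)
        classes[key].append(row[sensitive_attr])

    # Nonsensitive values are never counted, so they are unconstrained.
    return all(
        sum(value in sensitive_values for value in values) <= p * len(values)
        for values in classes.values()
    )


# Hypothetical anonymized table with two equivalence classes.
table = [
    {"age": "20-29", "zip": "S4S", "diagnosis": "flu"},
    {"age": "20-29", "zip": "S4S", "diagnosis": "HIV"},
    {"age": "20-29", "zip": "S4S", "diagnosis": "cold"},
    {"age": "30-39", "zip": "S4T", "diagnosis": "HIV"},
    {"age": "30-39", "zip": "S4T", "diagnosis": "cancer"},
]

only_hiv = satisfies_p_proportion(
    table, ["age", "zip"], "diagnosis", {"HIV"}, p=0.5)
hiv_and_cancer = satisfies_p_proportion(
    table, ["age", "zip"], "diagnosis", {"HIV", "cancer"}, p=0.5)
```

In this sketch only the values listed in `sensitive_values` count toward the bound, which reflects the point that nonsensitive values carry no constraint and therefore force no additional generalization.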

Full Text