Abstract

We suggest a user-oriented approach to combinatorial data anonymization. A data matrix is called k-anonymous if every row appears at least k times—the goal of the NP-hard k-ANONYMITY problem then is to make a given matrix k-anonymous by suppressing (blanking out) as few entries as possible. Building on previous work and coping with corresponding deficiencies, we describe an enhanced k-anonymization problem called PATTERN-GUIDED k-ANONYMITY, where the users specify in which combinations suppressions may occur. In this way, the user of the anonymized data can express the differing importance of various data features. We show that PATTERN-GUIDED k-ANONYMITY is NP-hard. We complement this by a fixed-parameter tractability result based on a “data-driven parameterization” and, based on this, develop an exact integer linear program (ILP)-based solution method, as well as a simple, but very effective, greedy heuristic. Experiments on several real-world datasets show that our heuristic easily matches up to the established “Mondrian” algorithm for k-ANONYMITY in terms of the quality of the anonymization and outperforms it in terms of running time.
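The definitions in the abstract can be made concrete with a minimal sketch. It is an illustration only, not the authors' implementation: the `STAR` marker, the helper names, the boolean-tuple encoding of a suppression pattern, and the toy data are all assumptions made here.

```python
from collections import Counter

STAR = "*"  # hypothetical marker for a suppressed (blanked-out) entry


def is_k_anonymous(matrix, k):
    """Return True if every row of `matrix` occurs at least k times."""
    counts = Counter(tuple(row) for row in matrix)
    return all(c >= k for c in counts.values())


def apply_pattern(row, pattern):
    """Suppress the entries of `row` at positions where `pattern` is True.

    Assumption: a pattern is a tuple of booleans, one per column.
    """
    return tuple(STAR if hide else value for value, hide in zip(row, pattern))


def count_suppressions(matrix):
    """Count suppressed entries, the quantity that k-ANONYMITY minimizes."""
    return sum(value == STAR for row in matrix for value in row)


# Toy data (made up): columns are (age group, zip prefix, diagnosis).
rows = [
    ("30-39", "101", "flu"),
    ("30-39", "102", "flu"),
    ("40-49", "101", "cold"),
    ("40-49", "103", "cold"),
]

# A user-specified pattern that only permits blanking out the zip prefix.
hide_zip = (False, True, False)
anonymized = [apply_pattern(r, hide_zip) for r in rows]

print(is_k_anonymous(rows, 2))         # False: every row is unique
print(is_k_anonymous(anonymized, 2))   # True: each row now occurs twice
print(count_suppressions(anonymized))  # 4
```

In this toy case a single pattern applied to every row is enough. In general, different rows may receive different patterns, and choosing that assignment with few suppressions is the hard part of the problem.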

Highlights

  • Making a matrix k-anonymous, that is, each row has to occur at least k times, is a classic model for data privacy [1,2]

  • Motivated by this computational intractability result, we develop an exact algorithm that solves PATTERN-GUIDED k-ANONYMITY in $O(2^{tp} \cdot t^6 p^5 \cdot m + nm)$ time for an n × m input matrix M, p pattern vectors, and the number of different rows in M being t. This shows that PATTERN-GUIDED k-ANONYMITY is fixed-parameter tractable for the combined parameter (t, p) and can be solved in linear time if t and p take constant values. (The fundamental idea behind parameterized complexity analysis [18,19,20] is, given a computationally hard problem Q, to identify a parameter $\ell$ for Q and to determine whether size-s instances of Q can be solved in $f(\ell) \cdot s^{O(1)}$ time, where f is an arbitrary computable function.) This result appears to be of practical interest only in special cases (“small” values for t and p are needed).
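The linear-time claim for constant t and p follows in one step. The worked bound below is a sketch that reads the stated running time as 2 to the power tp, times t to the sixth, times p to the fifth, times m, plus nm. The constant c is a name introduced here for illustration.

```latex
\[
  O\bigl(2^{tp}\, t^{6} p^{5}\, m + nm\bigr)
  = O\bigl(c \cdot m + nm\bigr)
  = O(nm),
  \qquad \text{where } c := 2^{tp}\, t^{6} p^{5} = O(1) \text{ for constant } t, p .
\]
```

Since the input matrix has n · m entries, O(nm) is linear in the input size. The exponential factor depends only on t and p, which is what makes the bound fixed-parameter tractable for the combined parameter (t, p).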


Summary

Introduction

Making a matrix k-anonymous, that is, each row has to occur at least k times, is a classic model for (combinatorial) data privacy [1,2]. (The fundamental idea behind parameterized complexity analysis [18,19,20] is, given a computationally hard problem Q, to identify a parameter $\ell$ (typically, a positive integer or a tuple of positive integers) for Q and to determine whether size-s instances of Q can be solved in $f(\ell) \cdot s^{O(1)}$ time, where f is an arbitrary computable function.) The fixed-parameter tractability result for PATTERN-GUIDED k-ANONYMITY appears to be of practical interest only in special cases (“small” values are needed for t, the number of different input rows, and p, the number of pattern vectors). However, it paves the way for an integer linear program formulation that exactly solves moderate-size instances of PATTERN-GUIDED k-ANONYMITY in reasonable time. Our empirical findings strongly indicate that, even when neglecting the potentially stronger expressiveness that PATTERN-GUIDED k-ANONYMITY provides on the data user side, the problem formulation combined with the greedy algorithm allows for high-quality and very fast data anonymization: it is comparable in anonymization quality to the established Mondrian algorithm [21], but significantly outperforms it in time efficiency.

Complexity and Algorithms
Parameterized Complexity
ILP Formulation
Greedy Heuristic
Implementation and Experiments
Implementation Setup
Quality Criteria
Evaluation
Conclusions
Outlook