Abstract
The automated discovery of privacy vulnerabilities in large datasets containing person-specific information is an important first step in the privacy-preserving data publishing process and an area of increased interest for commercial data masking offerings. In this paper, we describe Identification of Privacy Vulnerabilities (IPV), a scalable system for automatically analyzing datasets to expose privacy vulnerabilities. IPV offers data owners state-of-the-art algorithms for 1) computing the direct identifiers and the quasi-identifiers of a dataset, defined respectively as the single attributes and the minimal combinations of attributes whose values single out few records; 2) calculating the vulnerability index of a dataset by reporting, for each combination of attributes, the cardinality of the smallest group of records that share the same values; and 3) reporting the specific records in a dataset that contain a combination of unique or rare values. All of these algorithms operate in a parallel, massively multi-threaded fashion and support various hardware configurations, ranging from commodity machines to multi-CPU, multi-core nodes in cluster environments. After describing the system, we discuss the algorithms currently supported by IPV and provide some examples of how they work. We conclude with a discussion of promising directions for future research in this area that will lead to the improvement of IPV.
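To make the three analyses named above concrete, the following is a minimal, single-threaded Python sketch. It is not IPV's implementation: the function names, the sample records, and the threshold `k` (an attribute set counts as identifying when its smallest group has at most `k` records) are all assumptions chosen for illustration, and the parallel execution described in the abstract is not modeled.

```python
from collections import Counter
from itertools import combinations


def group_sizes(records, attrs):
    """Count how many records share each value combination of attrs."""
    return Counter(tuple(r[a] for a in attrs) for r in records)


def vulnerability_index(records, attrs):
    """Cardinality of the smallest group of records sharing values on attrs."""
    return min(group_sizes(records, attrs).values())


def identifiers(records, attributes, k=1):
    """Return (direct, quasi) identifiers under the assumed threshold k.

    direct: single attributes whose smallest group has at most k records.
    quasi:  minimal combinations of two or more remaining attributes
            with the same property.
    """
    direct = [a for a in attributes
              if vulnerability_index(records, (a,)) <= k]
    rest = [a for a in attributes if a not in direct]
    quasi = []
    # Visit combinations by increasing size so that subsets are seen first.
    for size in range(2, len(rest) + 1):
        for combo in combinations(rest, size):
            if any(set(q) <= set(combo) for q in quasi):
                continue  # a subset already qualifies, so combo is not minimal
            if vulnerability_index(records, combo) <= k:
                quasi.append(combo)
    return direct, quasi


def rare_records(records, attrs, k=1):
    """Records whose value combination on attrs occurs at most k times."""
    sizes = group_sizes(records, attrs)
    return [r for r in records if sizes[tuple(r[a] for a in attrs)] <= k]


# Hypothetical sample data, invented for this sketch.
SAMPLE = [
    {"id": 1, "zip": "10001", "sex": "F"},
    {"id": 2, "zip": "10001", "sex": "M"},
    {"id": 3, "zip": "10002", "sex": "F"},
    {"id": 4, "zip": "10002", "sex": "M"},
    {"id": 5, "zip": "10001", "sex": "F"},
]
```

Calling `identifiers(SAMPLE, ["id", "zip", "sex"])` classifies `id` as a direct identifier and the pair `("zip", "sex")` as a quasi-identifier, because neither `zip` nor `sex` alone isolates a single record but their combination does. The exhaustive enumeration is exponential in the number of attributes, so this sketch illustrates the definitions only and says nothing about how a scalable system would search the space.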