Putting Statistical Disclosure Control into Practice: The ARX Data Anonymization Tool

Fabian Prasser,Florian Kohlmayer

doi:10.1007/978-3-319-23633-9_6

Abstract

The sharing of sensitive personal data has become a core element of biomedical research. To protect privacy, a broad spectrum of techniques must be implemented, including data anonymization. In this article, we present ARX, an anonymization tool for structured data which supports a broad spectrum of methods for statistical disclosure control by providing (1) models for analyzing re-identification risks, (2) risk-based anonymization, (3) syntactic privacy criteria, such as k-anonymity, l-diversity, t-closeness and δ-presence, (4) methods for automated and manual evaluation of data utility, and (5) an intuitive coding model using generalization, suppression and microaggregation. ARX is highly scalable and allows for anonymizing datasets with several millions of records on commodity hardware. Moreover, it offers a comprehensive graphical user interface with wizards and visualizations that guide users through different aspects of the anonymization process. ARX is not just a toolbox, but a fully-fledged application, meaning that all implemented methods have been harmonized and integrated with each other. It is well understood that balancing privacy and data utility requires user feedback. To facilitate this interaction, ARX is highly configurable and provides various methods for exploring the solution space.

Full Text