Abstract

The interactive, web-based point-and-click application presented in this article allows users to anonymize data without any knowledge of a programming language. Anonymization is important in data mining, but creating safe, anonymized data is by no means a trivial task. Both methodological issues and know-how from subject matter specialists should be taken into account when anonymizing data. Even though specialized software such as sdcMicro exists, it is often difficult for users without programming skills, or without expertise in that particular software, to anonymize datasets without an appropriate app. The presented app is not restricted to applying disclosure limitation techniques but rather facilitates the entire anonymization process. Its interface allows users to upload data to the system, modify them, and create an object defining the disclosure scenario. Once such a statistical disclosure control (SDC) problem has been defined, users can apply anonymization techniques to this object and get instant feedback on the impact on risk and data utility. Additional features, such as an Undo button, the possibility to export the anonymized dataset or the code required for reproducibility, and its interactive features, make it convenient for both experts and nonexperts in R (the free software environment for statistical computing and graphics) to protect a dataset using this app.

Highlights

  • Various anonymization software tools have been made available in the past

  • One of the most feature-rich is sdcMicro [1,2], an R package for data anonymization optimized for large datasets

  • One of the first graphical user interfaces was provided via the software μ-Argus [3]

Summary

Introduction

One of the most feature-rich anonymization software tools is sdcMicro [1,2], an R package for data anonymization optimized for large datasets. For users comfortable with R, this package provides a tool for applying a comprehensive suite of methods commonly used and described in the literature on disclosure control. For nonexperts in R, however, applying these methods to create secure, anonymized datasets has proved difficult. A graphical user interface in this area not only gives access to these methods but also helps to integrate the entire anonymization workflow and offers additional tools and user guidance. Several graphical user interfaces in this area have been developed in the past and, for comparison, we outline the most prominent ones.

Outline and Brief Comparison of Graphical User Interfaces for SDC
Specific Features of sdcApp
Outline
Getting Started
Getting Data into the System
Modify and Analyze Microdata
Anonymize
Defining a SDC Problem
Anonymization of Categorical Data
Anonymization of Continuous Data
Risk Measures
Visualizations
Numerical risk measures
Export Data
Reproducibility
Conclusions
Methods
