Software Architecture for Document Anonymization

Horacio Vico, Daniel Calegari

doi:10.1016/j.entcs.2015.05.006

Abstract

Organizations often face a dilemma regarding their documents: either ensure the confidentiality of the data or publish the information they contain, for transparency, scientific interest, or other reasons. Document anonymization arises in this context, i.e., the replacement of sensitive data in a way that preserves the confidentiality of the documents without altering their value or usefulness. There are proposals for (semi)automatic anonymization, but they are often domain-specific or address the problem only partially. In this paper we present a software architecture for supporting document anonymization, based on representing the problem as a domain- and platform-independent configurable business process. In addition, we analyze the technological alternatives for implementing the architecture and present a functional prototype applied to the domain of legal documents.

Full Text