Challenges and Open Problems of Legal Document Anonymization

Gergely Márk Csányi, János Pál Vadász, Renátó Vági, Dániel Nagy, Tamás Orosz

doi:10.3390/sym13081490

Open Access

Abstract

Data sharing is a central aspect of judicial systems. Openly accessible documents can make the judiciary more transparent. On the other hand, published legal documents can contain a great deal of sensitive information about the involved persons or companies. For this reason, the anonymization of these documents is obligatory to prevent privacy breaches. The General Data Protection Regulation (GDPR) and other modern privacy-protecting regulations have strict definitions of private data, covering both direct and indirect identifiers. Legal documents contain a wide range of attributes regarding the involved parties. Moreover, they can contain additional information about the relations between the involved parties and about rare events. Hence, the personal data can be represented by a sparse matrix of these attributes. The application of Named Entity Recognition methods is essential for a fair anonymization process but is not enough. Machine learning-based methods should be used together with anonymization models, such as differential privacy, to reduce the re-identification risk. At the same time, the information content (utility) of the text should be preserved. This paper aims to summarize and highlight the open and symmetrical problems from the fields of structured and unstructured text anonymization. The possible methods for anonymizing legal documents are discussed and illustrated by case studies from Hungarian legal practice.
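Two technical ideas from the abstract, a sparse attribute representation of the parties and differential privacy, can be illustrated together. The following is a minimal sketch, not the authors' method: it assumes each party is stored as the set of attribute labels that are present (a sparse row), that the released statistic is a simple count, and it uses the standard Laplace mechanism. The attribute labels and the helper names `count_with_attribute`, `laplace_noise`, and `dp_count` are hypothetical.

```python
import random

# Illustrative assumption: each involved party is a sparse row, stored as the
# set of attribute labels present in the judgment (absent attributes are not stored).


def count_with_attribute(records, attribute):
    """Exact counting query: number of parties that carry `attribute`."""
    return sum(1 for record in records if attribute in record)


def laplace_noise(scale, rng):
    """Sample Laplace(0, scale) as the difference of two i.i.d. exponentials."""
    rate = 1.0 / scale
    return rng.expovariate(rate) - rng.expovariate(rate)


def dp_count(records, attribute, epsilon, rng):
    """Laplace mechanism for a counting query (hypothetical helper).

    Adding or removing one party changes the count by at most 1 (sensitivity 1),
    so noise with scale 1 / epsilon makes this single release epsilon-DP.
    """
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    return count_with_attribute(records, attribute) + laplace_noise(1.0 / epsilon, rng)


if __name__ == "__main__":
    # Hypothetical parties; the labels are made up for illustration.
    parties = [
        frozenset({"role:defendant", "city:Szeged", "profession:surgeon"}),
        frozenset({"role:plaintiff", "city:Budapest"}),
        frozenset({"role:defendant", "city:Budapest", "profession:teacher"}),
    ]
    print(count_with_attribute(parties, "city:Budapest"))  # 2
    rng = random.Random(42)  # seeded only to make the demo reproducible
    print(dp_count(parties, "city:Budapest", epsilon=1.0, rng=rng))
```

A smaller `epsilon` gives stronger privacy but noisier counts, which is one concrete form of the privacy/utility trade-off the abstract mentions.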

Highlights

Published: 13 August 2021

Digitalization of judicial systems is an important goal of the European Union [1]. Sharing court decisions and other legal documents and making them accessible online is a crucial part of this intention.
Sweeney carried out a famous linking attack on the set of public health records collected by the National Association of Health Data Organizations (NAHDO) in many states that had legislative mandates to collect hospital-level data (Figure 3).
In Hungary, Act CXII of 2011 (InfoAct) states that, after the data subject's death, their rights can be exercised either by a person appointed by the data subject during their lifetime or by a close relative.
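The linking attack mentioned in the highlights can be sketched as a join between a "de-identified" release and a public register on shared quasi-identifiers. This is a minimal illustration under stated assumptions: all records are hypothetical, `link` is a made-up helper name, and ZIP code, birth date, and sex are used only because they are the classic quasi-identifiers discussed in Sweeney's work; the actual datasets behind Figure 3 are not reproduced.

```python
from collections import defaultdict

# Hypothetical "de-identified" release: names removed, quasi-identifiers kept.
released = [
    {"zip": "02138", "birth_date": "1945-07-31", "sex": "F", "diagnosis": "hypertension"},
    {"zip": "02139", "birth_date": "1960-01-15", "sex": "M", "diagnosis": "asthma"},
]

# Hypothetical public register (e.g. a voter list) that still carries names.
public_register = [
    {"name": "Alice Example", "zip": "02138", "birth_date": "1945-07-31", "sex": "F"},
    {"name": "Bob Example", "zip": "02139", "birth_date": "1960-01-15", "sex": "M"},
    {"name": "Carol Example", "zip": "02139", "birth_date": "1960-01-15", "sex": "M"},
]

QUASI_IDENTIFIERS = ("zip", "birth_date", "sex")


def link(released, register, keys=QUASI_IDENTIFIERS):
    """Return (name, record) pairs for released records that match exactly one
    register entry on all quasi-identifiers, i.e. records that are re-identified."""
    index = defaultdict(list)
    for person in register:
        index[tuple(person[k] for k in keys)].append(person["name"])

    reidentified = []
    for record in released:
        candidates = index.get(tuple(record[k] for k in keys), [])
        if len(candidates) == 1:  # a unique match pins the record to one person
            reidentified.append((candidates[0], record))
    return reidentified


if __name__ == "__main__":
    for name, record in link(released, public_register):
        print(name, "->", record["diagnosis"])  # Alice Example -> hypertension
```

The second released record is not re-identified because two register entries share its quasi-identifier values; this is the intuition behind k-anonymity, which requires every such combination to be shared by at least k records.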

Summary

Introduction

Digitalization of judicial systems is an important goal of the European Union [1]. Sharing court decisions and other legal documents and making them accessible online is a crucial part of this intention. In 2019, a group of researchers carried out a linking attack against anonymized legal cases in Switzerland. They published a study showing that, by combining artificial intelligence methods with big data collected from other publicly available databases, they could re-identify 84% of the people anonymized in this database in less than an hour [19]. In many European Union countries, current anonymization practice consists of masking the names and other direct identifiers of the involved persons. This process does not fulfill the requirements of the General Data Protection Regulation. These examples show that mathematical and statistical analysis is important for filtering out unique events that may serve as a primary identifier (e.g., a surgeon amputating the wrong leg). Applications and services that link legal documents with other databases require special care to take the GDPR recommendations into account.
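The point about filtering unique events can be illustrated with a simple document-frequency check. This is a minimal sketch under assumptions: each judgment has already been reduced to a list of event descriptors (e.g. by an upstream NER or classification step, not shown), and the threshold `k` and the helper names `rare_descriptors` and `mask_rare` are hypothetical; the paper's own risk quantification and threshold are not reproduced here.

```python
from collections import Counter


def rare_descriptors(documents, k):
    """Return descriptors that appear in fewer than `k` documents.

    Rare events (e.g. a wrong-leg amputation) can identify a case on their own,
    so they are candidates for masking or generalization.
    """
    document_frequency = Counter()
    for descriptors in documents:
        document_frequency.update(set(descriptors))  # count each document once
    return {d for d, n in document_frequency.items() if n < k}


def mask_rare(descriptors, rare, placeholder="[RARE_EVENT]"):
    """Replace rare descriptors in one document with a placeholder."""
    return [placeholder if d in rare else d for d in descriptors]


if __name__ == "__main__":
    # Hypothetical corpus: one descriptor list per judgment.
    corpus = [
        ["traffic accident", "personal injury"],
        ["traffic accident", "property damage"],
        ["traffic accident", "personal injury"],
        ["wrong-leg amputation", "medical malpractice"],
    ]
    rare = rare_descriptors(corpus, k=2)
    print(sorted(rare))
    # ['medical malpractice', 'property damage', 'wrong-leg amputation']
    print(mask_rare(corpus[3], rare))
    # ['[RARE_EVENT]', '[RARE_EVENT]']
```

In this tiny corpus "property damage" is flagged as well, which shows why the threshold has to balance re-identification risk against the utility of the published text.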

Privacy and Anonymization

Privacy Models

Types of Privacy Attacks

Structure and Privacy Risks in Hungarian Legal Documents

Criticism of Current Regulation

Current Practice and Potential Risks

Datasets and Search Framework

Illustrative Examples

Quantifying Risk

The Threshold

Automatized Workflows for Pseudonymization

Findings

Conclusions

Journal: Symmetry
Publication Date: Aug 13, 2021
Citations: 23
License type: CC BY 4.0

Similar Papers

The Concept of a Legal Document in the Professional Sphere (ПОНЯТИЕ ЮРИДИЧЕСКОГО ДОКУМЕНТА В ПРОФЕССИОНАЛЬНОЙ СФЕРЕ)
E.V. Troshchenkova ... E.A. Rudneva
Voprosy Kognitivnoy Lingvistiki
01 Jan 2023

Application of Blockchain Technology in the Management of Legal Documents
Jitesh Choudhary ... Abhishek Shende
International Journal of Advanced Research in Science, Communication and Technology
06 Mar 2024

Automatic Inference of Taxonomy Relationships Among Legal Documents
Irene Benedetto ... Luca Cagliero
01 Jan 2021

Legal Documents Drafting with Fine-Tuned Pre-Trained Large Language Model
Chun-Hsien Lin ... Pu-Jen Cheng
27 Apr 2024
