Dutch Named Entity Recognition and De-Identification Methods for the Human Resource Domain

Chaïm Van Toledo, Marco Spruit, Friso Van Dijk

doi:10.5121/ijnlc.2020.9602

Abstract

The human resource (HR) domain contains various types of privacy-sensitive textual data, such as e-mail correspondence and performance appraisals. Doing research on these documents brings several challenges, one of which is anonymisation. In this paper, we evaluate current Dutch text de-identification methods for the HR domain in four steps. First, we update one of these methods with the latest named entity recognition (NER) models. The results show that the NER model based on the CoNLL 2002 corpus, in combination with the BERTje transformer, is the best-performing combination for suppressing persons (recall 0.94) and locations (recall 0.82). For suppressing gender, DEDUCE performs best (recall 0.53). Second, the NER evaluation is based on strict de-identification of entities (a person must be suppressed as a person). Third, the evaluation is based on a loose sense of de-identification (no matter how a person is suppressed, as long as it is suppressed). In the fourth and last step, a new kind of NER dataset is tested for recognising job titles in texts.
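The difference between the strict and the loose evaluation can be made concrete with a small sketch. It assumes entities are given as token-offset spans `(start, end_exclusive, label)` with CoNLL 2002-style labels, that "suppression" means replacing a span with a `[LABEL]` placeholder, and that loose de-identification counts a gold entity as suppressed when all of its tokens are covered by some predicted span, whatever its label. The span format, the placeholder, and the helper names `suppress`, `strict_recall` and `loose_recall` are hypothetical illustrations, not the authors' evaluation code.

```python
from typing import List, Tuple

# Hypothetical span format: (start_token, end_token_exclusive, label),
# with CoNLL 2002-style labels such as "PER", "LOC", "ORG".
Span = Tuple[int, int, str]


def suppress(tokens: List[str], pred: List[Span]) -> List[str]:
    """Replace each predicted entity span with a hypothetical "[LABEL]" placeholder."""
    starts = {start: (end, label) for start, end, label in pred}
    out: List[str] = []
    i = 0
    while i < len(tokens):
        if i in starts:
            end, label = starts[i]
            out.append(f"[{label}]")
            i = max(end, i + 1)  # always advance, even for a malformed empty span
        else:
            out.append(tokens[i])
            i += 1
    return out


def strict_recall(gold: List[Span], pred: List[Span], label: str) -> float:
    """Share of gold entities of `label` found with identical boundaries AND identical label."""
    targets = [g for g in gold if g[2] == label]
    if not targets:
        return 0.0
    predicted = set(pred)
    return sum(g in predicted for g in targets) / len(targets)


def loose_recall(gold: List[Span], pred: List[Span], label: str) -> float:
    """Share of gold entities of `label` whose tokens are all suppressed, under any label."""
    targets = [g for g in gold if g[2] == label]
    if not targets:
        return 0.0
    covered = {i for start, end, _ in pred for i in range(start, end)}
    hits = sum(all(i in covered for i in range(start, end)) for start, end, _ in targets)
    return hits / len(targets)


# Toy example: the location is suppressed, but labelled as an organisation.
tokens = ["Jan", "de", "Vries", "woont", "in", "Utrecht", "."]
gold = [(0, 3, "PER"), (5, 6, "LOC")]
pred = [(0, 3, "PER"), (5, 6, "ORG")]

print(suppress(tokens, pred))            # ['[PER]', 'woont', 'in', '[ORG]', '.']
print(strict_recall(gold, pred, "LOC"))  # 0.0
print(loose_recall(gold, pred, "LOC"))   # 1.0
```

Under these toy definitions, tagging *Utrecht* as an organisation earns full loose recall for locations but zero strict recall, which is the distinction between the second and third evaluation steps.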

Journal: International Journal on Natural Language Computing
Publication Date: Dec 30, 2020
Citations: 1

Similar Papers

Evaluating Dutch Named Entity Recognition and De-Identification Methods in the Human Resource Domain
Chaïm Van Toledo ... Marco Spruit
28 Nov 2020

BiodiViz: Leveraging NER and RE for Automated Knowledge Graph Generation in Biodiversity Research
Angela Shannen Tan ... Roselyn Gabud
Biodiversity Information Science and Standards | VOL. 8
29 Oct 2024

Named Entity Recognition Model Based on TextCNN-BiLSTM-CRF with Chinese Text Classification
Ji-Ru Zhang ... Bo He
電腦學刊 (Journal of Computers) | VOL. 33
01 Apr 2022

A Federated Named Entity Recognition Model with Explicit Relation for Power Grid
Jingtang Luo ... Jie Xu
Computers, Materials & Continua | VOL. 75
01 Jan 2023