Distantly Supervised Named Entity Recognition using Positive-Unlabeled Learning

Minlong Peng, Xuanjing Huang, Xiaoyu Xing, Qi Zhang, Jinlan Fu

doi:10.18653/v1/p19-1231

Abstract

In this work, we explore how to perform named entity recognition (NER) using only unlabeled data and named entity dictionaries. To this end, we formulate the task as a positive-unlabeled (PU) learning problem and accordingly propose a novel PU learning algorithm to perform the task. We prove that the proposed algorithm can unbiasedly and consistently estimate the task loss as if fully labeled data were available. A key feature of the proposed method is that it does not require the dictionaries to label every entity within a sentence, and it does not even require the dictionaries to label all of the words constituting an entity. This greatly reduces the requirement on the quality of the dictionaries and makes our method generalize well with quite simple dictionaries. Empirical studies on four public NER datasets demonstrate the effectiveness of our proposed method. We have published the source code at https://github.com/v-mipeng/LexiconNER.

Highlights

Named Entity Recognition (NER) is concerned with identifying named entities, such as person, location, product, and organization names, in unstructured text.
We explore how to perform NER using only unlabeled data and named entity dictionaries, which are easier to obtain than labeled data.
We evaluate the effectiveness of our proposed method on four NER datasets.

Summary

Introduction

Named Entity Recognition (NER) is concerned with identifying named entities, such as person, location, product, and organization names, in unstructured text. When a dictionary is used to label data, we obtain only some entity words plus a large amount of unlabeled data comprising both entity and non-entity words. In this case, conventional supervised or semi-supervised learning algorithms are not suitable, since they usually require labeled data for all classes. Because the words labeled by the dictionary cover only part of the entities, they cannot fully reveal the data distribution of entity words. To deal with this problem, we propose an adapted method, motivated by the AdaSampling algorithm (Yang et al., 2017), to enrich the dictionary. The contributions of this work can be summarized as follows: 1) We propose a novel PU learning algorithm to perform the NER task using only unlabeled data and named entity dictionaries. 2) We prove that the proposed algorithm can unbiasedly and consistently estimate the task loss as if fully labeled data were available, under the assumption that the entities found by the dictionary reveal the distribution of entities. 3) To make this assumption hold as far as possible, we propose an adapted method, motivated by the AdaSampling algorithm, to enrich the dictionary. 4) We empirically demonstrate the effectiveness of our proposed method with extensive experimental studies on four NER datasets.
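The idea of estimating the task loss from positive and unlabeled data alone can be illustrated with a small sketch. The Python below is not the authors' implementation. It assumes the commonly used non-negative PU risk form, a bounded absolute-error loss, and a known class prior `prior` (the fraction of words that are entity words). The names `mae_loss`, `mean_loss`, and `pu_risk` are hypothetical.

```python
from typing import Sequence


def mae_loss(score: float, label: int) -> float:
    """Absolute error between a score in [0, 1] and a 0/1 label (bounded by 1)."""
    return abs(label - score)


def mean_loss(scores: Sequence[float], label: int) -> float:
    """Average loss of a set of scores against one fixed label."""
    return sum(mae_loss(s, label) for s in scores) / len(scores)


def pu_risk(
    pos_scores: Sequence[float],
    unl_scores: Sequence[float],
    prior: float,
    non_negative: bool = True,
) -> float:
    """Estimate the binary classification risk from positive and unlabeled scores.

    pos_scores: classifier outputs on dictionary-labeled (positive) words.
    unl_scores: classifier outputs on unlabeled words.
    prior:      assumed fraction of positives in the unlabeled data.
    """
    risk_pos_as_pos = mean_loss(pos_scores, 1)  # positives scored against label 1
    risk_pos_as_neg = mean_loss(pos_scores, 0)  # positives scored against label 0
    risk_unl_as_neg = mean_loss(unl_scores, 0)  # unlabeled scored against label 0

    # The unlabeled set mixes positives and negatives, so the positive share
    # is subtracted to recover an estimate of the negative-class risk.
    neg_part = risk_unl_as_neg - prior * risk_pos_as_neg
    if non_negative:
        # Clip at zero so the estimated negative risk never goes below 0.
        neg_part = max(0.0, neg_part)
    return prior * risk_pos_as_pos + neg_part


if __name__ == "__main__":
    print(pu_risk([0.9, 0.8], [0.1, 0.2, 0.9], prior=0.3))
```

With `non_negative=False` the function returns the plain unbiased estimate, which can become negative on small samples; the clipped variant avoids that at the cost of a small bias.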

Risk Minimization

Unbiased Positive-Unlabeled learning

Consistent Positive-Unlabeled Learning

Dictionary-based NER with PU Learning

Notations

Label Assignment Mechanism

Result: partially labeled sentence
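One way such a partially labeled sentence could be produced is sketched below. It assumes a greedy forward longest-match of token spans against the dictionary. The matching strategy, the `max_len` cap, and the function name `label_with_dictionary` are illustrative assumptions rather than details stated here.

```python
from typing import List, Sequence, Set, Tuple


def label_with_dictionary(
    tokens: Sequence[str],
    dictionary: Set[Tuple[str, ...]],
    max_len: int = 4,
) -> List[int]:
    """Mark tokens covered by a dictionary entry as 1; leave the rest as 0.

    A 0 means "unlabeled", not "non-entity": entities missing from the
    dictionary stay unmarked, which is why the data is only partially labeled.
    """
    labels = [0] * len(tokens)
    i = 0
    while i < len(tokens):
        matched = 0
        # Try the longest candidate span first, then shrink.
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            if tuple(tokens[i:i + n]) in dictionary:
                matched = n
                break
        if matched:
            for j in range(i, i + matched):
                labels[j] = 1
            i += matched  # skip past the matched span
        else:
            i += 1
    return labels


if __name__ == "__main__":
    sentence = ["Bobby", "moved", "to", "New", "York", "with", "Alice"]
    names = {("Bobby",), ("New", "York")}
    print(label_with_dictionary(sentence, names))
    # prints [1, 0, 0, 1, 1, 0, 0]
```

In this example "Alice" is an entity but is absent from the dictionary, so it remains unlabeled. This is the situation that motivates treating the problem as PU learning instead of assuming every unmarked word is a non-entity.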

Build PU Learning Classifier

Experiments

Adapted PU Learning for NER

Compared Methods

Datasets

Build Named Entity Dictionary

Results

Related Work

Conclusion

A Proof of Theorem 1


Publication Date: Jan 1, 2019
Citations: 98
License type: CC BY

