Semi-supervised Sequence Labeling for Named Entity Extraction based on Tri-Training: Case Study on Chinese Person Name Extraction

Chien-Lung Chou, Shin-Yi Wu, Chia-Hui Chang

doi:10.3115/v1/w14-6205

Abstract

Named entity extraction is a fundamental task for many knowledge engineering applications. Existing studies rely on annotated training data, which is expensive to obtain in large quantities, limiting the effectiveness of recognition. In this research, we propose an automatic labeling procedure that prepares training data from structured resources containing known named entities. Because this automatically labeled training data may contain noise, a self-testing procedure can be applied as a follow-up to remove low-confidence annotations and improve extraction performance with less training data. In addition to preparing labeled training data, we employ semi-supervised learning to make use of large amounts of unlabeled training data. By modifying tri-training for sequence labeling and deriving a proper initialization, we can further improve entity extraction. In the task of Chinese person name extraction with 364,685 sentences (8,672 news articles) and 54,449 (11,856 distinct) person names, an F-measure of 90.4% is achieved.
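
The automatic labeling step described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `auto_label`, the character-level B/I/O tag scheme, and the greedy longest-match strategy are all assumptions made for illustration, and the abstract does not specify any of them. The sketch only shows the general idea of turning a list of known names into labeled sequences.

```python
from typing import List, Set, Tuple


def auto_label(sentence: str, known_names: Set[str]) -> List[Tuple[str, str]]:
    """Tag each character as B (begin), I (inside), or O (outside).

    Hypothetical helper: matches known names greedily, longest first,
    scanning the sentence from left to right.
    """
    tags = ["O"] * len(sentence)
    max_len = max((len(name) for name in known_names), default=0)
    i = 0
    while i < len(sentence):
        matched = 0
        # Try the longest candidate span first so that a longer name
        # is preferred over a shorter name that is its prefix.
        for length in range(min(max_len, len(sentence) - i), 0, -1):
            if sentence[i:i + length] in known_names:
                matched = length
                break
        if matched:
            tags[i] = "B"
            for j in range(i + 1, i + matched):
                tags[j] = "I"
            i += matched
        else:
            i += 1
    return list(zip(sentence, tags))
```

Labels produced this way can be wrong, for example when a known name also occurs as an ordinary word or when a person name is missing from the resource, which is the kind of noise the self-testing procedure in the abstract is meant to filter out.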
