Enhancement of unsupervised feature selection for conditional random fields learning in Chinese word segmentation

Mike Tian-Jian Jiang, Chan-Hung Kuo, Ting-Hao Yang, Wen-Lian Hsu

doi:10.1109/nlpke.2011.6138229

Abstract

This work proposes a unified view of several unsupervised feature selection methods based on frequent strings that improve the conditional random fields (CRF) model for Chinese word segmentation (CWS). These features include character-based n-grams (CNG), accessor variety based strings (AVS), term-contributed frequency (TCF), and term-contributed boundary (TCB), combined using a specific manner of boundary overlapping. In the experiments, the baseline is the 6-tag labeling scheme, a state-of-the-art scheme for CRF-based CWS, and the data sets are drawn from the SIGHAN CWS bakeoffs of 2005 and 2010. The results show that all of these features improve on the baseline system in recall, precision, and their harmonic mean, the F1 score, for both overall accuracy (F) and out-of-vocabulary recognition (F_OOV). In particular, this work presents a novel feature selection approach, the compound feature “AVS+TCB”, which outperforms the other feature types for CRF-based CWS in terms of both F and F_OOV.
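Two building blocks named in the abstract, the 6-tag labeling scheme and accessor variety, can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation. It assumes the commonly used 6-tag set (S for single-character words; B, B2, B3 for the first three characters; M for further middle characters; E for the last character), and it takes the accessor variety AV(s) of a string s as the minimum of its numbers of distinct left and right neighbouring characters. The boundary symbols `<s>`/`</s>`, the `max_len` cut-off, and the function names are hypothetical choices made for this example. How the paper turns AV, TCF, and TCB scores into CRF feature columns is not shown here.

```python
from collections import defaultdict

# Hypothetical boundary symbols: this sketch treats every sentence start as one
# shared left accessor and every sentence end as one shared right accessor.
BOS, EOS = "<s>", "</s>"


def six_tag(words):
    """Map a segmented sentence (list of words) to 6-tag labels.

    S = single-character word; B, B2, B3 = first, second, third character;
    M = any further middle character; E = last character of a word.
    """
    tags = []
    for w in words:
        n = len(w)
        if n == 1:
            tags.append("S")
            continue
        for i in range(n):
            if i == n - 1:
                tags.append("E")
            elif i < 3:
                tags.append(("B", "B2", "B3")[i])
            else:
                tags.append("M")
    return tags


def accessor_variety(sentences, max_len=4):
    """Return AV(s) = min(#distinct left accessors, #distinct right accessors)
    for every substring s of length 1..max_len in raw, unsegmented sentences."""
    left = defaultdict(set)
    right = defaultdict(set)
    for sent in sentences:
        n = len(sent)
        for i in range(n):
            for j in range(i + 1, min(i + max_len, n) + 1):
                sub = sent[i:j]
                left[sub].add(sent[i - 1] if i > 0 else BOS)
                right[sub].add(sent[j] if j < n else EOS)
    return {s: min(len(left[s]), len(right[s])) for s in left}


if __name__ == "__main__":
    print(six_tag(["我", "喜欢", "自然语言处理"]))
    # ['S', 'B', 'E', 'B', 'B2', 'B3', 'M', 'M', 'E']
    av = accessor_variety(["xaby", "zabw", "xab"])
    print(av["ab"])  # 2: left accessors {x, z}, right accessors {y, w, </s>}
```

A high AV suggests a string occurs in many different contexts and is therefore a plausible word candidate; discretizing such scores into per-character feature columns for a CRF toolkit is one possible design, but the specific mapping used in the paper is not reproduced here.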

Similar Papers

Enhancement of Feature Engineering for Conditional Random Field Learning in Chinese Word Segmentation Using Unlabeled Data
...
01 Sep 2012

GeoBERTSegmenter: Word Segmentation of Chinese Texts in the Geoscience Domain Using the Improved BERT Model
Dongqi Wei ... Kai Ma
Earth and Space Science | VOL. 9
01 Oct 2022

A Sequence-to-Sequence Transformer Premised Temporal Convolutional Network for Chinese Word Segmentation
Wei Jiang ... Yan Tang
01 Jan 2020

Chinese Word Segmentation and Recognition Based on Separable Convolution Bidirectional Long Short-Term Memory and Feature Point
...
18 Dec 2020
