Chinese Unknown Word Identification Based on Local Bigram Model

Zhuoran Wang, Ting Liu

doi:10.1142/s0219427905001286

Open Access

Journal: International Journal of Computer Processing of Languages
Publication Date: Sep 1, 2005
Citations: 13

Affiliation: Harbin Institute of Technology

#Bigram Model #Unknown Word

Abstract

This paper presents a Chinese unknown word identification system based on a local bigram model. Our word segmentation system generally employs a statistical unigram model. To identify unknown words, however, we take advantage of their contextual information and apply a bigram model locally. We combine the two models, which are of different orders, by adjusting an interpolation value derived from a smoothing method. As a simplification of the full bigram model, this method is both simple and feasible: its algorithmic complexity is low and it requires relatively little training data. Our experimental results show that the solution is effective.
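
The abstract does not give the formula used to combine the two models, so the following is only a minimal sketch of one common way to do it: linear (Jelinek-Mercer style) interpolation of a bigram estimate with a unigram estimate, with the bigram term switched on only at selected positions. The function names, the add-one smoothing of the unigram model, the weight `lam`, and the `local` position set are all assumptions made for illustration, not details taken from the paper.

```python
import math
from collections import Counter


def train(sentences):
    """Count unigrams, bigram contexts, and bigrams from segmented sentences."""
    unigrams, contexts, bigrams = Counter(), Counter(), Counter()
    for words in sentences:
        unigrams.update(words)
        contexts.update(words[:-1])            # words that have a successor
        bigrams.update(zip(words, words[1:]))
    return unigrams, contexts, bigrams


def unigram_prob(word, unigrams):
    """Add-one smoothed unigram probability (an assumed smoothing choice)."""
    total = sum(unigrams.values())
    vocab = len(unigrams) + 1                  # one extra slot for unseen words
    return (unigrams[word] + 1) / (total + vocab)


def interpolated_prob(prev, word, model, lam):
    """lam * P_bigram(word | prev) + (1 - lam) * P_unigram(word)."""
    unigrams, contexts, bigrams = model
    p_uni = unigram_prob(word, unigrams)
    p_bi = bigrams[(prev, word)] / contexts[prev] if contexts[prev] else 0.0
    return lam * p_bi + (1.0 - lam) * p_uni


def score(words, model, lam, local):
    """Log-probability of a candidate segmentation.

    The interpolated bigram is used only at the positions listed in `local`
    (for example, around a suspected unknown word); every other position
    falls back to the plain unigram model. Assumes 0 <= lam < 1.
    """
    unigrams = model[0]
    logp = 0.0
    for i, word in enumerate(words):
        if i > 0 and i in local:
            p = interpolated_prob(words[i - 1], word, model, lam)
        else:
            p = unigram_prob(word, unigrams)
        logp += math.log(p)
    return logp


# Toy corpus of already-segmented sentences (illustrative data only).
corpus = [["我", "喜欢", "北京"], ["我", "喜欢", "上海"]]
model = train(corpus)
candidate = ["我", "喜欢", "北京"]
unigram_only = score(candidate, model, lam=0.5, local=set())
with_local_bigram = score(candidate, model, lam=0.5, local={2})
```

In this sketch, setting `lam` to 0 recovers the unigram model, and restricting `local` to a few positions keeps the extra cost and the amount of bigram statistics needed small, which is in the spirit of the low complexity the abstract claims.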

Full Text