Abstract

This paper presents a Chinese unknown word identification system based on a local bigram model. In general, our word segmentation system employs a statistical unigram model. To identify unknown words, however, we exploit their contextual information and apply a bigram model locally. We combine the two models, which differ in order, by adjusting an interpolation weight derived from a smoothing method. As a simplification of a full bigram model, the method is both simple and feasible: its algorithmic complexity is low, and it requires relatively little training data. Our experimental results show that the approach is effective.
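
The abstract does not give the interpolation formula, so the following is only a minimal sketch of how a unigram and a bigram estimate might be combined, assuming simple linear interpolation with a single weight. The function name `interpolated_prob`, the weight `lam`, and the toy corpus are all hypothetical and are not taken from the paper.

```python
from collections import Counter


def interpolated_prob(word, prev, unigrams, bigrams, lam):
    """Return lam * P(word | prev) + (1 - lam) * P(word).

    Assumption: plain linear interpolation with maximum-likelihood
    estimates; the paper's actual weighting scheme may differ.
    """
    total = sum(unigrams.values())
    p_uni = unigrams[word] / total if total else 0.0
    # Simplification: the count of `prev` as a unigram is used as the
    # bigram denominator, including sentence-final occurrences.
    prev_count = unigrams[prev]
    p_bi = bigrams[(prev, word)] / prev_count if prev_count else 0.0
    return lam * p_bi + (1.0 - lam) * p_uni


# Toy, made-up segmented corpus (hypothetical data, illustration only).
corpus = [
    ["我", "喜欢", "北京"],
    ["我", "喜欢", "上海"],
    ["他", "喜欢", "北京"],
]
unigrams = Counter(w for sent in corpus for w in sent)
bigrams = Counter((a, b) for sent in corpus for a, b in zip(sent, sent[1:]))

# lam = 0 falls back to the unigram model; lam = 1 uses only the bigram.
p_uni_only = interpolated_prob("北京", "喜欢", unigrams, bigrams, lam=0.0)
p_bi_only = interpolated_prob("北京", "喜欢", unigrams, bigrams, lam=1.0)
p_mixed = interpolated_prob("北京", "喜欢", unigrams, bigrams, lam=0.5)
print(p_uni_only, p_bi_only, p_mixed)
```

In a system like the one described, the bigram term would presumably be applied only around candidate unknown words, with the unigram model used elsewhere; that local switching is not shown here.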
