Adaptive Bayesian HMM for Fully Unsupervised Chinese Part-of-Speech Induction

Lidan Zhang, Kwop-Ping Chan

doi:10.1145/2334801.2334803

Journal: ACM Transactions on Asian Language Information Processing
Publication Date: Sep 1, 2012
Citations: 2

Affiliation: University of Hong Kong

#Bayesian HMM #Chinese Treebank

Abstract

We propose an adaptive Bayesian hidden Markov model (HMM) for fully unsupervised part-of-speech (POS) induction. The proposed model and its inference algorithm extend the first-order Bayesian HMM with Dirichlet priors in two ways. First, our algorithm infers the optimal number of hidden states from the training corpus rather than fixing the dimensionality of the state space beforehand. Second, we add a Chinese unknown-word processing module that measures similarity using both morphological properties and context distributions. Experimental results show that both extensions help to find the optimal categories for Chinese, in terms of both unsupervised clustering metrics and grammar induction accuracy on the Chinese Treebank.
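
For readers unfamiliar with the baseline that the abstract builds on, the following is a minimal sketch of the posterior predictive estimates used in a first-order Bayesian HMM with symmetric Dirichlet priors and a fixed number of states. It is not the authors' adaptive algorithm: the function names, the hyperparameter values `alpha` and `beta`, and the toy corpus are all assumptions made for illustration.

```python
from collections import Counter


def dirichlet_predictive(count, total, prior, dim):
    """Posterior predictive probability of one outcome of a multinomial
    with a symmetric Dirichlet(prior) over `dim` outcomes."""
    return (count + prior) / (total + dim * prior)


def hmm_predictives(tagged, num_states, vocab_size, alpha=0.1, beta=0.01):
    """Build transition and emission predictives from a state-labelled corpus.

    `tagged` is a list of sentences, each a list of (word, state) pairs.
    `alpha` and `beta` are hypothetical hyperparameter values.
    """
    trans = Counter()       # (previous state, state) -> count
    trans_from = Counter()  # previous state -> number of outgoing transitions
    emit = Counter()        # (state, word) -> count
    state_total = Counter() # state -> number of emitted words

    for sentence in tagged:
        prev = None
        for word, state in sentence:
            emit[(state, word)] += 1
            state_total[state] += 1
            if prev is not None:
                trans[(prev, state)] += 1
                trans_from[prev] += 1
            prev = state

    def p_trans(prev_state, state):
        return dirichlet_predictive(
            trans[(prev_state, state)], trans_from[prev_state], alpha, num_states
        )

    def p_emit(state, word):
        return dirichlet_predictive(
            emit[(state, word)], state_total[state], beta, vocab_size
        )

    return p_trans, p_emit


# Toy corpus with two induced states (0 and 1) and a four-word vocabulary.
corpus = [
    [("我", 0), ("吃", 1), ("饭", 0)],
    [("他", 0), ("吃", 1)],
]
p_trans, p_emit = hmm_predictives(corpus, num_states=2, vocab_size=4)
```

In this sketch the number of states is passed in as `num_states`; the first extension described in the abstract replaces that fixed choice with a value inferred from the training corpus.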

Full Text