Lexicon expansion for latent variable grammars

Xiaodong Zeng, Derek F Wong, Lidia S Chao, Isabel Trancoso, Liangye He, Qiuping Huang

doi:10.1016/j.patrec.2014.01.010

Abstract

This study investigates the use of unlabeled data, i.e., raw text, to strengthen latent variable probabilistic context-free grammars, in particular their lexical models. A graph-based lexicon expansion approach is proposed that discovers additional lexical knowledge from a large amount of unlabeled data to aid syntactic parsing. The approach is based on transductive graph-based label propagation: it builds k-nearest-neighbor (k-NN) similarity graphs over the words of the labeled and unlabeled data and propagates lexical emission probabilities along them. The intuition is that different words occurring in similar syntactic environments should have similar lexical emission distributions. The derived words, together with their lexical emission probabilities, are incorporated into the parser. The approach is very effective at parsing out-of-vocabulary (OOV) words. Empirical results for English, Chinese, and Portuguese demonstrate its effectiveness.
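
To make the propagation idea concrete, the following is a minimal sketch of label propagation over a k-NN word graph. It is not the authors' implementation. Several things here are assumptions made only for illustration: the toy context-feature vectors, cosine similarity as the edge weight, the tag set NN/VB, a per-word tag distribution standing in for the lexical emission probabilities, and the simple clamped weighted-averaging update. The function names knn_graph and propagate are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length feature vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def knn_graph(features, k):
    """Map each word to its k most similar words (hypothetical helper).

    Edges with zero similarity are dropped, so a word may end up
    with fewer than k neighbours.
    """
    graph = {}
    for w, fw in features.items():
        sims = [(cosine(fw, fv), v) for v, fv in features.items() if v != w]
        sims.sort(key=lambda t: (-t[0], t[1]))  # best first, ties by word
        graph[w] = [(v, s) for s, v in sims[:k] if s > 0.0]
    return graph


def propagate(graph, seed, tags, iterations=20):
    """Spread tag distributions from seed (labeled) words to the rest.

    Seed words are clamped to their known distribution; every other
    word repeatedly takes the similarity-weighted average of its
    neighbours' current distributions.
    """
    uniform = {t: 1.0 / len(tags) for t in tags}
    dist = {w: dict(seed.get(w, uniform)) for w in graph}
    for _ in range(iterations):
        new = {}
        for w, nbrs in graph.items():
            total = sum(s for _, s in nbrs)
            if w in seed:
                new[w] = dict(seed[w])
            elif total == 0.0:
                new[w] = dict(dist[w])  # isolated word: keep current value
            else:
                new[w] = {
                    t: sum(s * dist[v][t] for v, s in nbrs) / total
                    for t in tags
                }
        dist = new
    return dist


# Toy data (assumed): each vector summarises a word's syntactic contexts.
features = {
    "dog": [1.0, 0.0, 1.0],
    "cat": [1.0, 0.0, 0.9],
    "run": [0.0, 1.0, 0.1],
    "jog": [0.0, 1.0, 0.0],
}
# "dog" and "run" play the role of words seen in the labeled data.
seed = {"dog": {"NN": 1.0, "VB": 0.0}, "run": {"NN": 0.0, "VB": 1.0}}
tags = ["NN", "VB"]

result = propagate(knn_graph(features, k=2), seed, tags)
for word in ("cat", "jog"):
    print(word, {t: round(p, 3) for t, p in result[word].items()})
```

In this toy setting the unlabeled words "cat" and "jog" end up with distributions close to those of their nearest labeled neighbours, which mirrors the stated intuition that words in similar syntactic environments should have similar lexical distributions.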
