Word clustering and disambiguation based on co-occurrence data

Hang Li

doi:10.1017/s1351324902002838

Abstract

We address the problem of clustering words (or constructing a thesaurus) based on co-occurrence data, and of conducting syntactic disambiguation using the acquired word classes. We view the clustering problem as that of estimating a class-based probability distribution that specifies the joint probabilities of word pairs. We propose an efficient algorithm based on the Minimum Description Length (MDL) principle for estimating such a probability model. Our clustering method is a natural extension of the one proposed in Brown, Della Pietra, deSouza, Lai and Mercer (1992). We then propose a syntactic disambiguation method that combines automatically constructed word classes with a hand-made thesaurus. The overall disambiguation accuracy achieved by our method is 88.2%, which compares favorably with the accuracies obtained by state-of-the-art disambiguation methods.
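To make the idea of scoring a class-based joint model with MDL concrete, the following is a minimal sketch and not the paper's algorithm. It assumes the factorization P(n, v) = P(C_n, C_v) P(n | C_n) P(v | C_v) with maximum-likelihood estimates, and a model cost of (k / 2) log2 N bits for k free parameters and N observed pairs; both are common choices that the abstract itself does not spell out. The function name `description_length` and the toy noun-verb counts are hypothetical.

```python
import math
from collections import Counter


def description_length(pair_counts, noun_class, verb_class):
    """Description length in bits of a class-based joint model (illustrative).

    Assumed model: P(n, v) = P(Cn, Cv) * P(n | Cn) * P(v | Cv), fitted by
    maximum likelihood, plus an assumed (k / 2) * log2(N) parameter cost.
    """
    total = sum(pair_counts.values())
    noun_freq, verb_freq = Counter(), Counter()
    noun_cls_freq, verb_cls_freq = Counter(), Counter()
    cls_pair_freq = Counter()
    for (noun, verb), count in pair_counts.items():
        cn, cv = noun_class[noun], verb_class[verb]
        noun_freq[noun] += count
        verb_freq[verb] += count
        noun_cls_freq[cn] += count
        verb_cls_freq[cv] += count
        cls_pair_freq[(cn, cv)] += count

    # Data description length: negative log-likelihood of the observed pairs.
    data_bits = 0.0
    for (noun, verb), count in pair_counts.items():
        cn, cv = noun_class[noun], verb_class[verb]
        p_classes = cls_pair_freq[(cn, cv)] / total
        p_noun = noun_freq[noun] / noun_cls_freq[cn]
        p_verb = verb_freq[verb] / verb_cls_freq[cv]
        data_bits -= count * math.log2(p_classes * p_noun * p_verb)

    # Model description length: one cost unit per free parameter.
    k_n, k_v = len(noun_cls_freq), len(verb_cls_freq)
    n_params = (k_n * k_v - 1) + (len(noun_freq) - k_n) + (len(verb_freq) - k_v)
    model_bits = 0.5 * n_params * math.log2(total)
    return data_bits + model_bits


# Hypothetical toy co-occurrence counts of (noun, verb) pairs.
pairs = {("wine", "drink"): 2, ("beer", "drink"): 2,
         ("bread", "eat"): 2, ("rice", "eat"): 2}
verbs = {"drink": 0, "eat": 1}
singletons = {"wine": 0, "beer": 1, "bread": 2, "rice": 3}
grouped = {"wine": 0, "beer": 0, "bread": 1, "rice": 1}
mismatched = {"wine": 0, "bread": 0, "beer": 1, "rice": 1}

for name, classes in [("singletons", singletons), ("grouped", grouped),
                      ("mismatched", mismatched)]:
    print(name, description_length(pairs, classes, verbs))
```

On this toy data, grouping the drinkable nouns and the edible nouns gives the shortest description, because it removes parameters without hurting the fit, while the mismatched grouping is penalized by a worse fit. A clustering procedure in the spirit of the abstract would search over such partitions for one with a small description length; the search strategy is not shown here.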

Journal: Natural Language Engineering
Publication Date: Mar 1, 2002
Citations: 36
License type: cc-by-nc-sa

Similar Papers

Word clustering and disambiguation based on co-occurrence data
Hang Li ... Naoki Abe
01 Jan 1998

A Study on the Implementation of a Parsing System Using Segmentation and Argument Information
Yong Uk Park ... Hyuk Chul Kwon
Journal of Korea Multimedia Society | VOL. 16
31 Mar 2013

An Analysis of Students' Ability in Distinguishing Lexical and Structural Ambiguity in English Sentences at Second Grade of SMA 1 Labuapi in the Academic Year 2016-2017
Irwandi Irwandi
Linguistics and ELT Journal | VOL. 5
05 Mar 2019
