Discovering Text Patterns by a New Graphic Model

Minhua Huang, Robert M. Haralick

doi:10.1007/978-3-642-23199-5_32

Abstract

We propose a probabilistic graphical model for recognizing three types of text patterns in a sentence: noun phrases, the meaning of an ambiguous word, and the semantic arguments of a verb. The model has a unique mathematical expression and graphical representation compared with existing graphical models such as CRFs, HMMs, and MEMMs. In our model, the sequence of optimal categories for a sequence of symbols is determined by finding the optimal category for each symbol independently. Two consequences follow. First, dynamic programming is not needed, so the on-line time complexity and memory complexity are reduced. Second, the misclassification rate decreases. Experiments conducted on standard data sets show good results. For instance, our method achieves an average precision of 97.7% and an average recall of 98.8% for recognizing noun phrases on WSJ data from the Penn Treebank; an average accuracy of 81.12% for disambiguating the six-sense word 'line'; and an average precision of 92.96% and an average recall of 94.94% for classifying the semantic argument boundaries of a verb in a sentence on WSJ data from the Penn Treebank and PropBank. The performance on each task surpasses or approaches the state of the art.
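
The independent decoding step described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the per-position score table, the B/I/O category names, and the function name `decode_independently` are hypothetical, and the scores are invented numbers standing in for whatever probability the model assigns to each category at each position.

```python
from typing import Dict, List

# Hypothetical input: scores[i][c] stands in for the model's probability of
# category c for the symbol at position i. How the model computes these
# values is not specified here; the numbers below are invented.
Scores = List[Dict[str, float]]


def decode_independently(scores: Scores) -> List[str]:
    """Choose the highest-scoring category at each position on its own.

    No information is passed between positions, so no dynamic programming
    table is needed.
    """
    return [max(position, key=position.get) for position in scores]


# Illustrative three-symbol sequence with assumed B/I/O chunk categories.
example_scores: Scores = [
    {"B": 0.7, "I": 0.1, "O": 0.2},
    {"B": 0.2, "I": 0.6, "O": 0.2},
    {"B": 0.1, "I": 0.2, "O": 0.7},
]
labels = decode_independently(example_scores)
```

For n symbols and k categories, this per-symbol selection costs O(nk) time, whereas first-order Viterbi decoding, as commonly used with HMMs and linear-chain CRFs, costs O(nk^2) and must keep a table of back-pointers.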

Full Text