An Investigation on Transformation-based Error-driven Learning Algorithm for Chinese Noun Phrase Extraction

Kam-Fai Wong, Timothy Kun-Chung Chan, Chun-Hung Cheng

doi:10.1142/s0219427901000308

Abstract

Noun phrases are commonly used to generate index terms for information retrieval systems, so an effective noun phrase extraction method is needed. In this paper, we propose an approach for extracting maximal noun phrases from Chinese text. Although previous studies have proposed methods for extracting noun phrases, most of them are applicable only to Western languages. To the best of our knowledge, very few have handled Chinese text. Many existing approaches for Western languages use statistical methods. However, because of the complicated structure of maximal Chinese noun phrases, purely statistical approaches are not effective. We attempt to improve the performance of a statistical method by integrating it with the transformation-based error-driven learning (TEL) technique. Our methodology comprises two modules. The first module applies a statistical method to extract Chinese noun phrases, and its performance, in terms of precision and recall, is investigated. The second module applies the TEL algorithm to refine the output of the first module. The TEL algorithm automatically learns a set of transformation rules that fix the errors identified by comparing the output of the first module with a correctly annotated corpus. The learned rules can then be applied to sentences in any corpus, one by one, to correct such errors. The TEL algorithm is shown to be effective in improving precision.
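The second module's learning loop can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not give the rule templates, so the single template used here (relabel a token from one label to another when its part-of-speech tag matches) is an assumption, as are the function names `apply_rule` and `learn_rules`, the label set ("I" for inside a noun phrase, "O" for outside), and the toy tags.

```python
def apply_rule(rule, tags, labels):
    """Return a copy of labels with the rule applied at every matching position.

    A rule is a hypothetical triple (from_label, pos_tag, to_label): a token
    currently labelled from_label whose POS tag is pos_tag is relabelled to_label.
    """
    from_label, pos_tag, to_label = rule
    return [to_label if (lab == from_label and tag == pos_tag) else lab
            for tag, lab in zip(tags, labels)]


def count_errors(labels, gold):
    """Number of positions where the current labels disagree with the annotation."""
    return sum(lab != g for lab, g in zip(labels, gold))


def learn_rules(tags, predicted, gold, min_gain=1):
    """Greedy error-driven loop: keep the rule with the largest net error reduction."""
    labels = list(predicted)
    learned = []
    while True:
        # Every remaining error proposes one candidate rule that would fix it.
        candidates = {(lab, tag, g)
                      for tag, lab, g in zip(tags, labels, gold) if lab != g}
        errors_before = count_errors(labels, gold)
        best_rule, best_gain = None, 0
        for rule in sorted(candidates):
            # Net gain accounts for both corrected errors and newly introduced ones.
            gain = errors_before - count_errors(apply_rule(rule, tags, labels), gold)
            if gain > best_gain:
                best_rule, best_gain = rule, gain
        if best_rule is None or best_gain < min_gain:
            break
        labels = apply_rule(best_rule, tags, labels)
        learned.append(best_rule)
    return learned


# Toy data (assumed): POS tags, first-module output, and the annotated labels.
tags = ["n", "u", "n", "v", "n"]
predicted = ["I", "O", "I", "O", "I"]
gold = ["I", "I", "I", "O", "I"]
rules = learn_rules(tags, predicted, gold)
```

Because rules are learned in order and each one is scored against the output of the rules before it, they would be applied to new text in that same order. Scoring by net error reduction is what lets the loop reject a rule that fixes some errors while introducing more elsewhere.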
