Abstract

Noun phrases are commonly used to generate index terms for information retrieval systems, so an effective noun phrase extraction method is needed. In this paper, we propose an approach to extracting maximal noun phrases from Chinese text. Although methods for extracting noun phrases have been proposed in previous studies, most of them are applicable only to Western languages. To the best of our knowledge, very few have handled Chinese text. Many existing approaches for Western languages use statistical methods. However, because of the complicated structure of maximal Chinese noun phrases, purely statistical approaches are not effective. We attempt to improve the performance of a statistical method by integrating it with the transformation-based error-driven learning (TEL) technique. Our methodology comprises two modules. The first module applies a statistical method to extract Chinese noun phrases, and we investigate its performance in terms of precision and recall. The second module applies the TEL algorithm to refine the output of the first module. The TEL algorithm automatically learns a set of transformation rules that fix the errors identified by comparing the output of the first module with a correctly annotated corpus. The learned rules can then be applied one by one to the sentences of any corpus to correct such errors. The TEL algorithm is shown to be effective in improving precision.
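
As a rough illustration of the error-driven idea described above, the following minimal sketch greedily learns transformation rules that move a baseline tagging closer to a gold annotation. It is not the paper's implementation: the single rule template (change tag `src` to `dst` when the previous token's feature is `prev_feat`), the inside/outside tags `I` and `O`, the part-of-speech labels, and the toy sentence are all assumptions made for this example, and the `initial` tags merely stand in for the output of a statistical first module.

```python
from itertools import product

def apply_rule(rule, feats, tags):
    """Return a copy of `tags` in which `src` becomes `dst` wherever the
    previous token's feature equals `prev_feat` (hypothetical rule template)."""
    src, dst, prev_feat = rule
    out = list(tags)
    for i in range(1, len(tags)):
        if tags[i] == src and feats[i - 1] == prev_feat:
            out[i] = dst
    return out

def errors(tags, gold):
    """Count positions where the current tagging disagrees with the gold tags."""
    return sum(t != g for t, g in zip(tags, gold))

def learn_rules(feats, initial, gold, max_rules=10):
    """Greedy error-driven learning: repeatedly pick the rule that leaves the
    fewest errors, apply it, and stop when no rule reduces the error count."""
    tagset = sorted(set(gold) | set(initial))
    featset = sorted(set(feats))
    current = list(initial)
    learned = []
    for _ in range(max_rules):
        best_rule, best_err = None, errors(current, gold)
        for src, dst, prev_feat in product(tagset, tagset, featset):
            if src == dst:
                continue
            rule = (src, dst, prev_feat)
            err = errors(apply_rule(rule, feats, current), gold)
            if err < best_err:
                best_rule, best_err = rule, err
        if best_rule is None:
            break  # no candidate rule improves on the current tagging
        current = apply_rule(best_rule, feats, current)
        learned.append(best_rule)
    return learned, current

# Toy data (assumed): per-token features, baseline tags, and gold tags.
# "I" marks a token inside a noun phrase, "O" a token outside one.
feats   = ["DT", "ADJ", "N", "V", "DT", "N"]
initial = ["I",  "O",   "I", "O", "I",  "I"]  # baseline misses the adjective
gold    = ["I",  "I",   "I", "O", "I",  "I"]

learned, corrected = learn_rules(feats, initial, gold)
```

On this toy input the learner keeps one rule, which retags a token from `O` to `I` when the preceding feature is `DT`. Once learned, the same ordered rule list can be replayed on new baseline output with `apply_rule`, which mirrors how the abstract describes applying the learned rules to other corpora.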
