Abstract

In Korean information retrieval, compound nouns play an important role in improving precision in search experiments. There are two major approaches to compound noun indexing in Korean: statistical and linguistic. Each method, however, has its own shortcomings, such as limitations when indexing diverse types of compound nouns, over-generation of compound nouns, and data sparseness in training. In this paper, we propose a corpus-based learning method, which can index diverse types of compound nouns using rules automatically extracted from a large corpus. The automatic learning method is more portable and requires less human effort, although it exhibits a performance level similar to the manual-linguistic approach. We also present a new filtering method to solve the problems of compound noun over-generation and data sparseness.

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call

Disclaimer: All third-party content on this website/platform is and will remain the property of their respective owners and is provided on "as is" basis without any warranties, express or implied. Use of third-party content does not indicate any affiliation, sponsorship with or endorsement by them. Any references to third-party content is to identify the corresponding services and shall be considered fair use under The CopyrightLaw.