A Study on Word Sense Disambiguation Models for Improving Information Retrieval Performance

Young-Mee Chung, Yonggu Lee

doi:10.3743/kosim.2005.22.2.125

Abstract

This paper presents a semantic vector space retrieval model that incorporates a word sense disambiguation algorithm in order to improve retrieval effectiveness. Nine Korean homonyms were selected for the disambiguation and retrieval experiments. Approximately 120,000 news articles make up the raw test collection, and 18 queries containing the homonyms as query terms are used in the retrieval experiments. A Naive Bayes classifier and the EM algorithm, representing supervised and unsupervised learning respectively, are used for disambiguation. The Naive Bayes classifier achieved a disambiguation accuracy of …, while the clustering performance of the EM algorithm averaged …. The semantic vector space model incorporating the Naive Bayes classifier improved precision by about … over the baseline. However, the retrieval effectiveness of the EM-based semantic retrieval was lower than that of the baseline retrieval without disambiguation. Notably, both disambiguation and retrieval performance depend on the distribution patterns of the homonyms being disambiguated as well as on the characteristics of the queries.
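The abstract names a Naive Bayes classifier as the supervised disambiguator feeding a semantic vector space model. The Python sketch below shows one common way to combine the two. It rests on several assumptions that the source does not state:

- The context is a bag of words within a fixed window (`window=3`).
- Probabilities use add-one smoothing.
- Context words unseen in training are ignored.
- Each disambiguated homonym is indexed as a separate `word#sense` term, so that different senses occupy different dimensions of the vector space.

The class and function names and the toy "bae" (ship/pear) training data are hypothetical illustrations, not the authors' implementation. The unsupervised EM variant would replace the labeled senses with latent clusters and is not shown here.

```python
import math
from collections import Counter, defaultdict


class NaiveBayesWSD:
    """Hypothetical multinomial Naive Bayes sense classifier over bag-of-words contexts.

    Assumptions (not from the paper): add-one smoothing, unseen words skipped.
    """

    def __init__(self):
        self.sense_counts = Counter()            # training contexts per sense
        self.word_counts = defaultdict(Counter)  # context-word frequencies per sense
        self.vocab = set()

    def fit(self, contexts, senses):
        for words, sense in zip(contexts, senses):
            self.sense_counts[sense] += 1
            self.word_counts[sense].update(words)
            self.vocab.update(words)
        return self

    def log_score(self, words, sense):
        # log P(s) + sum over known context words of log P(w | s), add-one smoothed
        total = sum(self.sense_counts.values())
        score = math.log(self.sense_counts[sense] / total)
        n_sense = sum(self.word_counts[sense].values())
        v = len(self.vocab)
        for w in words:
            if w in self.vocab:  # words never seen in training are ignored
                score += math.log((self.word_counts[sense][w] + 1) / (n_sense + v))
        return score

    def predict(self, words):
        # Choose the sense with the highest posterior log score
        return max(self.sense_counts, key=lambda s: self.log_score(words, s))


def sense_tag(tokens, homonym, clf, window=3):
    """Replace each occurrence of the homonym with a 'word#sense' index term."""
    out = []
    for i, tok in enumerate(tokens):
        if tok == homonym:
            context = tokens[max(0, i - window):i] + tokens[i + 1:i + 1 + window]
            out.append(f"{tok}#{clf.predict(context)}")
        else:
            out.append(tok)
    return out


# Hypothetical toy training data for a homonym "bae" (ship vs. pear)
train_contexts = [
    ["port", "sail", "captain"],
    ["sea", "sail", "crew"],
    ["fruit", "sweet", "orchard"],
    ["fruit", "juice", "autumn"],
]
train_senses = ["ship", "ship", "pear", "pear"]
clf = NaiveBayesWSD().fit(train_contexts, train_senses)

print(clf.predict(["captain", "sea"]))  # ship
print(sense_tag(["captain", "bae", "sail", "sea"], "bae", clf))
```

Because `bae#ship` and `bae#pear` become distinct index terms, a query about ships no longer matches documents about pears on the shared surface form. This is one plausible reading of how disambiguation can raise precision in a semantic vector space model. If the classifier mislabels senses, the same mechanism can also split relevant documents away from the query, which is consistent with the lower effectiveness the abstract reports for the EM-based variant.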
