Abstract

Although they have achieved some success, today's information retrieval (IR) systems are neither intelligent nor reliable. IR systems return poor search results when users formulate their information needs as incomplete or ambiguous queries (i.e., weak queries). Therefore, one of the main challenges in modern IR research is to provide consistent results across all queries by improving performance on weak queries. However, existing IR approaches such as query expansion are not very effective because they make little effort to analyze and exploit the meanings of the queries. Furthermore, word sense disambiguation approaches, which rely on textual context, are ineffective against weak queries, which are typically short. Motivated by the demand for a robust IR system that can consistently provide highly accurate results, this study implemented a novel topic detection method that leverages both the language model and the structural knowledge of Wikipedia, and systematically evaluated the effect of query disambiguation and topic-based retrieval approaches on TREC collections. The results not only confirm the effectiveness of the proposed topic detection and topic-based retrieval approaches but also demonstrate that query disambiguation does not improve IR performance as expected.

Highlights

  • Information retrieval (IR) has emerged as a central technology in modern society by enabling individuals to extend their ability to discover and obtain knowledge

  • The Wilcoxon signed-rank test performed on the results did not reject the null hypothesis (H1) that query disambiguation has no significant effect on retrieval performance for either the Blog queries or the High Accuracy Retrieval from Documents (HARD) queries

  • For ambiguous queries from the HARD collection, Wilcoxon tests for two-tailed significance conducted between wsd_qe and no_wsd_qe and between wsd_qe and no_wsd_ir retained the null hypothesis, but at a higher p-value (p>0.2, N=10)
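As an illustration of the kind of paired comparison described in the highlights above, the following is a minimal sketch of a two-tailed exact Wilcoxon signed-rank test in plain Python. It is not the authors' code: the function name `wilcoxon_signed_rank`, the variables `run_a` and `run_b`, and the per-query scores are all hypothetical, and the numbers are made up for demonstration rather than taken from the study. The sketch assumes that zero differences are dropped and that tied absolute differences share an average rank, which is one common convention.

```python
from itertools import product


def wilcoxon_signed_rank(x, y):
    """Two-tailed exact Wilcoxon signed-rank test for paired samples.

    Returns (w_plus, p_value), where w_plus is the sum of the ranks of the
    positive differences. Zero differences are dropped, and tied absolute
    differences receive the average of the ranks they span.
    """
    diffs = [a - b for a, b in zip(x, y) if a != b]
    n = len(diffs)
    if n == 0:
        raise ValueError("all paired differences are zero")

    # Rank the absolute differences, averaging ranks over ties.
    order = sorted(range(n), key=lambda i: abs(diffs[i]))
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        while j + 1 < n and abs(diffs[order[j + 1]]) == abs(diffs[order[i]]):
            j += 1
        avg = (i + j) / 2 + 1  # mean of the 1-based ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1

    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    expected = sum(ranks) / 2  # mean of W+ under the null hypothesis

    # Exact null distribution: each of the 2**n sign patterns is equally
    # likely, so count the patterns at least as extreme as the observed one.
    observed_dev = abs(w_plus - expected)
    extreme = 0
    for signs in product((0, 1), repeat=n):
        w = sum(r for r, s in zip(ranks, signs) if s)
        if abs(w - expected) >= observed_dev - 1e-12:
            extreme += 1
    return w_plus, extreme / 2 ** n


# Hypothetical per-query scores for two runs over the same 10 queries.
# These values are invented for illustration only.
run_a = [0.31, 0.12, 0.45, 0.27, 0.08, 0.52, 0.19, 0.33, 0.40, 0.22]
run_b = [0.29, 0.15, 0.44, 0.30, 0.05, 0.50, 0.23, 0.31, 0.36, 0.28]

w_plus, p_value = wilcoxon_signed_rank(run_a, run_b)
print(f"W+ = {w_plus}, two-tailed p = {p_value:.4f}")
```

With only 10 paired queries, enumerating all 1,024 sign patterns is cheap, and it avoids relying on a normal approximation that can be unreliable at such a small sample size. The null hypothesis would be retained whenever the returned p-value exceeds the chosen significance level.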

Summary

Introduction

Information retrieval (IR) has emerged as a central technology in modern society by enabling individuals to extend their ability to discover and obtain knowledge. The quality of queries has a profound impact on retrieval performance because users must formulate their information needs as queries. Queries that produce low retrieval performance on most IR systems are called weak queries or ineffective queries. In addition to poor query formulation caused by a lack of domain knowledge, the problem of weak queries is intensified by complexities of natural language such as polysemy. Previous research has found that polysemous words in queries can adversely affect automatic query expansion by introducing terms related to incorrect senses of the polysemes (Voorhees, 1994). To address problems of polysemy, IR researchers use techniques developed for word sense disambiguation (WSD) to identify the intended meaning of a given polyseme. Applying query disambiguation effectively in IR is not a trivial task because most queries are short and cannot provide the context required by traditional WSD methods. Consequently, previous studies that have examined the issue of query disambiguation report unsatisfactory results (Sanderson, 1994, 2000; Voorhees, 1993).
