A Study of Smoothing Methods for Language Models Applied to Ad Hoc Information Retrieval

Chengxiang Zhai, John Lafferty

doi:10.1145/3130348.3130377


Journal: ACM SIGIR Forum
Publication Date: Aug 2, 2017
Citations: 1091

Affiliation: Carnegie Mellon University

Keywords: Language Model, Language Model Estimation


Abstract

Language modeling approaches to information retrieval are attractive and promising because they connect the problem of retrieval with that of language model estimation, which has been studied extensively in other application areas such as speech recognition. The basic idea of these approaches is to estimate a language model for each document, and then rank documents by the likelihood of the query according to the estimated language model. A core problem in language model estimation is smoothing, which adjusts the maximum likelihood estimator so as to correct the inaccuracy due to data sparseness. In this paper, we study the problem of language model smoothing and its influence on retrieval performance. We examine the sensitivity of retrieval performance to the smoothing parameters and compare several popular smoothing methods on different test collections.
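The query-likelihood ranking described in the abstract can be illustrated with a minimal Python sketch. The abstract does not name the smoothing methods it compares, so the two used here, Jelinek-Mercer interpolation and Dirichlet prior smoothing, are assumptions chosen because they are widely used. The function names, the toy documents, and the default parameter values (`lam=0.5`, `mu=2000.0`) are likewise hypothetical and not taken from the paper.

```python
import math
from collections import Counter


def collection_model(docs):
    """Maximum likelihood unigram model estimated from all documents."""
    counts = Counter(w for doc in docs for w in doc)
    total = sum(counts.values())
    return {w: c / total for w, c in counts.items()}


def query_log_likelihood(query, doc, p_coll, method="jm", lam=0.5, mu=2000.0):
    """Log-likelihood of the query under a smoothed document model.

    method="jm":        (1 - lam) * p_ml(w | d) + lam * p(w | C)
    method="dirichlet": (tf(w, d) + mu * p(w | C)) / (|d| + mu)
    Both parameter defaults are illustrative assumptions.
    """
    tf = Counter(doc)
    dlen = len(doc)
    score = 0.0
    for w in query:
        p_c = p_coll.get(w, 0.0)
        if method == "jm":
            p_ml = tf[w] / dlen if dlen else 0.0
            p = (1.0 - lam) * p_ml + lam * p_c
        elif method == "dirichlet":
            p = (tf[w] + mu * p_c) / (dlen + mu)
        else:
            raise ValueError(f"unknown smoothing method: {method}")
        if p == 0.0:
            # The word occurs nowhere in the collection, so smoothing cannot help.
            return float("-inf")
        score += math.log(p)
    return score


def rank(query, docs, **kwargs):
    """Return document indices sorted by decreasing query log-likelihood."""
    p_coll = collection_model(docs)
    scored = [
        (query_log_likelihood(query, d, p_coll, **kwargs), i)
        for i, d in enumerate(docs)
    ]
    return [i for _, i in sorted(scored, key=lambda pair: -pair[0])]


# Toy collection (hypothetical data, already tokenized).
docs = [
    ["smoothing", "language", "model"],
    ["speech", "recognition", "model"],
    ["retrieval", "query", "document"],
]
query = ["language", "model"]

print(rank(query, docs, method="jm", lam=0.5))
print(rank(query, docs, method="dirichlet", mu=2000.0))
```

Without smoothing, the second document would receive zero likelihood because it never contains "language". Mixing in the collection model keeps that probability positive, so the document can still be ranked above one that matches neither query word.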
