Probabilistic models of information retrieval based on measuring the divergence from randomness

Gianni Amati, Cornelis Joost Van Rijsbergen

doi:10.1145/582415.582416

Abstract

We introduce and create a framework for deriving probabilistic models of Information Retrieval. The models are nonparametric models of IR obtained in the language model approach. We derive term-weighting models by measuring the divergence of the actual term distribution from that obtained under a random process. Among the random processes we study the binomial distribution and Bose-Einstein statistics. We define two types of term frequency normalization for tuning term weights in the document-query matching process. The first normalization assumes that documents have the same length and measures the information gain with the observed term once it has been accepted as a good descriptor of the observed document. The second normalization is related to the document length and to other statistics. These two normalization methods are applied to the basic models in succession to obtain weighting formulae. Results show that our framework produces different nonparametric models forming baseline alternatives to the standard tf-idf model.
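As a rough illustration of how the pieces described in the abstract might fit together, the sketch below scores one term in one document. The abstract gives no formulas, so everything concrete here is an assumption: the geometric approximation of Bose-Einstein statistics as the random process, a Laplace-style information gain 1/(tfn + 1) as the first normalization, and tfn = tf * log2(1 + avg_doc_len / doc_len) as the length-based second normalization. The function name dfr_weight and its parameter names are hypothetical.

```python
import math


def dfr_weight(tf, doc_len, avg_doc_len, coll_freq, num_docs):
    """Sketch of one divergence-from-randomness term weight.

    tf          -- occurrences of the term in the document
    doc_len     -- length of the document in tokens
    avg_doc_len -- average document length in the collection
    coll_freq   -- occurrences of the term in the whole collection
    num_docs    -- number of documents in the collection
    All formulas below are assumed forms, not taken from the abstract.
    """
    if tf <= 0:
        return 0.0

    # Second normalization (assumed form): rescale tf as if the
    # document had the average length.
    tfn = tf * math.log2(1.0 + avg_doc_len / doc_len)

    # Expected occurrences per document under the random process.
    lam = coll_freq / num_docs

    # Informative content -log2 Prob1(tfn), using a geometric
    # approximation of Bose-Einstein statistics:
    # Prob1 = (1 / (1 + lam)) * (lam / (1 + lam)) ** tfn
    inf1 = math.log2(1.0 + lam) + tfn * math.log2((1.0 + lam) / lam)

    # First normalization (assumed Laplace form): information gain
    # once the term is accepted as a descriptor of the document.
    gain = 1.0 / (tfn + 1.0)

    return gain * inf1


if __name__ == "__main__":
    # Same tf and document length; only the collection frequency differs.
    print(dfr_weight(3, 100, 100, 10, 1000))
    print(dfr_weight(3, 100, 100, 500, 1000))
```

Under these assumptions a term that is rare in the collection receives a larger weight than a common one at the same within-document frequency, which is the behaviour a tf-idf alternative would be expected to show. A document score would then be the sum of such weights over the query terms.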

Journal: ACM Transactions on Information Systems	Publication Date: Oct 1, 2002
Citations: 828

Similar Papers

A passage retrieval method based on probabilistic information retrieval model and UMLS concepts in biomedical question answering
Mourad Sarrouti ... Said Ouatik El Alaoui
Journal of Biomedical Informatics | VOL. 68
07 Mar 2017

A probabilistic justification for using tf×idf term weighting in information retrieval
Djoerd Hiemstra
International Journal on Digital Libraries | VOL. 3
01 Aug 2000

A topic‐based term frequency normalization framework to enhance probabilistic information retrieval
Fanghong Jian ... Jimmy X Huang
Computational Intelligence | VOL. 36
20 Nov 2019

The role of variance in term weighting for probabilistic information retrieval
Warren R Greiff ... Jay M Ponte
04 Nov 2002
