The Importance of Length Normalization for XML Retrieval

Jaap Kamps, Maarten De Rijke, Börkur Sigurbjörnsson

doi:10.1007/s10791-005-0750-7

Abstract

XML retrieval departs from standard document retrieval in that each individual XML element, ranging from italicized words or phrases to full-blown articles, is a retrievable unit. The distribution of XML element lengths is unlike what we usually observe in standard document collections, prompting us to revisit the issue of document length normalization. We perform a comparative analysis of arbitrary elements versus relevant elements, and show the importance of element length as a parameter for XML retrieval. Within the language modeling framework, we investigate a range of techniques that deal with length either directly or indirectly. We observe a length bias introduced by the amount of smoothing, and show the importance of an extreme length bias for XML retrieval. We also show that simply removing shorter elements from the index (by introducing a cut-off value) does not create an appropriate element length normalization. Even after restricting the minimal size of XML elements occurring in the index, the importance of an extreme explicit length bias remains.
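The three length-related knobs named in the abstract (amount of smoothing, an explicit length bias, and an index cut-off) can be illustrated with a toy ranking function. This is only a minimal sketch under stated assumptions, not the authors' implementation: the abstract does not say which smoothing method is used, so linear interpolation (Jelinek-Mercer) is assumed here, the length prior is assumed to take the form |e|^beta, and the names `score_element`, `rank_elements`, `lam`, `beta`, and `min_length` are hypothetical.

```python
import math
from collections import Counter


def score_element(query_terms, element_terms, coll_counts, coll_size,
                  lam=0.5, beta=1.0):
    """Log query likelihood with linear smoothing plus a length prior.

    lam  : weight on the element model (assumed, hypothetical default).
    beta : exponent of the assumed length prior |e|**beta; 0 disables it.
    """
    length = len(element_terms)
    if length == 0:
        return float("-inf")
    tf = Counter(element_terms)
    score = beta * math.log(length)  # explicit length bias
    for term in query_terms:
        p_elem = tf[term] / length
        p_coll = coll_counts.get(term, 0) / coll_size
        p = lam * p_elem + (1.0 - lam) * p_coll
        if p == 0.0:
            return float("-inf")  # term occurs nowhere in the collection
        score += math.log(p)
    return score


def rank_elements(query_terms, elements, lam=0.5, beta=1.0, min_length=0):
    """Rank elements (id -> token list), dropping those below min_length."""
    kept = {eid: toks for eid, toks in elements.items()
            if len(toks) >= min_length}  # index cut-off
    # Toy collection statistics taken from the kept elements themselves.
    coll_counts = Counter(t for toks in kept.values() for t in toks)
    coll_size = sum(coll_counts.values())
    scored = [(score_element(query_terms, toks, coll_counts, coll_size,
                             lam, beta), eid)
              for eid, toks in kept.items()]
    return [eid for _, eid in sorted(scored, key=lambda x: x[0], reverse=True)]


# Made-up data: a one-word emphasis element and a ten-word section.
elements = {
    "em": ["xml"],
    "sec": ["xml", "retrieval", "xml", "length"] + ["other"] * 6,
}
print(rank_elements(["xml"], elements, beta=0.0))  # ['em', 'sec']
print(rank_elements(["xml"], elements, beta=1.0))  # ['sec', 'em']
```

On this made-up data, the one-word element wins when no length prior is applied, and the longer element wins once the prior is switched on. Setting `min_length=2` removes the short element from the candidate set altogether, which is a different operation from normalizing the scores of the elements that remain.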

Journal: Information Retrieval
Publication Date: Dec 1, 2005
Citations: 51

Similar Papers

Length normalization in XML retrieval
Jaap Kamps ... Maarten De Rijke
25 Jul 2004

Examining topic shifts in content-oriented XML retrieval
Elham Ashoori ... Theodora Tsikrika
International Journal on Digital Libraries | VOL. 8
27 Jul 2007

Using Topic Shifts in XML Retrieval at INEX 2006
Elham Ashoori ... Mounia Lalmas
17 Dec 2006

Using Topic Shifts for Focussed Access to XML Repositories
Elham Ashoori ... Mounia Lalmas
02 Apr 2007
