Information Retrieval System for XML Documents

Kenji Hatano, Masatoshi Yoshikawa, Shunsuke Uemura, Hiroko Kinutani

doi:10.1007/3-540-46146-9_75

Abstract

In document information retrieval, the unit of retrieval results returned by IR systems is either a whole document or a document fragment, such as a paragraph in passage retrieval. IR systems based on the vector space model compute a feature vector for each unit and calculate the similarity between each unit and the query. However, these units are not well suited to document information retrieval because they do not match the information that users are searching for. The unit of retrieval results should instead be a portion of the XML document, such as a chapter, section, or subsection. In other words, we think the most important concern in document information retrieval is to define a unit of retrieval results that is meaningful for users. Constructing appropriate portions of XML documents as retrieval results is easy because XML is a standard document format on the Internet and because XML documents consist of both content and document structure. In this paper, we propose an effective IR system for XML documents that automatically defines an appropriate unit of retrieval results by analyzing the XML document structure. We performed experimental evaluations and verified the effectiveness of our XML IR system. We also defined new recall and precision measures for XML information retrieval in order to evaluate the system.
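The idea of scoring structural portions of an XML document against a query can be illustrated with a minimal Python sketch. This is not the authors' system: the abstract does not specify how units are selected or weighted, so the sketch assumes plain term-frequency vectors, cosine similarity, and a hypothetical fixed set of unit tags (`UNIT_TAGS`). The helper names, the sample document, and its tag names are likewise assumptions made for illustration.

```python
import math
import re
import xml.etree.ElementTree as ET
from collections import Counter

# Assumption: elements with these tags are the candidate retrieval units.
# The paper derives units automatically from the document structure;
# this fixed set is only a stand-in for that analysis.
UNIT_TAGS = {"chapter", "section", "subsection"}

# Hypothetical sample document used only for illustration.
SAMPLE = """
<book>
  <chapter><title>Indexing</title>
    <section><title>Inverted files</title>
      <p>An inverted file maps terms to documents.</p></section>
    <section><title>Signature files</title>
      <p>Signature files hash terms into bit strings.</p></section>
  </chapter>
  <chapter><title>XML retrieval</title>
    <section><p>XML retrieval returns elements such as sections.</p></section>
  </chapter>
</book>
"""


def term_vector(text):
    """Raw term-frequency vector over lowercase alphanumeric tokens."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse term-frequency vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def candidate_units(elem, path):
    """Yield (path, text) for each element whose tag is in UNIT_TAGS."""
    if elem.tag in UNIT_TAGS:
        yield path, " ".join(elem.itertext())
    seen = Counter()  # position of each child among same-tag siblings
    for child in elem:
        seen[child.tag] += 1
        child_path = f"{path}/{child.tag}[{seen[child.tag]}]"
        yield from candidate_units(child, child_path)


def rank_units(xml_text, query):
    """Return (score, path) pairs for all candidate units, best first."""
    root = ET.fromstring(xml_text)
    q = term_vector(query)
    scored = [
        (cosine(term_vector(text), q), path)
        for path, text in candidate_units(root, f"/{root.tag}")
    ]
    return sorted(scored, reverse=True)


if __name__ == "__main__":
    for score, path in rank_units(SAMPLE, "inverted file"):
        print(f"{score:.3f}  {path}")
```

For the query "inverted file", the first section of the first chapter ranks above its enclosing chapter, because the chapter's vector is diluted by the unrelated second section. This is the kind of effect that makes the choice of retrieval unit matter.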
