Web Document Indexing and Retrieval

Byurhan Hyusein, Ahmed Patel

doi:10.1007/3-540-36456-0_62

Abstract

Web document indexing is an important part of every search engine (SE), and indexing quality has an overwhelming effect on retrieval effectiveness. A document index is a set of terms that shows the contents (topic) of a document and helps distinguish it from the other documents in the collection. Too small an index can lead to poor results and may miss relevant items. Too large an index allows retrieval of many useful documents, but also of a significant number of irrelevant ones, and it reduces both search speed and search effectiveness. Although the problem has been studied for many years, there is still no algorithm for finding the optimal index size and set of index terms. This paper shows how different attributes of a web document (namely Title, Anchor and Emphasize) contribute to average precision during search. The experiments were carried out on WT10g, a 1.69-million-page corpus.
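To make the abstract's terms concrete, here is a minimal sketch of field-weighted indexing and of non-interpolated average precision. It assumes each document has already been split into named fields (`title`, `anchor`, `emphasize`, `body`) and uses hypothetical field weights; the abstract does not describe the paper's actual weighting or ranking scheme, and `FIELD_WEIGHTS`, `build_index`, `rank` and `average_precision` are illustrative names only.

```python
from collections import defaultdict

# Hypothetical weights per document field; the paper's actual values are not
# given in the abstract. Unknown fields fall back to a weight of 1.0.
FIELD_WEIGHTS = {"title": 3.0, "anchor": 2.0, "emphasize": 1.5, "body": 1.0}


def build_index(docs):
    """Map each term to {doc_id: sum of field weights over its occurrences}."""
    index = defaultdict(lambda: defaultdict(float))
    for doc_id, fields in docs.items():
        for field, text in fields.items():
            weight = FIELD_WEIGHTS.get(field, 1.0)
            # Naive whitespace tokenization; no stemming or stop-word removal.
            for term in text.lower().split():
                index[term][doc_id] += weight
    return index


def rank(index, query):
    """Return doc_ids that match any query term, best score first."""
    scores = defaultdict(float)
    for term in query.lower().split():
        for doc_id, w in index.get(term, {}).items():
            scores[doc_id] += w
    # Sort by descending score; break ties by doc_id so the order is stable.
    return sorted(scores, key=lambda d: (-scores[d], d))


def average_precision(ranking, relevant):
    """Non-interpolated AP: mean precision@k at each rank k holding a relevant doc.

    Dividing by len(relevant) means relevant documents that never appear in
    the ranking contribute zero, lowering the score.
    """
    if not relevant:
        return 0.0
    hits, total = 0, 0.0
    for k, doc_id in enumerate(ranking, start=1):
        if doc_id in relevant:
            hits += 1
            total += hits / k
    return total / len(relevant)


if __name__ == "__main__":
    docs = {
        "d1": {"title": "web search engines", "body": "indexing of web pages"},
        "d2": {"anchor": "indexing", "body": "search engines rank pages"},
        "d3": {"body": "cooking recipes"},
    }
    ranking = rank(build_index(docs), "web indexing")
    print(ranking)                              # ['d1', 'd2']
    print(average_precision(ranking, {"d2"}))   # 0.5
```

In an experiment of the kind the abstract describes, the field weights are the variable: raising the weight of Title, Anchor or Emphasize text promotes documents whose query terms occur in that field, and average precision over a set of relevance judgments shows whether the change helps. Because the denominator is the number of relevant documents rather than the number retrieved, relevant items that an overly small index misses are penalized directly.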

Similar Papers

Towards a theory of document learning
Lorraine M Purgailis Parker
Journal of the American Society for Information Science | VOL. 34
01 Jan 1982

Toward a task-based gold standard for evaluation of NP chunks and technical terms
Nina Wacholder ... Peng Song
01 Jan 2003

Enriching text with images and colored light
Dragan Sekulovski ... Steffen Pauws
27 Jan 2008

A probability distribution model for information retrieval
S.K.M Wong ... Y.Y Yao
Information Processing & Management | VOL. 25
01 Jan 1989
