Abstract

Information on the web is growing exponentially. The unprecedented growth of available information, coupled with the vast number of available online activities, has introduced a new wrinkle to the problem of web search: it is difficult to retrieve relevant information. In this context, search engines have become a valuable tool, yet finding information relevant to a user's need is still a challenge. Various retrieval models have been proposed and empirically validated for finding web pages relevant to a user's query. The vector space model is one of the most extensively used models for web information retrieval, but it ignores the position of a term within the document when calculating the term's weight. In this paper, a new approach based on the vector space model, referred to as the Layered Vector Space model, is proposed and validated. The Layered Vector Space approach takes the importance of terms with respect to their position into account. The web document is conceptually segmented into N layers according to its organization, and weights are assigned to terms based on the layer in which they appear and on their occurrence within the document. The proposed Layered Vector Space approach is compared with other token-based similarity measures: the vector space model, Jaccard similarity, Dice similarity, Pearson's coefficient, and PMI-IR.

General Terms: Information Retrieval; Layered Vector Space model.
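
The layered weighting idea can be illustrated with a minimal sketch. The abstract does not state the actual weighting scheme, so everything specific here is an assumption made for illustration: the layer names (`title`, `heading`, `body`), the numeric values in `LAYER_WEIGHTS`, the whitespace tokenizer, and the helper names `layered_vector`, `cosine`, and `score`. The sketch simply scales each term occurrence by a per-layer weight before computing cosine similarity with the query.

```python
import math
from collections import Counter

# Hypothetical layer weights; these values are NOT taken from the paper.
LAYER_WEIGHTS = {"title": 3.0, "heading": 2.0, "body": 1.0}


def layered_vector(layers, weights=LAYER_WEIGHTS):
    """Build a term-weight vector from a {layer_name: text} mapping.

    Each occurrence of a term contributes the weight of the layer it
    appears in, so position influences the final term weight.
    """
    vec = Counter()
    for name, text in layers.items():
        w = weights.get(name, 1.0)  # unknown layers fall back to weight 1
        for term in text.lower().split():
            vec[term] += w
    return vec


def cosine(a, b):
    """Cosine similarity between two sparse term-weight vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def score(query, layers, weights=LAYER_WEIGHTS):
    """Score a layered document against a plain-text query."""
    q = Counter(query.lower().split())
    return cosine(q, layered_vector(layers, weights))


# Two documents with the same terms but in different layers.
doc_a = {"title": "vector model", "body": "web search"}
doc_b = {"title": "web search", "body": "vector model"}
FLAT = {"title": 1.0, "heading": 1.0, "body": 1.0}
```

Under these assumed weights, `score("vector", doc_a)` exceeds `score("vector", doc_b)` because the query term sits in the higher-weighted layer of `doc_a`. Passing `FLAT` as the weights reduces the computation to a plain term-frequency vector space model, under which the two documents score the same.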
