Abstract

The traditional PageRank algorithm computes a weight for each hyperlinked document from its in-links and out-links; this weight indicates the importance of the page. The computation is offline and query-independent, which suits a keyword-based search strategy. However, owing to problems such as polysemy and synonymy in keyword-based search, new search methodologies such as concept-based search and semantic-web-based search have been developed. Concept-based search engines generally rely on content-based ranking, imparting semantics to web pages. While this approach is better than keyword-based ranking strategies, it does not consider the physical link structure between documents, which is the basis of the successful PageRank algorithm. Hence, we attempt to combine the power of link structure with content information to suit concept-based search engines. Our main contribution comprises two modifications to the traditional PageRank algorithm, both designed specifically for concept-based search engines. Inspired by the topic-sensitive PageRank algorithm, we maintain multiple PageRanks for each document, rather than the single one used in the traditional implementation. We compared our methodologies with the ranking methodology of an existing concept-based search engine and found that our modifications considerably improve the ranking of conceptual search results. Furthermore, a statistical significance test showed that the P@5 improvement of our Version-2 modification over the baseline is statistically significant.
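
To illustrate the idea of keeping several PageRanks per document, the following is a minimal sketch of standard power-iteration PageRank with an optional biased teleport vector, in the spirit of topic-sensitive PageRank. It is not the Version-1 or Version-2 modification, which the abstract does not specify. The toy link graph, the `concept_pages` mapping, the function name `pagerank`, and the damping value of 0.85 are all assumptions made for illustration.

```python
def pagerank(links, teleport=None, damping=0.85, iters=100, tol=1e-10):
    """Power-iteration PageRank over a dict {page: [out-linked pages]}.

    `teleport` maps pages to non-negative weights; if omitted, the random
    jump is uniform (traditional PageRank). A biased `teleport` gives a
    topic-sensitive variant.
    """
    pages = sorted(set(links) | {q for outs in links.values() for q in outs})
    if teleport is None:
        teleport = {p: 1.0 for p in pages}
    total = sum(teleport.get(p, 0.0) for p in pages)
    if total <= 0:
        raise ValueError("teleport vector must put weight on some page")
    teleport = {p: teleport.get(p, 0.0) / total for p in pages}

    rank = dict(teleport)  # start from the teleport distribution
    for _ in range(iters):
        # Rank held by pages without out-links is redistributed via teleport.
        dangling = sum(rank[p] for p in pages if not links.get(p))
        new = {p: (1 - damping + damping * dangling) * teleport[p] for p in pages}
        for p in pages:
            outs = links.get(p, [])
            for q in outs:
                new[q] += damping * rank[p] / len(outs)
        delta = sum(abs(new[p] - rank[p]) for p in pages)
        rank = new
        if delta < tol:
            break
    return rank


# Hypothetical toy web graph and concept assignment (illustration only).
links = {"a": ["b", "c"], "b": ["c"], "c": ["a"], "d": ["c"]}
concept_pages = {"sports": {"a", "b"}, "music": {"c", "d"}}

# One query-independent score per page (traditional PageRank).
global_rank = pagerank(links)

# One score per (concept, page) pair: multiple PageRanks for each document.
ranks = {
    concept: pagerank(links, teleport={p: 1.0 for p in members})
    for concept, members in concept_pages.items()
}
```

Under these assumptions, each document ends up with one score per concept, all computed offline; a concept-based engine could then pick or blend the vectors matching the concepts identified in a query.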
