Parallel Online Ranking of Web Pages

Y.G. Saffar, K.S. Esmaili, M. Ghodsi, H. Abolhassani

doi:10.1109/aiccsa.2006.205075

Abstract

Modern search engines use the link structure of the World Wide Web to improve the ranking of results for users' queries. One of the most popular ranking algorithms based on link analysis is HITS. It generates very accurate outputs, but because of the large amount of online computation it requires, the algorithm is relatively slow. In this paper we introduce PHITS, a parallelized version of the HITS algorithm that is suitable for working with huge web graphs in a reasonable time. To implement this algorithm, we use the WebGraph framework and focus on parallelizing access to the web graph, which is the main bottleneck in the HITS algorithm.

I. INTRODUCTION

Search technology is one of the most important reasons for the success of the web. The huge amount of information available on the web, its high growth rate, and its unstructured nature all increase the need for search engines with high performance and accurate results. One of the major components of every search engine is its ranking algorithm.

Traditional Information Retrieval (IR) systems usually use models such as the vector space model (VSM) (4) and compute the rank of results using content similarity measures between the user's query and the retrieved documents. In the context of the web, however, these approaches have some problems. For example, spamming may lead to ineffective ranking. Several methods have been proposed to counter these problems, most of which use implicit information embedded in the web graph. These methods are known as link-analysis based algorithms.

PageRank (5) and HITS (Hyperlink Induced Topic Search) (1) are the best-known algorithms in this category. PageRank, which is used by Google for ranking its results, is an offline and query-independent ranking algorithm. This means that the ranking is independent of the specific queries of users and can therefore be computed once and reused for all upcoming queries. HITS, on the other hand, is an online and query-dependent algorithm.
Being query-dependent makes HITS more precise, but it has disadvantages too. The online computation this algorithm requires is so heavy that the response time of the search engine after a user submits a query is not acceptable. To overcome this problem, in this paper we exploit parallel processing methods to improve the execution performance of the algorithm.

The rest of this paper is organized as follows. Section II discusses link-analysis based algorithms in general and HITS as a special case; at the end of that section, some of the variations and improvements of the HITS algorithm suggested in the literature are also described. The implementation of the HITS algorithm and of its parallel version, PHITS, is discussed in Sections III and IV respectively. Finally, the last section contains the conclusion and some ideas for future work on this topic.
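For orientation, the online computation that makes HITS expensive is an iterative update of hub and authority scores over a query-specific subgraph. The following is a minimal sequential sketch of that standard iteration, not the PHITS implementation described in this paper. It assumes the query-dependent subgraph has already been built and is passed in as a plain dictionary; the function names `hits` and `normalize`, the `iterations` parameter, and the toy graph are all hypothetical, and the WebGraph framework is not used here.

```python
from math import sqrt


def normalize(scores):
    """Scale a score dictionary to unit Euclidean length (no-op if all zero)."""
    norm = sqrt(sum(v * v for v in scores.values()))
    if norm == 0:
        return dict(scores)
    return {page: v / norm for page, v in scores.items()}


def hits(graph, iterations=50):
    """Run the hub/authority iteration on a small link graph.

    graph: dict mapping each page to the list of pages it links to
           (assumed to be the query-specific subgraph, built elsewhere).
    Returns (hubs, authorities) as two dicts of normalized scores.
    """
    # Collect every page that appears as a source or as a link target.
    nodes = set(graph)
    for targets in graph.values():
        nodes.update(targets)

    hubs = {n: 1.0 for n in nodes}
    auths = {n: 1.0 for n in nodes}

    for _ in range(iterations):
        # Authority update: sum the hub scores of all pages linking in.
        new_auths = {n: 0.0 for n in nodes}
        for src, targets in graph.items():
            for dst in targets:
                new_auths[dst] += hubs[src]

        # Hub update: sum the (new) authority scores of all pages linked to.
        new_hubs = {
            n: sum(new_auths[dst] for dst in graph.get(n, ()))
            for n in nodes
        }

        auths = normalize(new_auths)
        hubs = normalize(new_hubs)

    return hubs, auths


if __name__ == "__main__":
    # Toy graph: pages "a" and "b" both link to "c".
    toy = {"a": ["c"], "b": ["c"], "c": []}
    hub_scores, auth_scores = hits(toy)
    print("hubs:", hub_scores)
    print("authorities:", auth_scores)
```

Each iteration touches every edge of the subgraph twice, once per update, which is why access to the graph dominates the running time on large inputs.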

