Abstract

PageRank is an algorithm used by search engines to rank the webpages returned for a query. It returns the most relevant results to the user, estimating the relevance of each webpage from its links. Although useful, the algorithm consumes a considerable amount of time because it must process every available webpage, and the number of webpages keeps growing. Moreover, its results are biased towards old webpages, which have accumulated a large volume of links over their lifetime; newly created webpages therefore receive lower page ranks even when they contain more relevant and useful information. To overcome these issues, this paper proposes an alternative hybrid PageRank algorithm based on an optimized normalization technique and a content-based approach. The proposed algorithm calculates the mean of all page rank values and normalizes the values by that mean, which reduces the number of iterations required to calculate the page rank and hence improves efficiency. Through this approach, the algorithm is also able to determine the relevance of webpages based on the validity of links rather than on popularity. These claims are tested in an experiment on a dummy web structure consisting of 12 webpages. The results showed that the traditional PageRank algorithm required 74% more iterations than the proposed algorithm. The proposed algorithm returned a mean value of 1.00, compared to 1.32 for the traditional algorithm. These results indicate that the proposed algorithm saves a substantial amount of computing power while being more precise and less biased.

Highlights

  • The PageRank algorithm is employed in search engines to retrieve the best results for a query by using the structure of the internet graph to rank the importance of a webpage

  • The algorithm has some drawbacks; for instance, it favours old webpages because they have many outbound links compared to new webpages

  • The algorithm performs a large number of iterations to calculate the page rank of a webpage, which makes it time-inefficient (3)


Summary

INTRODUCTION

The PageRank algorithm is employed in search engines to retrieve the best results for a query by using the structure of the internet graph to rank the importance of a webpage. It generates a list of webpages ordered from highest to lowest priority across diverse networks (1). The proposed hybrid algorithm combines an optimized normalization technique with a content-based approach: the former addresses the inefficiency issue, whereas the latter addresses the validity concern. The following section reviews related work on the PageRank algorithm. It is followed by the methodology used in the study and the results obtained.
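To make the mean-based normalization described in the abstract concrete, the following Python sketch runs an iterative PageRank computation and, optionally, divides every page rank by the mean of all page ranks after each iteration. It is a minimal illustration under stated assumptions, not the authors' implementation: the function name `pagerank`, the damping factor `d = 0.85`, the convergence tolerance, and the four-page `toy_links` graph are all hypothetical choices, and the content-based part of the hybrid algorithm is not modelled.

```python
def pagerank(links, d=0.85, tol=1e-6, max_iter=1000, normalize=False):
    """Iterative PageRank over a dict mapping each page to its outbound links.

    Assumes every link target is also a key of `links`.
    When normalize=True, ranks are divided by their mean after each
    iteration (an assumed reading of the mean-based normalization).
    Returns (ranks, iterations_used).
    """
    pages = sorted(links)

    # Build the inbound-link lists needed by the PageRank update.
    inbound = {p: [] for p in pages}
    for src, targets in links.items():
        for t in targets:
            inbound[t].append(src)

    rank = {p: 1.0 for p in pages}
    for iteration in range(1, max_iter + 1):
        # Classic update: PR(p) = (1 - d) + d * sum(PR(q) / outdegree(q)).
        new = {
            p: (1 - d) + d * sum(rank[q] / len(links[q]) for q in inbound[p])
            for p in pages
        }
        if normalize:
            mean = sum(new.values()) / len(new)
            new = {p: v / mean for p, v in new.items()}

        # Stop once no page rank changed by more than the tolerance.
        delta = max(abs(new[p] - rank[p]) for p in pages)
        rank = new
        if delta < tol:
            return rank, iteration
    return rank, max_iter


# Hypothetical four-page web structure, used only for illustration.
toy_links = {"A": ["B", "C"], "B": ["C"], "C": ["A"], "D": ["C"]}

plain_ranks, plain_iters = pagerank(toy_links)
norm_ranks, norm_iters = pagerank(toy_links, normalize=True)
```

Comparing `plain_iters` with `norm_iters` on a given graph shows how many iterations each variant needs under these assumptions. The toy graph is not the 12-page structure used in the paper, so it should not be expected to reproduce the reported figures.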

LITERATURE REVIEW
METHODOLOGY
RESULTS AND FINDINGS
CONCLUSION