Abstract
PageRank is an algorithm used to rank the results of search queries over the Internet. It returns the most relevant search results to the user by scoring each webpage according to the links between webpages. Although useful, the algorithm consumes a considerable amount of time, as it must compute ranks for all available webpages, whose number keeps increasing over time. Moreover, its results are biased towards old webpages, which have accumulated more links over their lifetime; as a result, newly created webpages receive lower page ranks even when they contain comparatively more relevant and useful information. To overcome these issues, this paper proposes an alternative hybrid PageRank algorithm based on an optimized normalization technique and a content-based approach. The proposed algorithm reduces the number of iterations required to calculate the page rank, and hence improves efficiency, by calculating the mean of all page rank values and normalizing the values by that mean. Through this approach, the algorithm is also able to determine the relevancy of webpages based on the validity of links rather than on popularity. These claims are demonstrated by an experiment conducted on the proposed algorithm using a dummy web structure consisting of 12 webpages. The results showed that the traditional PageRank algorithm required 74% more iterations than the proposed algorithm. The proposed algorithm returned a mean value of 1.00, compared with 1.32 for the traditional algorithm. These results confirm that the proposed algorithm saves a substantial amount of computing power while being more precise and unbiased.
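The iterative computation described above can be sketched in Python. This is a minimal illustration under stated assumptions, not the authors' implementation: it uses the classic non-normalized update PR(p) = (1 - d) + d * sum(PR(q) / C(q)) over the pages q linking to p, where C(q) is the number of outgoing links of q, with damping factor d = 0.85, and it stops once no value changes by more than a tolerance, counting the iterations. The `normalize_by_mean` flag is a hypothetical reading of the proposed step (dividing every value by the mean of all values after each iteration); the paper's exact normalization and its content-based component are not reproduced. The four-page graph is an illustrative example, not the paper's 12-page test structure.

```python
# Minimal sketch of iterative PageRank on a small directed graph.
# Assumptions (not taken from the paper): classic non-normalized update
#   PR(p) = (1 - d) + d * sum(PR(q) / outdeg(q) for q linking to p),
# damping d = 0.85, every page starts at 1.0, and iteration stops when no
# value changes by more than `tol`. The mean-normalization option is a
# hypothetical reading of the abstract, not the authors' exact method.

def pagerank(links, d=0.85, tol=1e-6, max_iter=1000, normalize_by_mean=False):
    """links maps every page to the list of pages it links to (every linked
    page must also appear as a key). Returns (ranks, iterations_used)."""
    pages = list(links)
    ranks = {p: 1.0 for p in pages}

    # Precompute the inbound neighbours of every page.
    inbound = {p: [] for p in pages}
    for src, targets in links.items():
        for dst in targets:
            inbound[dst].append(src)

    for iteration in range(1, max_iter + 1):
        # Each page receives a share of the rank of every page linking to it.
        new = {
            p: (1 - d) + d * sum(ranks[q] / len(links[q]) for q in inbound[p])
            for p in pages
        }
        if normalize_by_mean:
            # Hypothetical step: rescale so that the mean rank is exactly 1.
            mean = sum(new.values()) / len(new)
            new = {p: v / mean for p, v in new.items()}
        delta = max(abs(new[p] - ranks[p]) for p in pages)
        ranks = new
        if delta < tol:
            return ranks, iteration
    return ranks, max_iter


# Illustrative four-page web: A links to B and C, B to C, C to A, D to C.
web = {"A": ["B", "C"], "B": ["C"], "C": ["A"], "D": ["C"]}
ranks, n_iter = pagerank(web)
norm_ranks, n_iter_norm = pagerank(web, normalize_by_mean=True)
print(n_iter, n_iter_norm)
print(sorted(ranks.items(), key=lambda kv: -kv[1]))
```

Comparing `n_iter` against `n_iter_norm` on a given graph is the kind of iteration-count comparison the abstract reports; the numbers from this toy graph should not be read as reproducing the paper's results.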
Highlights
The PageRank algorithm is employed in search engines to retrieve the best results for a query, using the structure of the Internet graph to rank the importance of a webpage
The algorithm has some drawbacks; for instance, it favours old webpages, as they have accumulated more links than new webpages
The algorithm performs a large number of iterations to calculate the page rank of a webpage, causing time inefficiency (3)
Summary
The PageRank algorithm is employed in search engines to retrieve the best results for a query, using the structure of the Internet graph to rank the importance of a webpage. It generates a list ordered from the highest-priority to the lowest-priority webpages across diverse networks (1). The proposed hybrid algorithm overcomes the inefficiency issue through the optimized normalization technique and the validity concern through the content-based approach. The following section reviews related work on the PageRank algorithm. It is followed by the methodology used in the study and the results obtained.
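Turning page rank scores into the ordered list mentioned above amounts to a descending sort. The sketch below assumes the scores are already available as a dictionary; the page names and values are hypothetical, and breaking ties alphabetically is an illustrative choice rather than something the paper specifies.

```python
# Minimal sketch: order pages from highest to lowest priority by score.
# Assumption: ties are broken alphabetically by page name so the order is
# deterministic; the paper does not specify a tie-breaking rule.

def rank_order(scores):
    """Return page names sorted by descending score, then by name."""
    return sorted(scores, key=lambda page: (-scores[page], page))


# Hypothetical scores for illustration only.
example = {"home": 1.49, "about": 0.78, "blog": 1.58, "contact": 0.15}
print(rank_order(example))  # ['blog', 'home', 'about', 'contact']
```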