Abstract

The PageRank model pioneered by Google is the most common approach for generating web search results. We present a two-stage algorithm for computing the PageRank vector that exploits the lumpability of the underlying Markov chain. We make three contributions. First, the algorithm speeds up the PageRank calculation significantly: for web graphs with millions of webpages, the speedup is typically two- to three-fold. The algorithm can also embed other acceleration methods, such as quadratic extrapolation, the Gauss-Seidel method, or the biconjugate gradient stabilized (BiCGStab) method, for an even greater speedup; the cumulative speedup is as high as 7 to 14 times. The second contribution concerns the handling of dangling nodes. Conventionally, dangling nodes are included only towards the end of the computation. While this approach works reasonably well, it can fail in extreme cases involving aggressive personalization. Using probabilistic arguments, we prove that our algorithm handles dangling nodes in a generally correct way. The third contribution is a set of variants of our algorithm, including a multistage extension for calculating a generalized version of the PageRank model in which different personalization vectors are used for webpages of different classes. The ability to form class associations may be useful for building more refined models of web traffic.
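The abstract does not spell out the algorithm's details, but the lumping idea it refers to can be sketched. The sketch below is a minimal illustration, not the paper's exact procedure: it assumes a dense NumPy representation (a real web graph would use sparse matrices), a uniform personalization vector by default, and a single redistribution vector w shared by all dangling nodes; the function name and parameters are illustrative. Stage 1 aggregates all dangling nodes into one state and runs a power iteration on the smaller lumped chain; stage 2 recovers the individual dangling-node scores in one closed-form step.

import numpy as np

def two_stage_pagerank(P, dangling, alpha=0.85, v=None, w=None,
                       tol=1e-10, max_iter=1000):
    """Two-stage PageRank sketch exploiting lumpability of dangling nodes.

    P        : (n, n) row-stochastic link matrix; rows of dangling nodes are zero.
    dangling : boolean NumPy mask marking the dangling nodes.
    v        : personalization (teleportation) vector; uniform if None.
    w        : redistribution vector for dangling mass; defaults to v.
    """
    n = P.shape[0]
    v = np.full(n, 1.0 / n) if v is None else np.asarray(v, dtype=float)
    w = v if w is None else np.asarray(w, dtype=float)

    nd = ~dangling                       # non-dangling nodes
    k = int(nd.sum())

    # Every dangling node shares the same transition row, which is what
    # makes the chain lumpable into {each non-dangling node} U {all dangling nodes}.
    u = alpha * w + (1.0 - alpha) * v

    # Google-matrix blocks restricted to the non-dangling rows.
    G11 = alpha * P[np.ix_(nd, nd)] + (1.0 - alpha) * v[nd]
    G12 = alpha * P[np.ix_(nd, dangling)] + (1.0 - alpha) * v[dangling]

    # Lumped chain: k non-dangling states plus one aggregate dangling state.
    L = np.zeros((k + 1, k + 1))
    L[:k, :k] = G11
    L[:k, k] = G12.sum(axis=1)
    L[k, :k] = u[nd]
    L[k, k] = u[dangling].sum()

    # Stage 1: power iteration on the (k+1)-state lumped chain.
    sigma = np.full(k + 1, 1.0 / (k + 1))
    for _ in range(max_iter):
        sigma_new = sigma @ L
        if np.abs(sigma_new - sigma).sum() < tol:
            sigma = sigma_new
            break
        sigma = sigma_new

    # Stage 2: recover the individual dangling-node scores in one step.
    pi = np.empty(n)
    pi[nd] = sigma[:k]
    pi[dangling] = sigma[:k] @ G12 + sigma[k] * u[dangling]
    return pi / pi.sum()

The sketch mirrors the claims in the abstract: the expensive iteration runs on a chain whose size is the number of non-dangling pages plus one, dangling nodes are treated exactly rather than only at the end of the computation, and the stage-1 power iteration could in principle be replaced by an accelerated solver such as Gauss-Seidel, quadratic extrapolation, or BiCGStab.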
