Personalized PageRank (PPR) computation is a fundamental problem in graph analysis. The state-of-the-art algorithms for PPR computation are based on a bidirectional framework which include a deterministic forward push and a Monte Carlo sampling procedure. The Monte Carlo sampling procedure, however, often has a relatively-large variance, thus reducing the performance of the PPR computation algorithms. To overcome this issue, we develop two novel variance-reduced Monte Carlo techniques for PPR computation. Our first technique is to apply power iterations to reduce the variance of the Monte Carlo sampling procedure. We prove that conducting few power iterations can significantly reduce the variance of existing Monte Carlo estimators, only with few additional costs. Moreover, we show that such a simple and novel variance-reduced Monte Carlo technique can achieve comparable estimation accuracy and the same time complexity as the state-of-the-art bidirectional algorithms. Our second technique is a novel progressive sampling method which uses the historical information of former samples to reduce the variance of the Monte Carlo estimator. We develop several novel PPR computation algorithms by integrating both of these variance reduction techniques with two existing Monte Carlo sampling approaches, including random walk sampling and spanning forests sampling. Finally, we conduct extensive experiments on 5 real-life large graphs to evaluate our solutions. The results show that our algorithms can achieve much higher PPR estimation accuracy by using much less time, compared to the state-of-the-art bidirectional algorithms.
Read full abstract