On efficient randomized algorithms for finding the PageRank vector

A. V. Gasnikov, D. Yu. Dmitriev

doi:10.1134/s0965542515030069

Abstract

Two randomized methods are considered for finding the PageRank vector, that is, the solution of the system $p^T = p^T P$ with a stochastic $n \times n$ matrix $P$, where $n \sim 10^7$–$10^9$. The solution is sought in the class of probability distributions with accuracy $\varepsilon \gg n^{-1}$. Hence, brute-force multiplication of $P$ by a column vector is ruled out when $P$ is dense. The first method is based on the idea of Markov chain Monte Carlo algorithms. This approach is efficient when the iterative process $p_{t+1}^T = p_t^T P$ quickly reaches a steady state. It also exploits another specific feature of $P$: in each row, the nonzero off-diagonal elements are equal (this property is used to organize a random walk over the graph with matrix $P$). Using modern concentration-of-measure inequalities, new bounds on the running time of this method are derived that take these specific features of $P$ into account. In the second method, the search for a ranking vector is reduced to finding the equilibrium in the antagonistic (zero-sum) matrix game $$\mathop {\min }\limits_{p \in S_n (1)} \mathop {\max }\limits_{u \in S_n (1)} \left\langle {u,\left( {P^T - I} \right)p} \right\rangle ,$$ where $S_n(1)$ is the unit simplex in $\mathbb{R}^n$ and $I$ is the identity matrix. The resulting problem is solved by a slightly modified Grigoriadis–Khachiyan algorithm (1995). This technique, like the Nazin–Polyak method (2009), is a randomized version of Nemirovski's mirror descent method. The difference is that the Grigoriadis–Khachiyan algorithm randomizes the projection of the gradient onto the simplex rather than the computation of the stochastic gradient. For sparse matrices $P$, the proposed method yields noticeably better results.
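
The random-walk idea behind the first method can be illustrated by a minimal sketch. It assumes that row $i$ of $P$ is stored as a self-loop probability `stay[i]` plus a list `out[i]` of distinct out-neighbours (none equal to $i$) that share the remaining mass $1 - $ `stay[i]` equally, which is the equal-off-diagonal structure mentioned above; because of it, one step of the walk needs only a single uniform neighbour draw. The function names, the toy 4-node graph, and the walker and step counts are hypothetical choices for illustration. The sketch estimates the stationary vector from the end points of independent walks and is not the authors' exact procedure or its running-time bounds; a dense power iteration is included only as a reference for small examples.

```python
import random


def build_matrix(stay, out):
    """Dense row-stochastic P (small examples only): P[i][i] = stay[i], and the
    remaining mass of row i is split equally among out[i] (distinct, != i)."""
    n = len(out)
    P = [[0.0] * n for _ in range(n)]
    for i in range(n):
        P[i][i] = stay[i]
        for j in out[i]:
            P[i][j] = (1.0 - stay[i]) / len(out[i])
    return P


def power_iteration(P, iters=1000):
    """Reference solution: iterate p_{t+1}^T = p_t^T P from the uniform vector."""
    n = len(P)
    p = [1.0 / n] * n
    for _ in range(iters):
        q = [0.0] * n
        for i in range(n):
            for j in range(n):
                q[j] += p[i] * P[i][j]
        p = q
    return p


def mcmc_pagerank(stay, out, walkers=10_000, steps=30, seed=0):
    """Monte Carlo estimate: run independent walks for `steps` steps and
    return the empirical distribution of their end points."""
    rng = random.Random(seed)
    n = len(out)
    counts = [0] * n
    for _ in range(walkers):
        v = rng.randrange(n)                  # uniform starting vertex
        for _ in range(steps):
            if rng.random() >= stay[v]:       # leave v with probability 1 - stay[v]
                v = rng.choice(out[v])        # equal off-diagonal entries: uniform neighbour
        counts[v] += 1
    return [c / walkers for c in counts]


if __name__ == "__main__":
    stay = [0.2] * 4
    out = [[1, 2], [2], [0, 3], [0]]
    print(power_iteration(build_matrix(stay, out)))
    print(mcmc_pagerank(stay, out))
```

For this toy chain the exact stationary vector is $(1/3, 1/6, 1/3, 1/6)$, and the Monte Carlo frequencies fluctuate around it with a standard deviation of order $1/\sqrt{\text{walkers}}$ per component. The number of steps must exceed the mixing time of the chain, which is why the approach pays off only when $p_{t+1}^T = p_t^T P$ settles quickly.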

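The general shape of a Grigoriadis–Khachiyan-style randomized scheme for the game with $A = P^T - I$ can be sketched as follows: both players keep exponential (Gibbs) weights over their pure strategies, each step samples one column $j$ for the $p$-player and one row $i$ for the $u$-player, and the running sums are updated with a single column and a single row of $A$. This is a simplified illustration, not the modified algorithm from the paper; the step size `eta = sqrt(2 ln n / iters)`, the dense inner loops, and the function names are hypothetical choices. Quality is measured by the residual $\max_i ((P^T - I)p)_i$, which is nonnegative for every probability vector $p$ (its components sum to zero because $P$ is row-stochastic) and vanishes exactly when $p^T P = p^T$.

```python
import math
import random


def softmax_sample(scores, rng):
    """Sample index k with probability proportional to exp(scores[k])."""
    m = max(scores)                                  # shift for numerical stability
    weights = [math.exp(s - m) for s in scores]
    return rng.choices(range(len(scores)), weights=weights)[0]


def gk_pagerank(P, iters=20_000, seed=0):
    """Randomized exponential-weights sketch for min_p max_u <u, (P^T - I) p>.

    Returns the empirical frequencies of the columns sampled for the p-player.
    """
    rng = random.Random(seed)
    n = len(P)
    eta = math.sqrt(2.0 * math.log(n) / iters)      # hypothetical step-size choice
    U = [0.0] * n   # U = A X: cumulative payoff of each row against sampled columns
    V = [0.0] * n   # V = A^T Y: cumulative payoff of each column against sampled rows
    X = [0] * n     # how often each column (state) was sampled for p
    for _ in range(iters):
        j = softmax_sample([-eta * v for v in V], rng)   # minimizer prefers low payoff
        i = softmax_sample([eta * u for u in U], rng)    # maximizer prefers high payoff
        X[j] += 1
        for k in range(n):
            # column j of A = P^T - I is row j of P minus e_j
            U[k] += P[j][k] - (1.0 if k == j else 0.0)
            # row i of A is column i of P minus e_i
            V[k] += P[k][i] - (1.0 if k == i else 0.0)
    return [x / iters for x in X]


def stationarity_residual(P, p):
    """max_i ((P^T - I) p)_i: >= 0 on the simplex, 0 iff p^T P = p^T."""
    n = len(P)
    return max(sum(P[j][i] * p[j] for j in range(n)) - p[i] for i in range(n))


if __name__ == "__main__":
    P = [[0.2, 0.4, 0.4, 0.0],
         [0.0, 0.2, 0.8, 0.0],
         [0.4, 0.0, 0.2, 0.4],
         [0.8, 0.0, 0.0, 0.2]]
    p = gk_pagerank(P)
    print(p, stationarity_residual(P, p))
```

With $P$ stored both by rows and by columns in a sparse format, the inner loop would touch only the nonzeros of row $j$ and column $i$ of $P$, which is where sparsity helps; in this dense sketch the softmax sampling still costs $O(n)$ per step.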