Abstract

In this paper we consider the stochastic analysis of information ranking algorithms of large interconnected data sets, e.g. Google's PageRank algorithm for ranking pages on the World Wide Web. The stochastic formulation of the problem results in an equation of the form $R \stackrel{D}{=} \sum_{i=1}^{N} C_i R_i + Q$, where $N$, $Q$, $\{R_i\}_{i \ge 1}$, and $\{C, C_i\}_{i \ge 1}$ are independent nonnegative random variables, the $\{C, C_i\}_{i \ge 1}$ are identically distributed, and the $\{R_i\}_{i \ge 1}$ are independent copies of $R$; $\stackrel{D}{=}$ stands for equality in distribution. We study the asymptotic properties of the distribution of $R$ that, in the context of PageRank, represents the frequencies of highly ranked pages. The preceding equation is interesting in its own right since it belongs to a more general class of weighted branching processes that have been found to be useful in the analysis of many other algorithms. Our first main result shows that if $E[N]E[C^\alpha] = 1$, $\alpha > 0$, and $Q$, $N$ satisfy additional moment conditions, then $R$ has a power law distribution of index $\alpha$. This result is obtained using a new approach based on an extension of Goldie's (1991) implicit renewal theorem. Furthermore, when $N$ is regularly varying of index $\alpha > 1$, $E[N]E[C^\alpha] < 1$, and $Q$, $C$ have higher moments than $\alpha$, then the distributions of $R$ and $N$ are tail equivalent. The latter result is derived via a novel sample path large deviation method for recursive random sums. Similarly, we characterize the situation when the distribution of $R$ is determined by the tail of $Q$. The preceding approaches may be of independent interest, as they can be used for analyzing other functionals on trees. We also briefly discuss the engineering implications of our results.
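As a rough illustration of the recursion behind the abstract, the sketch below samples the finite-depth iterates R^(0) = Q and R^(k) = sum_{i=1}^{N} C_i R_i^(k-1) + Q on a random tree. The distributions are assumptions chosen only for simplicity and are not taken from the paper: N uniform on {0, 1, 2, 3}, C uniform on (0, 0.5), and Q = 1. The names `sample_r` and `mean_r` are hypothetical. With these light-tailed choices the sketch does not exhibit a power law; it only checks the first moment, which by independence satisfies E[R^(k)] = E[Q] (1 + rho + ... + rho^k) with rho = E[N]E[C] = 0.375.

```python
import random


def sample_r(depth, rng):
    """Draw one sample of R^(depth), where R^(0) = Q and
    R^(k) = sum_{i=1}^{N} C_i * R_i^(k-1) + Q.

    Assumed (illustrative) distributions, not from the paper:
    N ~ Uniform{0, 1, 2, 3}, C ~ Uniform(0, 0.5), Q = 1.
    """
    q = 1.0  # Q is a constant in this sketch
    if depth == 0:
        return q
    n = rng.randint(0, 3)  # number of children N (endpoints inclusive)
    total = q
    for _ in range(n):
        c = rng.uniform(0.0, 0.5)  # weight C_i
        total += c * sample_r(depth - 1, rng)  # independent copy R_i
    return total


def mean_r(depth, n_samples, seed=0):
    """Monte Carlo estimate of E[R^(depth)] with a fixed seed."""
    rng = random.Random(seed)
    return sum(sample_r(depth, rng) for _ in range(n_samples)) / n_samples


def exact_mean(depth, rho=0.375, mean_q=1.0):
    """E[R^(depth)] = E[Q] * (1 + rho + ... + rho^depth), rho = E[N]E[C]."""
    return mean_q * (1.0 - rho ** (depth + 1)) / (1.0 - rho)
```

For example, `mean_r(6, 20000)` should land close to `exact_mean(6)`, which is itself close to the fixed-point mean E[Q] / (1 - rho) = 1.6. Studying the tail behaviour described in the abstract would require heavy-tailed or suitably tuned choices of N, C, and Q, which this sketch does not attempt.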
