Massively Parallel Algorithms for Personalized PageRank

Guanhao Hou, Zhewei Wei, Xingguang Chen, Sibo Wang

doi:10.14778/3461535.3461554

Abstract

Personalized PageRank (PPR) has wide applications in search engines, social recommendations, community detection, and so on. Graphs are becoming massive, and many IT companies need to deal with large graphs that cannot fit into the memory of most commodity servers. However, most existing state-of-the-art solutions for PPR computation only work on a single machine and are inefficient in distributed frameworks, since such solutions either (i) result in an excessively large number of communication rounds, or (ii) incur high communication costs in each round. Motivated by this, we present Delta-Push, an efficient framework for single-source and top-k PPR queries in distributed settings. Our goal is to reduce the number of rounds while guaranteeing that the load, i.e., the maximum number of messages an executor sends or receives in a round, is bounded by the capacity of each executor. We first present a non-trivial combination of a redesigned parallel push algorithm and the Monte-Carlo method to answer single-source PPR queries. The solution uses pre-sampled random walks to reduce the number of rounds for the push algorithm. Theoretical analysis under the Massively Parallel Computing (MPC) model shows that our proposed solution bounds the communication rounds to [EQUATION] under a load of O(m/p), where m is the number of edges of the input graph, p is the number of executors, and ϵ is a user-defined error parameter. Meanwhile, as the number of executors increases to p' = γ · p, the load constraint can be relaxed, since each executor can hold O(γ · m/p') messages with invariant local memory. In such scenarios, multiple queries can be processed in batches simultaneously. We show that with a load of O(γ · m/p'), Delta-Push can process γ queries in a batch with [EQUATION] rounds, while other baseline solutions still keep the same round cost for each batch.
We further present a new top-k algorithm that is friendly to the distributed framework and reduces the number of rounds required in practice. Extensive experiments show that our proposed solution is more efficient than alternatives.
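
To make the push-plus-Monte-Carlo idea concrete, the following is a minimal single-machine sketch, not the distributed Delta-Push algorithm itself. It runs a standard forward push from the source until every residue falls below a threshold, then spends random walks on the leftover residues. All names (`forward_push`, `random_walk_endpoint`, `estimate_ppr`) and the parameters `alpha`, `r_max`, and `walks_per_unit` are illustrative assumptions. Walks are sampled on the fly here rather than pre-sampled, and every node is assumed to have at least one out-neighbor.

```python
import math
import random
from collections import defaultdict


def forward_push(graph, source, alpha, r_max):
    """Sequential forward push (illustrative, not the paper's parallel variant).

    graph maps each node to a non-empty list of out-neighbors (assumption).
    Returns (reserve, residue) with sum(reserve) + sum(residue) == 1.
    """
    reserve = defaultdict(float)
    residue = defaultdict(float)
    residue[source] = 1.0
    frontier = [source]
    while frontier:
        u = frontier.pop()
        r = residue[u]
        deg = len(graph[u])
        if r <= r_max * deg:
            continue  # residue too small to be worth pushing
        residue[u] = 0.0
        reserve[u] += alpha * r            # mass that stops at u
        share = (1.0 - alpha) * r / deg    # mass forwarded along each out-edge
        for v in graph[u]:
            residue[v] += share
            if residue[v] > r_max * len(graph[v]):
                frontier.append(v)
    return reserve, residue


def random_walk_endpoint(graph, start, alpha, rng):
    """Walk that stops with probability alpha at each node, including the start."""
    u = start
    while rng.random() >= alpha:
        u = rng.choice(graph[u])
    return u


def estimate_ppr(graph, source, alpha=0.2, r_max=1e-2,
                 walks_per_unit=10_000, seed=0):
    """Combine push reserves with random-walk estimates of the leftover residue."""
    rng = random.Random(seed)
    reserve, residue = forward_push(graph, source, alpha, r_max)
    estimate = dict(reserve)
    for u, r in residue.items():
        if r <= 0.0:
            continue
        n_walks = math.ceil(r * walks_per_unit)  # more walks for larger residue
        for _ in range(n_walks):
            t = random_walk_endpoint(graph, u, alpha, rng)
            estimate[t] = estimate.get(t, 0.0) + r / n_walks
    return estimate


if __name__ == "__main__":
    toy_graph = {0: [1, 2], 1: [2], 2: [0]}  # hypothetical 3-node example
    print(estimate_ppr(toy_graph, source=0))
```

In this sketch, a smaller `r_max` shifts work from random walks to pushes, and a larger one does the opposite. A distributed version would additionally have to account for the number of rounds and the per-executor load, which this sketch does not model.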

Journal: Proceedings of the VLDB Endowment
Publication Date: May 1, 2021
Citations: 23

Similar Papers

TopPPR
Zhewei Wei ... Sibo Wang
27 May 2018

On the Role of Clustering in Personalized PageRank Estimation
Daniel Vial ... Vijay Subramanian
ACM Transactions on Modeling and Performance Evaluation of Computing Systems | VOL. 4
06 Dec 2019

Realtime top-k personalized pagerank over large graphs on GPUs
Jieming Shi ... Tianyuan Jin
Proceedings of the VLDB Endowment | VOL. 13
01 Sep 2019

HubPPR
Sibo Wang ... Xiaokui Xiao
Proceedings of the VLDB Endowment | VOL. 10
01 Nov 2016
