Fast All-Pairs SimRank Assessment on Large Graphs and Bipartite Domains

Weiren Yu, Wenjie Zhang, Julie A. McCann, Xuemin Lin

doi:10.1109/tkde.2014.2339828


Open Access

https://doi.org/10.1109/tkde.2014.2339828


Abstract

SimRank is a powerful model for assessing vertex-pair similarities in a graph. It follows the concept that two vertices are similar if they are referenced by similar vertices. The prior work [18] exploits partial sums memoization to compute SimRank in $O(Kmn)$ time on a graph of $n$ vertices and $m$ edges, for $K$ iterations. However, the computations of different partial sums may overlap and are therefore redundant. Moreover, to guarantee a given accuracy $\epsilon$, the existing SimRank model needs $K=\lceil \log_C \epsilon \rceil$ iterations, where $C$ is a damping factor, and this geometric rate of convergence is slow when high accuracy is required. In this paper, (1) a novel clustering strategy is proposed to eliminate duplicate computations occurring in partial sums, and an efficient algorithm is then devised to accelerate SimRank computation to $O(K d' n^2)$ time, where $d'$ is typically much smaller than $\frac{m}{n}$. (2) A new differential SimRank equation is proposed, which represents the SimRank matrix as an exponential sum of transition matrices, as opposed to the geometric sum of its conventional counterpart. This further speeds up the convergence of SimRank iterations. (3) In bipartite domains, a novel finer-grained partial max clustering method is developed to speed up the computation of the Minimax SimRank variation from $O(Kmn)$ to $O(Km'n)$ time, where $m' \le m$ is the number of edges in a reduced graph after edge clustering and is typically much smaller than $m$.
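To make the baseline concrete, the following is a minimal Python sketch of conventional iterative SimRank with partial sums memoization in the spirit of [18]. It assumes the standard definition $s(a,a)=1$ and $s(a,b)=\frac{C}{|I(a)||I(b)|}\sum_{i\in I(a)}\sum_{j\in I(b)} s(i,j)$, with $s(a,b)=0$ when either in-neighbor set $I(\cdot)$ is empty. For each vertex $a$, the partial sum $\sum_{i\in I(a)} s_k(i,j)$ is computed once and reused for every $b$, which is what yields the $O(Kmn)$ bound. It does not implement the paper's clustering strategy for sharing partial sums across different vertices, and the function names `iterations_needed` and `simrank_partial_sums` are hypothetical.

```python
import math
from typing import Dict, List


def iterations_needed(C: float, eps: float) -> int:
    """K = ceil(log_C eps): iterations needed so that the geometric bound C**K <= eps."""
    return math.ceil(math.log(eps) / math.log(C))


def simrank_partial_sums(in_nbrs: Dict[int, List[int]], n: int,
                         C: float = 0.6, K: int = 5) -> List[List[float]]:
    """Iterative SimRank on vertices 0..n-1; in_nbrs[v] lists the in-neighbors I(v)."""
    # s_0 is the identity matrix.
    s = [[1.0 if a == b else 0.0 for b in range(n)] for a in range(n)]
    for _ in range(K):
        new = [[0.0] * n for _ in range(n)]
        for a in range(n):
            Ia = in_nbrs.get(a, [])
            new[a][a] = 1.0  # s(a, a) = 1 by definition
            if not Ia:
                continue  # s(a, b) = 0 for b != a when I(a) is empty
            # Memoized partial sum: partial[j] = sum_{i in I(a)} s(i, j),
            # computed once per a and shared by every b below.
            partial = [sum(s[i][j] for i in Ia) for j in range(n)]
            for b in range(n):
                Ib = in_nbrs.get(b, [])
                if b == a or not Ib:
                    continue
                new[a][b] = C * sum(partial[j] for j in Ib) / (len(Ia) * len(Ib))
        s = new
    return s


if __name__ == "__main__":
    # Toy graph with edges 0 -> 1 and 0 -> 2, so I(1) = I(2) = {0}.
    scores = simrank_partial_sums({1: [0], 2: [0]}, n=3, C=0.6, K=iterations_needed(0.6, 1e-3))
    print(iterations_needed(0.6, 1e-3), scores[1][2])  # prints "14 0.6"
```

For example, with $C = 0.6$ and $\epsilon = 0.001$, the geometric bound requires $K = \lceil 13.52 \rceil = 14$ iterations, which illustrates why a faster-converging formulation is attractive when high accuracy is needed.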
Using real and synthetic data, we empirically verify that (1) our approach of partial sums sharing outperforms the best known algorithm by up to one order of magnitude; (2) the revised notion of SimRank achieves a further 5X speedup on large graphs while largely preserving the relative order of the original SimRank scores; (3) our finer-grained partial max memoization for the Minimax SimRank variation in bipartite domains is 5X to 12X faster than the baselines.
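The convergence advantage of the differential model can be illustrated under an assumption about its closed form. Assuming the exponential-sum form $\hat{S} = e^{-C}\sum_{k=0}^{\infty}\frac{C^k}{k!}Q^k (Q^T)^k$, where $Q$ is the backward transition matrix ($Q_{a,i} = 1/|I(a)|$ for $i \in I(a)$), truncating after the $k=K$ term leaves an entrywise error of at most $\frac{C^{K+1}}{(K+1)!}$: every entry of $Q^k (Q^T)^k$ lies in $[0,1]$, and by the Lagrange remainder $e^{-C}\sum_{k>K}\frac{C^k}{k!} \le \frac{C^{K+1}}{(K+1)!}$. The sketch below compares that term count with $\lceil \log_C \epsilon \rceil$. The exact formulation in the paper (for example, its initial condition) may differ, and all names here are hypothetical.

```python
import math
from typing import Dict, List

Matrix = List[List[float]]


def matmul(A: Matrix, B: Matrix) -> Matrix:
    return [[sum(A[i][t] * B[t][j] for t in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]


def transpose(A: Matrix) -> Matrix:
    return [list(row) for row in zip(*A)]


def backward_transition(in_nbrs: Dict[int, List[int]], n: int) -> Matrix:
    """Q[a][i] = 1/|I(a)| for each in-neighbor i of a (assumes no duplicate entries)."""
    Q = [[0.0] * n for _ in range(n)]
    for a, nbrs in in_nbrs.items():
        for i in nbrs:
            Q[a][i] = 1.0 / len(nbrs)
    return Q


def geo_iterations_needed(C: float, eps: float) -> int:
    """Geometric convergence: K = ceil(log_C eps)."""
    return math.ceil(math.log(eps) / math.log(C))


def exp_terms_needed(C: float, eps: float) -> int:
    """Smallest K with C**(K+1) / (K+1)! <= eps (assumed exponential-sum error bound)."""
    K = 0
    while C ** (K + 1) / math.factorial(K + 1) > eps:
        K += 1
    return K


def differential_simrank(Q: Matrix, C: float, K: int) -> Matrix:
    """Truncated sum e^{-C} * sum_{k=0..K} C^k/k! * Q^k (Q^T)^k (assumed closed form)."""
    n = len(Q)
    Qt = transpose(Q)
    term = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]  # Q^0 (Q^T)^0
    S = [[0.0] * n for _ in range(n)]
    for k in range(K + 1):
        coef = C ** k / math.factorial(k)
        for i in range(n):
            for j in range(n):
                S[i][j] += coef * term[i][j]
        term = matmul(matmul(Q, term), Qt)  # Q^{k+1} (Q^T)^{k+1} = Q (Q^k (Q^T)^k) Q^T
    scale = math.exp(-C)
    return [[scale * x for x in row] for row in S]


if __name__ == "__main__":
    C, eps = 0.6, 1e-3
    print(exp_terms_needed(C, eps), geo_iterations_needed(C, eps))  # prints "4 14"
    Q = backward_transition({1: [0], 2: [0]}, 3)
    S_hat = differential_simrank(Q, C, exp_terms_needed(C, eps))
```

Under these assumptions, reaching $\epsilon = 0.001$ with $C = 0.6$ needs terms up to $k = 4$ (since $0.6^5/5! = 0.000648$) rather than 14 geometric iterations. Note that the diagonal of $\hat{S}$ is not fixed at 1, which is consistent with the abstract's description of a revised notion of SimRank that preserves relative order rather than exact scores.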

Journal: IEEE Transactions on Knowledge and Data Engineering
Publication Date: Jul 1, 2015
Citations: 32
License type: cc-by-nc-nd

Similar Papers

Towards efficient SimRank computation on large networks
Weiren Yu ... Xuemin Lin
01 Apr 2013

Top-k subgraph matching query in a large graph
Lei Zou ... Lei Chen
09 Nov 2007

Developing an efficient spectral clustering algorithm on large scale graphs in spark
Ahmed I Taloba ... Marwan R Riad
01 Dec 2017

Efficiently computing k-edge connected components via graph decomposition
Lijun Chang ... Weifa Liang
22 Jun 2013
