SimRank*: effective and scalable pairwise similarity search based on graph topology

Weiren Yu, Xuemin Lin, Julie A. McCann, Wenjie Zhang, Jian Pei

doi:10.1007/s00778-018-0536-3

Open Access

https://doi.org/10.1007/s00778-018-0536-3

Abstract

Given a graph, how can we quantify similarity between two nodes in an effective and scalable way? SimRank is an attractive measure of pairwise similarity based on graph topologies. Its underpinning philosophy that “two nodes are similar if they are pointed to (have incoming edges) from similar nodes” can be regarded as an aggregation of similarities based on incoming paths. Despite its popularity in various applications (e.g., web search and social networks), SimRank has an undesirable trait, i.e., “zero-similarity”: it accommodates only the paths of equal length from a common “center” node, whereas a large portion of other paths are fully ignored. In this paper, we propose an effective and scalable similarity model, SimRank*, to remedy this problem. (1) We first provide a sufficient and necessary condition of the “zero-similarity” problem that exists in Jeh and Widom’s SimRank model, Li et al.’s SimRank model, Random Walk with Restart (RWR), and ASCOS++. (2) We next present our treatment, SimRank*, which can resolve this issue while inheriting the merit of the simple SimRank philosophy. (3) We reduce the series form of SimRank* to a closed form, which looks simpler than SimRank but which enriches semantics without suffering from increased computational overhead. This leads to an iterative form of SimRank*, which requires $O(Knm)$ time and $O(n^2)$ memory for computing all $n^2$ pairs of similarities on a graph of $n$ nodes and $m$ edges for $K$ iterations. (4) To improve the computational time of SimRank* further, we leverage a novel clustering strategy via edge concentration. Due to its NP-hardness, we devise an efficient heuristic to speed up all-pairs SimRank* computation to $O(Kn\tilde{m})$ time, where $\tilde{m}$ is generally much smaller than $m$.
(5) To scale SimRank* on billion-edge graphs, we propose two memory-efficient single-source algorithms, i.e., ss-gSR* for geometric SimRank*, and ss-eSR* for exponential SimRank*, which can retrieve similarities between all $n$ nodes and a given query on an as-needed basis. This significantly reduces the $O(n^2)$ memory of all-pairs search to either $O(Kn + \tilde{m})$ for geometric SimRank*, or $O(n + \tilde{m})$ for exponential SimRank*, without any loss of accuracy, where $\tilde{m} \ll n^2$. (6) We also compare SimRank* with another remedy of SimRank that adds self-loops on each node and demonstrate that SimRank* is more effective. (7) Using real and synthetic datasets, we empirically verify the richer semantics of SimRank*, and validate its high computational efficiency and scalability on large graphs with billions of edges.
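
The "zero-similarity" trait described in the abstract can be illustrated on a toy graph. The sketch below is not taken from the paper's text on this page: the function names, the three-node path graph, the decay factor C = 0.8, and the iteration count are all illustrative assumptions. The SimRank* update used here, S = (C/2)(QS + SQ^T) + (1 - C)I with Q the in-neighbor averaging matrix, is the geometric recursion as it is commonly stated for this model; the abstract itself does not spell it out, so treat it as an assumption. The loops are deliberately naive and do not attempt the O(Knm) cost quoted above.

```python
# Sketch only: names, toy graph, and C = 0.8 are illustrative assumptions.

def in_neighbors(n, edges):
    """Map each node to the list of nodes that point to it."""
    preds = [[] for _ in range(n)]
    for u, v in edges:
        preds[v].append(u)
    return preds

def simrank(n, edges, C=0.8, K=10):
    """Jeh and Widom's SimRank: average similarity of in-neighbor pairs, s(a, a) = 1."""
    preds = in_neighbors(n, edges)
    S = [[1.0 if a == b else 0.0 for b in range(n)] for a in range(n)]
    for _ in range(K):
        T = [[0.0] * n for _ in range(n)]
        for a in range(n):
            for b in range(n):
                if a == b:
                    T[a][b] = 1.0
                elif preds[a] and preds[b]:
                    total = sum(S[i][j] for i in preds[a] for j in preds[b])
                    T[a][b] = C * total / (len(preds[a]) * len(preds[b]))
        S = T
    return S

def simrank_star(n, edges, C=0.8, K=10):
    """Assumed geometric SimRank* update: S = (C/2)(Q S + S Q^T) + (1 - C) I."""
    preds = in_neighbors(n, edges)
    S = [[(1.0 - C) if a == b else 0.0 for b in range(n)] for a in range(n)]
    for _ in range(K):
        # QS[a][b]: average of S[i][b] over in-neighbors i of a (0 if a has none).
        QS = [[sum(S[i][b] for i in preds[a]) / len(preds[a]) if preds[a] else 0.0
               for b in range(n)] for a in range(n)]
        # S stays symmetric, so (S Q^T)[a][b] equals QS[b][a].
        S = [[0.5 * C * (QS[a][b] + QS[b][a]) + ((1.0 - C) if a == b else 0.0)
              for b in range(n)] for a in range(n)]
    return S

# Path graph 0 -> 1 -> 2: node 0 reaches node 1 in one step and node 2 in two steps,
# so the only shared "center" gives incoming paths of unequal length.
edges = [(0, 1), (1, 2)]
sr = simrank(3, edges)
srs = simrank_star(3, edges)
print("SimRank  s(1, 2) =", sr[1][2])
print("SimRank* s(1, 2) =", srs[1][2])
```

On this graph, SimRank assigns nodes 1 and 2 a score of exactly zero, because node 0 has no in-neighbors and the recursion never finds an equal-length pair of incoming paths. Under the assumed SimRank* update the same pair receives a positive score, since paths of unequal length from node 0 are also counted.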

Journal: The VLDB Journal
Publication Date: Jan 11, 2019
Citations: 35
License type: open-access

Similar Papers

When Hashes Met Wedges
Aneesh Sharma ... C Seshadhri
03 Apr 2017

Efficient Processing Node Proximity via Random Walk with Restart
Bingqing Lv ... Julie A McCann
01 Jan 2014

Random Walk with Restart over Dynamic Graphs
Weiren Yu ... Julie McCann
01 Dec 2016

BePI
Jinhong Jung ... U Kang
09 May 2017
