Efficient index-free SimRank similarity search in large graphs by discounting path lengths

Mingxi Zhang, Liuqian Yang, Hangfei Hu, Tianxing Liu, Jinhua Wang

doi:10.1016/j.eswa.2022.117746

Abstract

Link-based similarity search aims to find nodes similar to a given query node in a graph, a task that arises in numerous applications, including web spam detection, social network analysis and web search. Among existing methods, SimRank is a well-known similarity model that provides an effective and trustworthy similarity function. Many techniques for SimRank similarity search have been proposed recently, and they compute similarity scores by traversing the paths between the query and candidate nodes. However, the number of paths grows exponentially with path length, which makes the computation expensive and prevents fast similarity search over large graphs. In this paper, we propose an efficient index-free SimRank similarity search approach, named DisSim, which reduces the computational cost by discounting path lengths. We observe that SimRank converges rapidly to a stable state and that the results change little after a few iterations. Based on this fast convergence, the similarity between nodes is defined as the SimRank score at the second iteration. To compute DisSim, we divide the similarity into one-step and two-step first-meeting probabilities. The one-step first-meeting probabilities are computed by path traversals from the query to candidate nodes, which reduces the computational cost by skipping unnecessary nodes. The two-step first-meeting probabilities are computed by integrating the repeated parts of the paths. To further speed up query processing, we develop a pruning algorithm that prunes unpromising path traversals using a threshold, and we give a mathematical analysis of the accuracy loss under this threshold. Extensive experiments on real graphs demonstrate the performance of DisSim in comparison with state-of-the-art algorithms.
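The second-iteration definition above can be illustrated with a small sketch. This Python is not the authors' DisSim implementation; it assumes the standard Jeh-Widom SimRank recurrence over in-neighbors with a decay factor `c` (0.6 is an arbitrary choice), and the helper names `neighbor_sets`, `simrank_k_iterations` and `two_step_scores` are hypothetical. `two_step_scores` enumerates, starting from the query, the paths along which two reverse random walks first meet after one or two steps, and sums `c` times the one-step and `c^2` times the two-step first-meeting probabilities. For distinct nodes this should equal two rounds of the plain iterative recurrence, which `simrank_k_iterations` computes as an all-pairs reference. The sketch omits the node skipping, reuse of repeated path segments and threshold pruning described in the abstract.

```python
from collections import defaultdict

# Hypothetical helpers: a naive sketch of second-iteration SimRank, not DisSim itself.


def neighbor_sets(nodes, edges):
    """Return (in-neighbor, out-neighbor) sets for a simple directed graph."""
    inn = {v: set() for v in nodes}
    out = {v: set() for v in nodes}
    for u, v in edges:
        out[u].add(v)
        inn[v].add(u)
    return inn, out


def simrank_k_iterations(nodes, edges, c=0.6, k=2):
    """Reference all-pairs SimRank (Jeh-Widom iteration) stopped after k rounds."""
    inn, _ = neighbor_sets(nodes, edges)
    s = {(a, b): 1.0 if a == b else 0.0 for a in nodes for b in nodes}
    for _ in range(k):
        nxt = {}
        for a in nodes:
            for b in nodes:
                if a == b:
                    nxt[(a, b)] = 1.0
                elif inn[a] and inn[b]:
                    total = sum(s[(i, j)] for i in inn[a] for j in inn[b])
                    nxt[(a, b)] = c * total / (len(inn[a]) * len(inn[b]))
                else:
                    nxt[(a, b)] = 0.0  # a node without in-links cannot continue its walk
        s = nxt
    return s


def two_step_scores(q, nodes, edges, c=0.6):
    """Single-source c*P(first meet at step 1) + c^2*P(first meet at step 2).

    Plain path enumeration from the query: no node skipping, no sharing of
    repeated path segments and no threshold pruning.
    """
    inn, out = neighbor_sets(nodes, edges)
    scores = defaultdict(float)
    for x in inn[q]:
        # One step: path q <- x -> v, so the reverse walks from q and v meet at x.
        for v in out[x]:
            if v != q:
                scores[v] += c / (len(inn[q]) * len(inn[v]))
        # Two steps: path q <- x <- z -> y -> v with x != y; the walks are apart
        # after one step (at x and y) and first meet at z after the second step.
        for z in inn[x]:
            for y in out[z]:
                if y == x:
                    continue
                for v in out[y]:
                    if v != q:
                        scores[v] += c * c / (
                            len(inn[q]) * len(inn[v]) * len(inn[x]) * len(inn[y])
                        )
    return dict(scores)


if __name__ == "__main__":
    nodes = ["a", "b", "c", "d"]
    edges = [("a", "b"), ("a", "c"), ("b", "c"), ("c", "a"), ("d", "b"), ("d", "c")]
    print(two_step_scores("c", nodes, edges))
    print(simrank_k_iterations(nodes, edges)[("c", "a")])
```

Comparing the two functions on a toy graph is a quick check of the decomposition: the single-source version only touches nodes within two in-link/out-link hops of the query, while the reference computes every pair.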


Journal: Expert Systems With Applications
Publication Date: Jun 15, 2022
Citations: 1

Similar Papers

G-Hash: Towards Fast Kernel-based Similarity Search in Large Graph Databases.
Xiaohong Wang ... Gerald H Lushington
Advances in database technology : proceedings. International Conference on Extending Database Technology | VOL. 360
24 Mar 2009

Graph similarity search on large uncertain graph databases
Ye Yuan ... Lei Chen
The VLDB Journal | VOL. 24
09 Dec 2014

A Top-k Similarity Node Search Method Based on Convolutional Neural Network in Complex Network
Xiangfu Meng ... Zihan Li
01 Jan 2023

Efficient processing of group-oriented connection queries in a large graph
James Cheng ... Yiping Ke
02 Nov 2009
