Abstract

Link-based similarity search aims to find nodes similar to a given query node in a graph, a task that arises in numerous applications, including web spam detection, social network analysis and web search. Among existing methods, SimRank is a well-known similarity model that provides an effective and trustworthy function for similarity search. Many techniques for SimRank similarity search have been proposed recently, which compute similarity scores by traversing the paths between the query and candidate nodes. However, the number of paths increases exponentially with path length, which makes the computation expensive and prevents fast similarity search over large graphs. In this paper, we propose an efficient index-free SimRank similarity search approach, namely DisSim, which reduces the computational cost by discounting path length. We observe that SimRank converges rapidly to a stable state and that the results change little after a few iterations. Based on this fast convergence, we define the similarity between nodes as the SimRank score at the second iteration. To compute DisSim, we divide the similarity into one-step and two-step first-meeting probabilities. The one-step first-meeting probabilities are computed by path traversals from the query to the candidate nodes, which reduces computational cost by skipping unnecessary nodes. The two-step first-meeting probabilities are computed by integrating the repeated parts of the paths. To further speed up query processing, we develop a pruning algorithm that prunes unpromising path traversals by setting a threshold, and we derive the accuracy loss under this threshold through mathematical analysis. Extensive experiments on real graphs demonstrate the performance of DisSim in comparison with state-of-the-art algorithms.
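
To illustrate what "the SimRank score at the second iteration" means, the following is a minimal sketch of the standard SimRank recurrence stopped after two iterations. It is not the DisSim algorithm described above: it is a naive all-pairs computation without first-meeting probabilities or pruning. The function name `simrank_truncated`, the decay factor of 0.8, and the toy graph are assumptions chosen for illustration only.

```python
from itertools import product


def simrank_truncated(in_neighbors, decay=0.8, iterations=2):
    """Naive all-pairs SimRank, stopped after a fixed number of iterations.

    in_neighbors maps each node to the list of nodes that link to it.
    decay is the SimRank decay factor (0.8 is an assumed value).
    """
    nodes = list(in_neighbors)
    # Iteration 0: a node is only similar to itself.
    scores = {(a, b): 1.0 if a == b else 0.0 for a in nodes for b in nodes}

    for _ in range(iterations):
        updated = {}
        for a, b in product(nodes, repeat=2):
            if a == b:
                updated[(a, b)] = 1.0
                continue
            in_a, in_b = in_neighbors[a], in_neighbors[b]
            if not in_a or not in_b:
                # A node with no in-neighbors has similarity 0 to others.
                updated[(a, b)] = 0.0
                continue
            # Average similarity over all pairs of in-neighbors, then decay.
            total = sum(scores[(i, j)] for i in in_a for j in in_b)
            updated[(a, b)] = decay * total / (len(in_a) * len(in_b))
        scores = updated
    return scores


# Hypothetical toy graph: u -> a, u -> b, a -> c, b -> d.
toy_graph = {"u": [], "a": ["u"], "b": ["u"], "c": ["a"], "d": ["b"]}
second_iteration = simrank_truncated(toy_graph, decay=0.8, iterations=2)
```

In this toy graph, nodes `c` and `d` only become similar at the second iteration, because their shared ancestor `u` is two steps away. Running more iterations does not change the scores here, which loosely mirrors the fast-convergence observation in the abstract, although a five-node example is no evidence for real graphs.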
