Abstract

SimRank is an effective structural similarity measurement between two vertices in a graph, which can be used in many applications like recommender systems. Although progresses have been achieved, existing methods still face challenges to handle large graphs. Besides huge index construction and maintenance cost, the existing methods require considerable search space and time overheads in the online SimRank query. In this paper, we design a Monte Carlo based method, Uni-Walk, to enable the fast top-k SimRank computation over large undirected graphs without indexing. UniWalk directly locates the top-k similar vertices for any single source vertex u via O(R) sampling paths originating from u only, which avoids the selection of candidate vertex set C and the following O(|C|R) bidirectional sampling paths starting from u and each candidate respectively in existing methods. We also design a space-efficient method to reduce intermediate results, and a path-sharing strategy to optimize path sampling for multiple source vertices. Furthermore, we extend UniWalk to existing distributed graph processing frameworks to improve its scalability. We conduct extensive experiments to illustrate that UniWalk has high scalability, and outperforms the state-of-the-art methods by orders of magnitude, and such an improvement is achieved without any indexing overheads.

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call

Disclaimer: All third-party content on this website/platform is and will remain the property of their respective owners and is provided on "as is" basis without any warranties, express or implied. Use of third-party content does not indicate any affiliation, sponsorship with or endorsement by them. Any references to third-party content is to identify the corresponding services and shall be considered fair use under The CopyrightLaw.