A space and time efficient algorithm for SimRank computation

Weiren Yu, Qing Zhang, Xuemin Lin, Jiajin Le, Wenjie Zhang

doi:10.1007/s11280-010-0100-6

Abstract

SimRank has become an important similarity measure for ranking web documents based on a graph model of hyperlinks. Existing approaches to SimRank computation adopt an iterative paradigm. The most efficient deterministic technique requires \(O\left(n^3\right)\) worst-case time per iteration and \(O\left(n^2\right)\) space, where n is the number of nodes (web documents). In this paper, we propose novel optimization techniques such that each iteration takes \(O \left(\min \left\{ n \cdot m , n^r \right\}\right)\) time and \(O \left( n + m \right)\) space, where m is the number of edges in the web-graph model and \(r \le \log_2 7\). In addition, we extend the similarity transition matrix to prevent random surfers from getting stuck, and devise a pruning technique that eliminates impractical similarities in each iteration. Moreover, we develop a reordering technique combined with an over-relaxation method, which not only speeds up the convergence of the existing techniques but also achieves I/O efficiency. We conduct extensive experiments on both synthetic and real data sets to demonstrate the efficiency and effectiveness of our iteration techniques.
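The abstract states the per-iteration cost without showing where the \(O(n \cdot m)\) term comes from. The Python sketch below is a minimal illustration, assuming the widely used matrix form of SimRank \(S_{k+1} = C \cdot Q S_k Q^{T} + (1-C) \cdot I_n\), where Q is the row-normalised in-neighbour (backward transition) matrix; this form is an assumption here, not a transcription of the paper's algorithm. Because Q has only m nonzeros, multiplying it by a dense n × n matrix touches each edge once per row, giving O(n · m) work per iteration. The function names, the default decay C = 0.8, and the `prune_below` threshold are hypothetical: the threshold is only a simple stand-in for the paper's pruning technique, nodes without in-neighbours are left as zero rows instead of being handled by the paper's extended transition matrix, and the full n × n similarity matrix is kept in memory, so the sketch does not reproduce the O(n + m) space bound.

```python
from typing import List, Tuple

Matrix = List[List[float]]


def in_neighbours(n: int, edges: List[Tuple[int, int]]) -> List[List[int]]:
    """Return I, where I[v] lists the in-neighbours u of v for each edge (u, v)."""
    nbrs: List[List[int]] = [[] for _ in range(n)]
    for u, v in edges:
        nbrs[v].append(u)
    return nbrs


def simrank_matrix_form(
    n: int,
    edges: List[Tuple[int, int]],
    c: float = 0.8,
    iterations: int = 10,
    prune_below: float = 0.0,  # hypothetical stand-in for the paper's pruning
) -> Matrix:
    """Iterate S <- c * Q S Q^T + (1 - c) * I_n without materialising Q.

    Q[a][i] = 1/|I(a)| if i is an in-neighbour of a, else 0. Rows for nodes
    with no in-neighbours stay all zero (no stuck-surfer fix is applied).
    """
    nbrs = in_neighbours(n, edges)
    # S_0 is the identity matrix: every node is maximally similar to itself.
    s: Matrix = [[1.0 if a == b else 0.0 for b in range(n)] for a in range(n)]
    for _ in range(iterations):
        # Step 1: T = Q * S. Row a of T is the average of the rows S[i], i in I(a).
        # Each edge (i -> a) touches one length-n row, so this costs O(n * m).
        t: Matrix = [[0.0] * n for _ in range(n)]
        for a in range(n):
            if not nbrs[a]:
                continue
            w = 1.0 / len(nbrs[a])
            row = t[a]
            for i in nbrs[a]:
                s_i = s[i]
                for j in range(n):
                    row[j] += w * s_i[j]
        # Step 2: S_new = c * T * Q^T. Entry (a, b) averages T[a][j] over j in I(b);
        # each edge (j -> b) is visited once per row a, so this is also O(n * m).
        s_new: Matrix = [[0.0] * n for _ in range(n)]
        for b in range(n):
            if not nbrs[b]:
                continue
            w = c / len(nbrs[b])
            for a in range(n):
                t_a = t[a]
                s_new[a][b] = w * sum(t_a[j] for j in nbrs[b])
        # Step 3: add (1 - c) * I_n, then optionally zero small off-diagonal scores.
        for a in range(n):
            s_new[a][a] += 1.0 - c
            if prune_below > 0.0:
                for b in range(n):
                    if a != b and s_new[a][b] < prune_below:
                        s_new[a][b] = 0.0
        s = s_new
    return s


if __name__ == "__main__":
    # Node 0 links to nodes 1 and 2, so 1 and 2 share their only in-neighbour.
    scores = simrank_matrix_form(3, [(0, 1), (0, 2)], c=0.8, iterations=1)
    print(round(scores[1][2], 6))  # 0.8
```

On this toy graph, running more iterations drives the score of nodes 1 and 2 down to 0.16, because node 0 has no in-neighbours and its self-similarity settles at 1 − C = 0.2 in this matrix form; that is a limitation of the sketch, not a result from the paper. The \(n^r\) term in the bound presumably corresponds to dense products computed with a fast multiplication scheme such as Strassen's (\(\log_2 7 \approx 2.81\)), which the sketch does not attempt.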
