Efficient Pairwise Penetrating-rank Similarity Retrieval

Weiren Yu,Julie Mccann,Chengyuan Zhang

doi:10.1145/3368616

Abstract

Many web applications demand a measure of similarity between two entities, such as collaborative filtering, web document ranking, linkage prediction, and anomaly detection. P-Rank (Penetrating-Rank) has been accepted as a promising graph-based similarity measure, as it provides a comprehensive way of encoding both incoming and outgoing links into assessment. However, the existing method to compute P-Rank is iterative in nature and rather cost-inhibitive. Moreover, the accuracy estimate and stability issues for P-Rank computation have not been addressed. In this article, we consider the optimization techniques for P-Rank search that encompasses its accuracy, stability, and computational efficiency. (1) The accuracy estimation is provided for P-Rank iterations, with the aim to find out the number of iterations, k , required to guarantee a desired accuracy. (2) A rigorous bound on the condition number of P-Rank is obtained for stability analysis. Based on this bound, it can be shown that P-Rank is stable and well-conditioned when the damping factors are chosen to be suitably small. (3) Two matrix-based algorithms, applicable to digraphs and undirected graphs, are, respectively, devised for efficient P-Rank computation, which improves the computational time from O ( kn 3 ) to O (υ n 2 +υ 6 ) for digraphs, and to O (υ n 2 ) for undirected graphs, where n is the number of vertices in the graph, and υ (≪ n ) is the target rank of the graph. Moreover, our proposed algorithms can significantly reduce the memory space of P-Rank computations from O ( n 2 ) to O (υ n +υ 4 ) for digraphs, and to O (υ n ) for undirected graphs, respectively. Finally, extensive experiments on real-world and synthetic datasets demonstrate the usefulness and efficiency of the proposed techniques for P-Rank similarity assessment on various networks.

Full Text