Structural node similarity is widely used in analyzing complex networks. Among structural node similarity metrics, role similarity has the merit of indicating automorphism (isomorphism). Existing algorithms for computing role similarity (e.g., RoleSim and NED) suffer from severe performance bottlenecks and thus cannot handle large real-world graphs. In this paper, we propose a new framework, StructSim, to compute nodes’ role similarity. Under this framework, we first prove that StructSim based on maximum matching is an admissible role similarity metric. Because maximum matching is still too costly to scale, we then devise BinCount matching, which is not only efficient to compute but also guarantees the admissibility of StructSim. BinCount-based StructSim admits a precomputed index that answers a query on a single pair of nodes in $$O(k\log D)$$ time, where k is a small user-defined parameter and D is the maximum node degree. To build the index, we further devise an FM-sketch-based technique that can handle graphs with billions of edges. Extensive empirical studies show that StructSim outperforms existing methods in both effectiveness and efficiency when computing structural node similarities on real-world graphs.
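For readers unfamiliar with the FM sketch mentioned above, the following is a minimal sketch of a generic Flajolet-Martin distinct-count estimator. It is not the index-construction procedure of this paper, whose details the abstract does not give. The class name `FMSketch`, the parameter `num_maps`, and the choice of BLAKE2b as the hash function are assumptions made for illustration. The property that plausibly makes such sketches attractive on large graphs is that the sketch of a union of sets is the bitwise OR of the individual sketches, so sketches can be combined without revisiting the underlying elements.

```python
import hashlib

PHI = 0.77351  # Flajolet-Martin bias-correction constant


def _hash(item, seed):
    # 64-bit hash of (seed, item); BLAKE2b is an illustrative choice.
    data = f"{seed}:{item}".encode()
    return int.from_bytes(hashlib.blake2b(data, digest_size=8).digest(), "big")


def _trailing_zeros(x, width=64):
    # Index of the least-significant 1 bit; cap at width - 1 when x == 0.
    if x == 0:
        return width - 1
    return (x & -x).bit_length() - 1


class FMSketch:
    """Hypothetical generic FM sketch: one bitmap per hash function."""

    def __init__(self, num_maps=64):
        self.bitmaps = [0] * num_maps

    def add(self, item):
        # Each hash function sets one bit, chosen with geometric probability.
        for seed in range(len(self.bitmaps)):
            self.bitmaps[seed] |= 1 << _trailing_zeros(_hash(item, seed))

    def merge(self, other):
        # The sketch of a set union is the bitwise OR of the two sketches.
        merged = FMSketch(len(self.bitmaps))
        merged.bitmaps = [a | b for a, b in zip(self.bitmaps, other.bitmaps)]
        return merged

    def estimate(self):
        # Average the index of the lowest unset bit over all bitmaps,
        # then apply the 2^R / PHI estimator.
        total = 0
        for bitmap in self.bitmaps:
            r = 0
            while bitmap & (1 << r):
                r += 1
            total += r
        return 2 ** (total / len(self.bitmaps)) / PHI


if __name__ == "__main__":
    left, right = FMSketch(), FMSketch()
    for v in range(0, 1500):
        left.add(v)
    for v in range(500, 2000):
        right.add(v)
    # Approximate number of distinct values in the union (true value: 2000).
    print(round(left.merge(right).estimate()))
```

Averaging over more bitmaps (`num_maps`) lowers the variance of the estimate at the cost of proportionally more hashing per inserted element.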