Abstract

Random walk-based node embedding algorithms learn vector representations of nodes by optimizing an objective function of embedding vectors and skip-bigram statistics computed from random walks on the network. They have demonstrated state-of-the-art performance in many supervised learning problems such as link prediction and node classification. Yet, their properties remain poorly understood. This paper studies properties of random walk based node embeddings in the unsupervised setting of discovering latent network block structure, i.e., learning node representations whose cluster structure in Euclidean space reflects node adjacencies within the network. We characterize the ergodic limits of the embedding objective, its generalization, and related convex relaxations to derive corresponding non-randomized versions of the node embedding objectives. We also characterize the optimal node embedding Grammians of the non-randomized objectives for the expected graph of a two-community Stochastic Block Model (SBM). We prove that the solution Grammian has rank 1 for a suitable nuclear norm relaxation of the non-randomized objective. Comprehensive experimental results on SBM random networks reveal that our non-randomized ergodic objectives yield node embeddings whose distribution is Gaussian-like, centered at the node embeddings of the expected network within each community, and concentrate in the linear degree-scaling regime as the number of nodes increases.

Full Text
Published version (Free)

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call