Struc2gauss: Structural role preserving network embedding via Gaussian embedding

Yulong Pei, Xin Du, George Fletcher, Mykola Pechenizkiy, Jianpeng Zhang

doi:10.1007/s10618-020-00684-x

Abstract

Network embedding (NE) plays a principal role in network mining because it maps nodes into efficient low-dimensional embedding vectors. However, state-of-the-art NE methods have two major limitations: role preservation and uncertainty modeling. Almost all previous methods represent a node as a point in space and focus on local structural information, i.e., neighborhood information. Neighborhood information does not capture global structural information, and a point vector representation fails to model the uncertainty of node representations. In this paper, we propose a new NE framework, struc2gauss, which learns node representations in the space of Gaussian distributions and performs network embedding based on global structural information. struc2gauss first employs a given node similarity metric to measure the global structural information, then generates structural context for nodes, and finally learns node representations via Gaussian embedding. Different structural similarity measures of networks and energy functions of Gaussian embedding are investigated. Experiments conducted on real-world networks demonstrate that struc2gauss effectively captures global structural information where state-of-the-art network embedding methods fail to do so, outperforms other methods on structure-based clustering and classification tasks, and provides more information on the uncertainties of node representations.
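To make the idea of a Gaussian node representation concrete, the following is a minimal sketch in which each node is a mean vector plus a per-dimension variance, and a pair of nodes is compared with an energy function. The use of KL divergence between diagonal Gaussians is an assumption made for illustration (it is a common choice in the Gaussian embedding literature); the abstract does not state which energy functions are studied. The function name `kl_diag_gaussians` and the toy values are hypothetical.

```python
import math


def kl_diag_gaussians(mu_p, var_p, mu_q, var_q):
    """Return KL(P || Q) for two Gaussians with diagonal covariance.

    mu_* are lists of means, var_* are lists of per-dimension
    variances (all strictly positive). All lists share one length.
    """
    kl = 0.0
    for mp, vp, mq, vq in zip(mu_p, var_p, mu_q, var_q):
        # Closed-form KL for one dimension, summed over dimensions.
        kl += 0.5 * (vp / vq + (mq - mp) ** 2 / vq - 1.0 + math.log(vq / vp))
    return kl


# Hypothetical 2-D embeddings as (mean, variance) pairs.
# node_a has small variance (low uncertainty),
# node_b has large variance (high uncertainty).
node_a = ([0.0, 0.0], [0.1, 0.1])
node_b = ([0.0, 0.0], [1.0, 1.0])

kl_ab = kl_diag_gaussians(*node_a, *node_b)
kl_ba = kl_diag_gaussians(*node_b, *node_a)
print("KL(a || b) =", kl_ab)
print("KL(b || a) =", kl_ba)
```

In this sketch the two nodes share the same mean, so a point-vector embedding would treat them as identical, whereas the variances still distinguish them. The divergence is also asymmetric, which is one reason the choice of energy function matters.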

Highlights

Network analysis consists of numerous tasks including community detection (Fortunato 2010), role discovery (Rossi and Ahmed 2015), link prediction (Liben-Nowell and Kleinberg 2007), etc.
Struc2gauss generates node context based on a global structural similarity measure to learn node representations so that global structural information can be taken into consideration
Among the baselines, struc2vec, GraphWave and DRNE capture structural role information to some extent, since they outperform the random-walk-based methods (DeepWalk and node2vec) and the neighbor-based methods (Embedding Propagation (EP) and graph2gauss); however, all of them fail to capture the global structural information for node clustering

Summary

Introduction

Network analysis consists of numerous tasks including community detection (Fortunato 2010), role discovery (Rossi and Ahmed 2015), link prediction (Liben-Nowell and Kleinberg 2007), etc. It has been reported that using embedded node representations can achieve promising performance on many network analysis tasks (Cao et al 2015; Grover and Leskovec 2016; Perozzi et al 2014; Ribeiro et al 2017). With the fast development of neural network techniques, unsupervised embedding algorithms have been widely used in natural language processing (NLP), where words or phrases from the vocabulary are mapped to vectors in the learned embedding space, e.g., word2vec (Mikolov et al 2013a, b) and GloVe (Pennington et al 2014). By drawing an analogy between paths consisting of several nodes on networks and word sequences in text, DeepWalk (Perozzi et al 2014) learns node representations based on random walks using the same mechanism as word2vec. Afterwards, a series of studies has been conducted to improve DeepWalk, either by extending the definition of neighborhood to higher-order proximity (Cao et al 2015; Grover and Leskovec 2016; Perozzi et al 2016; Tang et al 2015b) or by incorporating more information into node representations, such as attributes (Li et al 2017; Wang et al 2017) and heterogeneity (Chang et al 2015; Tang et al 2015a).
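The analogy between node paths and word sequences can be illustrated with a minimal sketch of the walk-generation step only: uniform random walks over an adjacency list, each of which plays the role of a "sentence" whose "words" are node identifiers. The skip-gram training that would consume these walks is omitted. The function name `random_walks`, its parameters, and the toy graph are hypothetical and are not taken from the DeepWalk implementation.

```python
import random


def random_walks(adj, walk_length, walks_per_node, seed=0):
    """Generate uniform random walks over an adjacency-list graph.

    adj maps each node to a list of its neighbors. Each walk is a
    list of nodes, analogous to a sentence of words.
    """
    rng = random.Random(seed)  # local RNG so results are reproducible
    walks = []
    for _ in range(walks_per_node):
        for start in adj:
            walk = [start]
            while len(walk) < walk_length:
                neighbors = adj[walk[-1]]
                if not neighbors:
                    break  # dead end: stop this walk early
                walk.append(rng.choice(neighbors))
            walks.append(walk)
    return walks


# Hypothetical undirected toy graph.
toy_graph = {
    "a": ["b", "c"],
    "b": ["a", "c"],
    "c": ["a", "b", "d"],
    "d": ["c"],
}

walks = random_walks(toy_graph, walk_length=5, walks_per_node=2)
for walk in walks:
    print(" ".join(walk))
```

Because every step moves to a direct neighbor, nodes that co-occur in a walk are close to each other in the graph. This is why contexts built this way reflect neighborhood information rather than structural role.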

Objectives

Methods

Results

Discussion

Conclusion