Abstract

Network embedding, which encodes all vertices in a network as a set of numerical vectors in accordance with its local and global structures, has drawn widespread attention. Network embedding not only captures significant structural features of a network, useful for tasks such as clustering and link prediction, but also learns latent vector representations of the nodes, which provide theoretical support for a variety of applications, such as visualization, link prediction, node classification, and recommendation. As the latest progress in this line of research, several algorithms based on random walks have been devised. Although those algorithms have drawn much attention for their high learning efficiency and accuracy, they still lack a theoretical explanation, and their transparency has been questioned. Here, we propose an approach based on the open-flow network model to reveal the underlying flow structure and its hidden metric space for different random walk strategies on networks. We show that the essence of embedding based on random walks is the latent metric structure defined on the open-flow network. This not only deepens our understanding of random-walk-based embedding algorithms but also helps in finding new potential applications of network embedding.

Highlights

  • There has been a surge of works proposing alternative ways to embed networks by training neural networks[15,16,27] in various approaches inspired by natural language processing techniques[28,29,30].

  • We propose a new perspective based on a metric defined on the flow structures to understand the embedding space behind random-walk-based algorithms, and we put forward a novel network embedding algorithm that combines manifold learning with the new metric.

  • We notice that the open-flow network model can be used to reflect the flow structure behind different random walk strategies (Figure 3).

Introduction

There has been a surge of works proposing alternative ways to embed networks by training neural networks[15,16,27] in various approaches inspired by natural language processing techniques[28,29,30]. After the node sequences have been generated, skip-gram in word2vec[30], one of the most famous word embedding algorithms developed in the deep learning community, can be efficiently applied to the sequences. Among these random-walk-based approaches, DeepWalk[15] and node2vec[16] have drawn wide attention for their high training speed and high classification accuracy. We propose a new network embedding method named Flow-based Geometric Embedding (FGE), together with a numeric approximation algorithm, which has fewer free parameters and a faster implementation than the known algorithms based on random walks. This embedding method achieves clustering results similar to node2vec and reasonable ranking outcomes compared with other ranking algorithms.
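To make the first step of these random-walk-based pipelines concrete, the sketch below generates truncated random-walk sequences over a toy graph, in the style of DeepWalk. It is an illustrative sketch only: the function name `random_walks` and the adjacency-list representation are our own choices, not from the paper, and in practice the resulting sequences would then be fed to a skip-gram trainer (e.g. a word2vec implementation) to produce the node vectors.

```python
import random

def random_walks(adj, num_walks, walk_length, seed=0):
    """Generate truncated random-walk node sequences (DeepWalk-style).

    adj: dict mapping each node to a list of its neighbors.
    num_walks: number of walks started from every node.
    Returns a list of walks, each a list of node ids.
    """
    rng = random.Random(seed)
    walks = []
    nodes = list(adj)
    for _ in range(num_walks):
        rng.shuffle(nodes)  # randomize the order of start nodes each pass
        for start in nodes:
            walk = [start]
            while len(walk) < walk_length:
                neighbors = adj[walk[-1]]
                if not neighbors:  # dead end: stop the walk early
                    break
                walk.append(rng.choice(neighbors))  # uniform next-step choice
            walks.append(walk)
    return walks

# Toy 4-node cycle graph as an adjacency list.
adj = {0: [1, 3], 1: [0, 2], 2: [1, 3], 3: [2, 0]}
walks = random_walks(adj, num_walks=2, walk_length=5)
```

Note that node2vec differs from this uniform sampler only in the next-step choice, which it biases with its return and in-out parameters; the sequence-to-vector training stage is the same.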

