Unsupervised network embeddings with node identity awareness

Leonardo Gutiérrez-Gómez, Jean-Charles Delvenne

doi:10.1007/s41109-019-0197-1

Open Access

Journal: Applied Network Science
Publication Date: Oct 17, 2019
Citations: 17
License type: open-access

Affiliation: Université Catholique de Louvain

Abstract

A main challenge in mining network-based data is finding effective ways to represent or encode graph structures so that they can be efficiently exploited by machine learning algorithms. Several methods have focused on network representation at the node/edge or substructure level. However, many real-life challenges related to time-varying, multilayer, chemical compound and brain networks involve the analysis of a family of graphs instead of a single one, opening additional challenges in graph comparison and representation. Traditional approaches for learning representations rely on hand-crafted specialized features to extract meaningful information about the graphs (e.g. statistical properties, structural motifs), as well as popular graph distances to quantify dissimilarity between networks. In this work we provide an unsupervised approach to learn graph embeddings for a collection of graphs defined on the same set of nodes, so that the embeddings can be used in numerous graph mining tasks. By using an unsupervised neural network approach on input graphs, we aim to capture the underlying distribution of the data in order to discriminate between different classes of networks. Our method is assessed empirically on synthetic and real-life datasets and evaluated in three different tasks: graph clustering, visualization and classification. Results reveal that our method outperforms well-known graph distances and graph kernels in clustering and classification tasks, while being highly efficient in runtime.
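
To make the setting in the abstract concrete, the following is a minimal sketch, assuming NumPy, of a collection of graphs defined on the same set of nodes. Each graph is stored as an adjacency matrix with a shared node order and flattened into a vector, and the graphs are then compared with a plain Euclidean (Frobenius) distance. The helper names `random_graph` and `graphs_to_vectors`, the random graph model and all parameter values are hypothetical choices for illustration. This is an input representation plus a simple distance baseline, not the authors' embedding method.

```python
import numpy as np


def random_graph(n_nodes, p_edge, rng):
    """Adjacency matrix of an undirected random graph without self-loops."""
    upper = np.triu(rng.random((n_nodes, n_nodes)) < p_edge, k=1)
    return (upper | upper.T).astype(float)


def graphs_to_vectors(adjacencies):
    """Flatten each adjacency matrix into one row.

    Comparing rows entry by entry is meaningful only because every graph
    uses the same node set in the same order (node identity is known).
    """
    return np.stack([a.reshape(-1) for a in adjacencies])


rng = np.random.default_rng(0)
n_nodes = 20

# Two hypothetical classes of networks on the same 20 nodes: sparse and dense.
graphs = [random_graph(n_nodes, 0.1, rng) for _ in range(5)]
graphs += [random_graph(n_nodes, 0.6, rng) for _ in range(5)]

X = graphs_to_vectors(graphs)  # one row per graph, n_nodes**2 columns

# Pairwise Euclidean distance between flattened adjacency matrices,
# which equals the Frobenius distance between the matrices themselves.
diff = X[:, None, :] - X[None, :, :]
D = np.sqrt((diff ** 2).sum(axis=-1))
```

The matrix `X` is the kind of fixed-size input that an unsupervised model could consume, and `D` is the sort of hand-crafted graph distance that a learned embedding would be compared against.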

Highlights

Numerous complex systems in social, medical, biological and engineering sciences can be studied under the framework of networks.
Graph2vec does a good job of grouping the data into three non-overlapping clusters, whereas subgraph2vec creates a concentric cloud of points, collapsing different graphs to the center. On the other hand, the Graph Attention Model (GAM) captures a different notion of similarity, where graphs with locally similar node degree sequences are grouped together. Our embedding (Emb) shows three well-defined clouds of points grouping together isomorphic graphs. Our method exploits node correspondence across graphs when it is known, but even if the node order is not of particular significance we can retrieve networks that are essentially identical.
We can make the following observations: CLaplacian and our graph embedding method (Emb) are able to discover the differences between data examples, so that they separate the data into almost perfect clusters.

Summary

Introduction

Numerous complex systems in social, medical, biological and engineering sciences can be studied under the framework of networks. Network models are often analyzed at the node/edge or substructure level, studying the interaction among entities, identifying groups of nodes behaving similarly, or finding global and local connectivity patterns within a given network. Many real-life challenges might involve collections of networks representing instances of the system under study, e.g. functional brain networks (connectomes) (Hagmann et al. 2008), chemical compound graphs (Srinivasan et al. 1997), multilayer networks (Cardillo et al. 2013), and so on. Other applications involve dynamic interactions between components, introducing additional complexity in the time evolution of the system. In a social mobile phone network, people are considered as nodes and the phone calls as edges. The dynamics of calls between users will systematically add and remove edges between
