Abstract

Graph embeddings have become a key and widely used technique within the field of graph mining, proving successful across a broad range of domains including social, citation, transportation and biological networks. Unsupervised graph embedding techniques aim to automatically create a low-dimensional representation of a given graph, which captures key structural elements in the resulting embedding space. However, to date, there has been little work exploring exactly which topological structures are being learned in the embeddings, which could be a possible way to bring interpretability to the process. In this paper, we investigate whether graph embeddings are approximating something analogous to traditional vertex-level graph features. If such a relationship can be found, it could be used to provide a theoretical insight into how graph embedding approaches function. We perform this investigation by predicting known topological features, using supervised and unsupervised methods, directly from the embedding space. If a mapping between the embeddings and topological features can be found, then we argue that the structural information encapsulated by the features is represented in the embedding space. To explore this, we present an extensive experimental evaluation with five state-of-the-art unsupervised graph embedding techniques, across a range of empirical graph datasets, measuring a selection of topological features. We demonstrate that several topological features are indeed being approximated in the embedding space, allowing key insight into how graph embeddings create good representations.

Highlights

  • Representing the complex and inherent links and relationships between and within datasets in the form of a graph is a widely performed practice across many scientific disciplines [1]

  • We focus solely upon unsupervised graph embedding techniques as we want to explore what features the techniques learn from the topology alone, without the requirement for labels

  • We investigate whether unsupervised graph embeddings are learning something analogous to traditional vertex-level graph features

Introduction

Representing the complex and inherent links and relationships between and within datasets in the form of a graph is a widely performed practice across many scientific disciplines [1]. Analysing and making predictions about graphs using machine learning has shown significant advances over traditional approaches in a range of commonly performed tasks [2]. Graph embedding models are a collection of machine learning techniques which attempt to learn key features from a graph's topology automatically, in either a supervised or unsupervised manner, removing the often cumbersome task of end-users manually selecting representative graph features [4]. That manual process, known as feature selection [13] in the machine learning literature, has clear disadvantages, as certain features may only be useful for a certain task. The work presented in this paper focuses on neural network-based approaches for graph embedding, as these have demonstrated superior performance compared with traditional approaches [2].
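The core experimental idea, predicting a known topological feature directly from an embedding space, can be illustrated with a minimal sketch. Here we stand in for a learned embedding with a simple spectral one (the top eigenvectors of the adjacency matrix) rather than any of the five techniques evaluated in the paper, and fit a linear regression from the embedding to vertex degree; the graph, dimensionality, and model choice are illustrative assumptions, not the paper's actual experimental setup.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
n = 200

# Random undirected graph (Erdos-Renyi style adjacency matrix).
A = (rng.random((n, n)) < 0.05).astype(float)
A = np.triu(A, 1)
A = A + A.T  # symmetrise: undirected edges

# Unsupervised "embedding": the top-8 eigenvectors of the adjacency
# matrix, used here purely as a stand-in for a learned embedding.
vals, vecs = np.linalg.eigh(A)
emb = vecs[:, -8:]          # shape (n, 8): one 8-d vector per vertex

# Vertex-level topological feature to predict: degree.
degree = A.sum(axis=1)

# If a simple model can map embeddings to the feature, the structural
# information captured by that feature is present in the embedding space.
model = LinearRegression().fit(emb, degree)
r2 = model.score(emb, degree)
print(f"R^2 for predicting degree from the embedding: {r2:.3f}")
```

A high coefficient of determination here would suggest the embedding encodes degree information; the paper applies this style of probe across several topological features and embedding methods.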
