Abstract

Feature representation learning for the classification of multiple graphs is a problem with practical applications in many domains. For instance, in chemoinformatics, the learned feature representations of molecular graphs can be used to classify molecules that exhibit anti-cancer properties. In much previous work, including discriminative subgraph mining and graphlet-based approaches, a graph representation is derived by counting the occurrences of various graph sub-structures. However, these representations fail to capture the co-occurrence patterns that are inherently present among different sub-structures. Recently, various methods (e.g., DeepWalk, node2vec) have been proposed to learn representations for nodes in a graph; these methods use node co-occurrence to learn node embeddings. However, they fail to capture the co-occurrence relationships between more complex sub-structures in the graph, since they were designed primarily for node representation learning. In this work, we study the problem of learning graph representations that capture the structural and functional similarity of sub-structures (as evidenced by their co-occurrence relationships) in graphs. This is particularly useful when classifying graphs that have very few sub-structures in common. The proposed method uses an encoder-decoder model to predict the random walk sequence along neighboring regions (or sub-structures) of a graph, given a random walk along a particular region. The method is unsupervised and yields generic feature representations, making it applicable to various types of graphs. We evaluate the learned representations on the binary graph classification task using several real-world datasets, where the proposed model achieves superior results against multiple state-of-the-art techniques.

Highlights

  • Graph-structured data can be found in many different domains, including biology, chemistry, and the study of social networks (Duvenaud et al. 2015; Hwang and Kuang 2010; Yanardag and Vishwanathan 2015).

  • We argue that by training an encoder-decoder model on a large number of random walk sequences, we can learn a feature representation that groups structurally and functionally similar subgraphs together.

  • If an encoder-decoder model is trained on a large number of random walks, the sub-sequences corresponding to sub-structures that frequently co-appear in the graph will have learned embeddings that are more similar.


Introduction

Graph-structured data can be found in many different domains, including biology, chemistry, and the study of social networks (Duvenaud et al. 2015; Hwang and Kuang 2010; Yanardag and Vishwanathan 2015). We argue that by training an encoder-decoder model on a large number of random walk sequences, we can learn a feature representation that groups structurally and functionally similar subgraphs together. If an encoder-decoder model is trained on a large number of random walks, the sub-sequences corresponding to sub-structures that frequently co-appear in the graph will have learned embeddings that are more similar. This allows us to learn representations for sub-structures that are more compact, since the different sub-structures are not considered independently of one another.

Obtaining the final graph representation

After the encoder-decoder has been trained, we can freeze the model and use the encoder to generate a representation h_i for any arbitrary random walk sequence. While one can certainly try more sophisticated approaches for aggregation, such as a neural network that learns a final graph representation from the sampled walk representations, we choose relatively simple aggregation techniques to highlight the usefulness of the model itself.
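To make the pipeline concrete, here is a minimal PyTorch sketch of the idea described above: sample random walks from a graph, train a GRU encoder-decoder to predict a walk from a neighboring region (here, one starting where the source walk ends), then freeze the model and average the encoder outputs h_i into a graph representation. Every name in it (`random_walk`, `WalkEncoderDecoder`, the GRU architecture, dimensions, toy graph, and training loop) is an illustrative assumption, not the paper's actual implementation.

```python
# A minimal sketch, assuming a GRU encoder-decoder; not the paper's exact model.
import random
import torch
import torch.nn as nn

NUM_NODES = 4  # size of the toy graph below

def random_walk(adj, start, length):
    """Uniform random walk of `length` nodes over an adjacency list."""
    walk = [start]
    for _ in range(length - 1):
        walk.append(random.choice(adj[walk[-1]]))
    return walk

class WalkEncoderDecoder(nn.Module):
    """Encode one walk; decode a walk from a neighboring region."""
    def __init__(self, num_nodes, embed_dim=32, hidden_dim=64):
        super().__init__()
        self.embed = nn.Embedding(num_nodes, embed_dim)
        self.encoder = nn.GRU(embed_dim, hidden_dim, batch_first=True)
        self.decoder = nn.GRU(embed_dim, hidden_dim, batch_first=True)
        self.out = nn.Linear(hidden_dim, num_nodes)

    def forward(self, src_walk, tgt_walk):
        _, h = self.encoder(self.embed(src_walk))      # h encodes the source walk
        dec_out, _ = self.decoder(self.embed(tgt_walk), h)
        return self.out(dec_out)                       # logits over next nodes

    def encode(self, walk):
        _, h = self.encoder(self.embed(walk))
        return h.squeeze(0)                            # h_i for this walk

# Toy graph as an adjacency list (illustrative only).
adj = {0: [1, 2], 1: [0, 2], 2: [0, 1, 3], 3: [2]}
model = WalkEncoderDecoder(num_nodes=NUM_NODES)
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()

for step in range(200):
    src = random_walk(adj, random.randrange(NUM_NODES), 5)
    tgt = random_walk(adj, src[-1], 5)                 # walk in a neighboring region
    src_t, tgt_t = torch.tensor([src]), torch.tensor([tgt])
    logits = model(src_t, tgt_t[:, :-1])               # teacher forcing
    loss = loss_fn(logits.reshape(-1, NUM_NODES), tgt_t[:, 1:].reshape(-1))
    opt.zero_grad(); loss.backward(); opt.step()

# Freeze the model and average walk encodings into a graph representation.
with torch.no_grad():
    walks = [random_walk(adj, random.randrange(NUM_NODES), 5) for _ in range(50)]
    h = torch.stack([model.encode(torch.tensor([w])) for w in walks])
    graph_repr = h.mean(dim=(0, 1))                    # simple average aggregation
```

The final mean over walk encodings corresponds to the simple average-style aggregation mentioned above; a clustering-based alternative would replace the mean with statistics computed over clusters of the h_i.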
