Abstract

Feature representation learning for the classification of multiple graphs is a problem with practical applications in many domains. For instance, in chemoinformatics, the learned feature representations of molecular graphs can be used to classify molecules that exhibit anti-cancer properties. In much previous work, including discriminative subgraph mining and graphlet-based approaches, a graph representation is derived by counting the occurrences of various graph sub-structures. However, these representations fail to capture the co-occurrence patterns that are inherently present among different sub-structures. Recently, various methods (e.g., DeepWalk, node2vec) have been proposed to learn representations for nodes in a graph; these methods use node co-occurrence to learn node embeddings. However, they fail to capture the co-occurrence relationships between more complex sub-structures in the graph, since they were designed primarily for node representation learning. In this work, we study the problem of learning graph representations that capture the structural and functional similarity of sub-structures (as evidenced by their co-occurrence relationships) in graphs. This is particularly useful when classifying graphs that have very few sub-structures in common. The proposed method uses an encoder-decoder model to predict the random walk sequence along neighboring regions (or sub-structures) of a graph, given a random walk along a particular region. The method is unsupervised and yields generic feature representations, making it applicable to various types of graphs. We evaluate the learned representations on the binary graph classification task using several real-world datasets, where the proposed model achieves superior results against multiple state-of-the-art techniques.

Highlights

  • Graph-structured data can be found in many different domains, including biology, chemistry, and the study of social networks (Duvenaud et al. 2015; Hwang and Kuang 2010; Yanardag and Vishwanathan 2015).

  • We argue that by training an encoder-decoder model on a large number of random walk sequences, we can learn a feature representation that groups structurally and functionally similar subgraphs together.

  • If an encoder-decoder model is trained on a large number of random walks, the sub-sequences corresponding to sub-structures that frequently co-appear in the graph will have learned embeddings that are more similar.


Introduction

Graph-structured data can be found in many different domains, including biology, chemistry, and the study of social networks (Duvenaud et al. 2015; Hwang and Kuang 2010; Yanardag and Vishwanathan 2015). We argue that by training an encoder-decoder model on a large number of random walk sequences, we can learn a feature representation that groups structurally and functionally similar subgraphs together. If an encoder-decoder model is trained on a large number of random walks, the sub-sequences corresponding to sub-structures that frequently co-appear in the graph will have learned embeddings that are more similar. This allows us to learn representations for sub-structures that are more compact, since the different sub-structures are not considered independently of one another.

Obtaining the final graph representation

After the encoder-decoder has been trained, we can freeze the model and use the encoder to generate a representation h_i for any arbitrary random walk sequence. While one can certainly try more sophisticated approaches for aggregation, such as a neural network that learns a final graph representation from the sampled walk representations, we choose relatively simple aggregation techniques to highlight the usefulness of the model itself.
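To make the pipeline concrete, here is a minimal PyTorch sketch of the idea described above: sample random walks from a graph, train a GRU encoder-decoder to predict a walk from a neighboring region (here, one starting where the source walk ends), then freeze the model and average the encoder outputs h_i into a graph representation. Every name in it (`random_walk`, `WalkEncoderDecoder`, the GRU architecture, dimensions, toy graph, and training loop) is an illustrative assumption, not the paper's actual implementation.

```python
# A minimal sketch, assuming a GRU encoder-decoder; not the paper's exact model.
import random
import torch
import torch.nn as nn

NUM_NODES = 4  # size of the toy graph below

def random_walk(adj, start, length):
    """Uniform random walk of `length` nodes over an adjacency list."""
    walk = [start]
    for _ in range(length - 1):
        walk.append(random.choice(adj[walk[-1]]))
    return walk

class WalkEncoderDecoder(nn.Module):
    """Encode one walk; decode a walk from a neighboring region."""
    def __init__(self, num_nodes, embed_dim=32, hidden_dim=64):
        super().__init__()
        self.embed = nn.Embedding(num_nodes, embed_dim)
        self.encoder = nn.GRU(embed_dim, hidden_dim, batch_first=True)
        self.decoder = nn.GRU(embed_dim, hidden_dim, batch_first=True)
        self.out = nn.Linear(hidden_dim, num_nodes)

    def forward(self, src_walk, tgt_walk):
        _, h = self.encoder(self.embed(src_walk))      # h encodes the source walk
        dec_out, _ = self.decoder(self.embed(tgt_walk), h)
        return self.out(dec_out)                       # logits over next nodes

    def encode(self, walk):
        _, h = self.encoder(self.embed(walk))
        return h.squeeze(0)                            # h_i for this walk

# Toy graph as an adjacency list (illustrative only).
adj = {0: [1, 2], 1: [0, 2], 2: [0, 1, 3], 3: [2]}
model = WalkEncoderDecoder(num_nodes=NUM_NODES)
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()

for step in range(200):
    src = random_walk(adj, random.randrange(NUM_NODES), 5)
    tgt = random_walk(adj, src[-1], 5)                 # walk in a neighboring region
    src_t, tgt_t = torch.tensor([src]), torch.tensor([tgt])
    logits = model(src_t, tgt_t[:, :-1])               # teacher forcing
    loss = loss_fn(logits.reshape(-1, NUM_NODES), tgt_t[:, 1:].reshape(-1))
    opt.zero_grad(); loss.backward(); opt.step()

# Freeze the model and average walk encodings into a graph representation.
with torch.no_grad():
    walks = [random_walk(adj, random.randrange(NUM_NODES), 5) for _ in range(50)]
    h = torch.stack([model.encode(torch.tensor([w])) for w in walks])
    graph_repr = h.mean(dim=(0, 1))                    # simple average aggregation
```

The final mean over walk encodings corresponds to the simple average-style aggregation mentioned above; a clustering-based alternative would replace the mean with statistics computed over clusters of the h_i.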
