Learning Latent Features using Stochastic Neural Networks on Graph Structured Data

Tobias Weller

doi:10.5445/ir/1000130825

Abstract

Graph structured data are ubiquitous and are used to model relationships between entities. Graphs have become an important foundation for representing interactions between users in social networks, items in recommender systems, and drugs in bioinformatics. The main research problems in these areas include node clustering, node classification and link prediction. In bioinformatics, link prediction is of special interest for identifying and developing new uses of existing or abandoned drugs, since drug development is currently very time-consuming and expensive. In the context of knowledge graphs, link prediction is also of special interest for automatically completing missing information and deriving further knowledge. Likewise, node classification is an important research focus in the context of knowledge graphs, e.g. to automatically classify new entities according to their class affiliation and to complete missing class affiliations for existing entities. In recent years, network embeddings have often been trained to encode the entities of graph structured data into a low-dimensional space whilst preserving the graph structure. Based on the trained embeddings, machine learning techniques are applied to address the main machine learning tasks, such as link prediction and node classification. Most published methods, e.g. RDF2Vec, DeepWalk, node2vec and LINE, use random walk procedures to efficiently explore diverse neighbourhoods and compute embeddings based on them. However, these methods develop their full potential only when the input graph is connected; otherwise, not all nodes can be reached and the random walks cannot gather enough information about the nodes in a neighbourhood. In this work we address three types of problems: link prediction on bipartite networks, link prediction on knowledge graphs, and the semantic grouping of nodes and links in graphs.
We use a stochastic factorization model to learn a target distribution over the graph structured data, which allows us to predict unknown links and to embed the nodes into a low-dimensional space whilst preserving the distribution of interactions within the graph. In a subsequent step, the embeddings are used to learn, from training data, a function for predicting instance types and domain assertions. Compared to existing methods that use random walks, our approach is much more robust with respect to the connectivity of the graph structured data. Results show that the proposed method outperforms current state-of-the-art models on several of the studied graph structured datasets and sets a new baseline for link prediction on disconnected graph structured data and for the grouping of nodes and links.
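
The connectivity limitation of random-walk methods noted in the abstract can be illustrated with a minimal sketch. The toy graph, the `random_walk` helper and all parameter values below are assumptions made for illustration only; they are not taken from this work and do not reproduce RDF2Vec, DeepWalk, node2vec or LINE. The sketch only shows that walks started in one connected component never observe nodes of another component, however many walks are sampled.

```python
import random


def random_walk(adj, start, length, rng):
    """Uniform random walk of at most `length` steps starting at `start`."""
    walk = [start]
    for _ in range(length):
        neighbours = adj[walk[-1]]
        if not neighbours:  # dead end: stop the walk early
            break
        walk.append(rng.choice(neighbours))
    return walk


# Hypothetical undirected toy graph with two components: {a, b, c} and {d, e}.
adj = {
    "a": ["b", "c"],
    "b": ["a", "c"],
    "c": ["a", "b"],
    "d": ["e"],
    "e": ["d"],
}

rng = random.Random(0)  # fixed seed so the sketch is reproducible
visited = set()
for _ in range(100):
    visited.update(random_walk(adj, "a", 10, rng))

# Nodes that no walk from "a" ever reaches; these contribute no
# co-occurrence information to an embedding trained on such walks.
unreached = set(adj) - visited
print(sorted(visited), sorted(unreached))
```

Because every walk starts at `a`, the nodes `d` and `e` always end up in `unreached`. An embedding trained only on these walks would therefore carry no information relating the two components.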

Full Text