SNoRe: Scalable Unsupervised Learning of Symbolic Node Representations

Sebastian Mežnar, Blaž Škrlj, Nada Lavrač

doi:10.1109/access.2020.3039541

Abstract

Learning from complex real-life networks is a lively research area, with recent advances in learning information-rich, low-dimensional network node representations. However, state-of-the-art methods are not necessarily interpretable and are therefore not fully applicable to sensitive settings in biomedical or user profiling tasks, where explicit bias detection is highly relevant. The proposed SNoRe (Symbolic Node Representations) algorithm learns symbolic, human-understandable representations of individual network nodes, based on the similarity of neighborhood hashes, which serve as features. SNoRe's interpretable features are suitable for directly explaining individual predictions, which we demonstrate by coupling it with the widely used instance explanation tool SHAP to obtain nomograms representing the relevance of individual features for a given classification. To our knowledge, this is one of the first such attempts in a structural node embedding setting. In the experimental evaluation on eleven real-life datasets, SNoRe proved competitive with strong baselines such as variational graph autoencoders, node2vec and LINE. The vectorized implementation of SNoRe scales to large networks, making it suitable for contemporary network learning and analysis tasks.
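To make the idea of neighborhood-hash features concrete, the following is a minimal sketch built on simplifying assumptions that are not taken from the paper. A node's "hash" is approximated by the set of nodes within `k` hops. The feature nodes are chosen by hand. Similarity is measured with the Jaccard index. All function names (`k_hop_neighborhood`, `symbolic_embedding`) are hypothetical. Every column is tied to a concrete node, and this is what keeps the representation readable.

```python
from collections import deque


def k_hop_neighborhood(adj, start, k):
    """Return the set of nodes within k hops of `start` (start included).

    Assumption: this k-hop set stands in for SNoRe's neighborhood hash.
    """
    seen = {start}
    frontier = deque([(start, 0)])
    while frontier:
        node, dist = frontier.popleft()
        if dist == k:
            continue  # do not expand beyond k hops
        for nbr in adj[node]:
            if nbr not in seen:
                seen.add(nbr)
                frontier.append((nbr, dist + 1))
    return seen


def jaccard(a, b):
    """Jaccard similarity |A & B| / |A | B| of two sets."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def symbolic_embedding(adj, feature_nodes, k=1):
    """Embed each node by its similarity to the neighborhoods of feature nodes.

    Column j is named 'sim_to_<feature_nodes[j]>', so each dimension
    has a direct, human-readable meaning.
    """
    hoods = {v: k_hop_neighborhood(adj, v, k) for v in adj}
    columns = [f"sim_to_{f}" for f in feature_nodes]
    rows = {v: [jaccard(hoods[v], hoods[f]) for f in feature_nodes] for v in adj}
    return columns, rows


if __name__ == "__main__":
    # Toy graph: a triangle 0-1-2 with a tail 2-3-4.
    adj = {0: [1, 2], 1: [0, 2], 2: [0, 1, 3], 3: [2, 4], 4: [3]}
    columns, rows = symbolic_embedding(adj, feature_nodes=[0, 4], k=1)
    print(columns)  # ['sim_to_0', 'sim_to_4']
    print(rows[2])  # [0.75, 0.2]
```

A column such as `sim_to_4` literally means "overlap with node 4's neighborhood". If an attribution tool such as SHAP assigns weight to that column, the weight can therefore be read directly in terms of the graph. A dense latent dimension from node2vec offers no comparable reading.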

Highlights

Networks can be used to model numerous real-world systems, spanning from biological protein interaction networks to social and transportation networks [5], [18].
We begin with the classification results across the considered real-life datasets, followed by a series of ablation studies exploring SNoRe's behaviour in more detail, ranging from its explainability capabilities to its behaviour under different hyperparameter settings.
Cosine similarity, the Hub Promoted Index (HPI) and Jaccard similarity yield sparse embeddings, which perform significantly better than embeddings of the same size in bytes computed with other metrics.
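The three metrics named above can be written as standard set-overlap indices. The sketch below assumes each node's neighborhood is a plain set (SNoRe's actual hashes may carry more information), and all helper names are hypothetical. It suggests one plausible reason the resulting embeddings are sparse: all three indices are exactly zero whenever two neighborhoods share no node. The hypothetical `density` helper makes this visible.

```python
import math


def cosine_sets(a, b):
    """Cosine (Salton) similarity of two sets: |A & B| / sqrt(|A| * |B|)."""
    if not a or not b:
        return 0.0
    return len(a & b) / math.sqrt(len(a) * len(b))


def hub_promoted_index(a, b):
    """Hub Promoted Index (HPI): |A & B| / min(|A|, |B|)."""
    if not a or not b:
        return 0.0
    return len(a & b) / min(len(a), len(b))


def jaccard(a, b):
    """Jaccard similarity: |A & B| / |A | B|."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def density(hoods, features, metric):
    """Fraction of non-zero entries in the node-by-feature similarity matrix."""
    values = [metric(hoods[v], hoods[f]) for v in hoods for f in features]
    return sum(1 for x in values if x != 0.0) / len(values)


if __name__ == "__main__":
    # Hypothetical 1-hop neighborhoods (node itself included) of a toy graph.
    hoods = {0: {0, 1, 2}, 1: {0, 1, 2}, 2: {0, 1, 2, 3}, 3: {2, 3, 4}, 4: {3, 4}}
    for metric in (cosine_sets, hub_promoted_index, jaccard):
        # Disjoint neighborhoods give exact zeros, regardless of the metric.
        print(metric.__name__, density(hoods, [0, 4], metric))
```

For any two non-empty sets the three indices are ordered HPI ≥ cosine ≥ Jaccard, because min(|A|, |B|) ≤ sqrt(|A||B|) ≤ |A ∪ B|. They differ in magnitude but share the same zero pattern.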

Summary

Introduction

Networks can be used to model numerous real-world systems, spanning from biological protein interaction networks to social and transportation networks [5], [18]. By representing a real-life system as a network, it is possible to study network properties, such as the key network nodes, why they are relevant, how sets of nodes group together and how network nodes are classified [2], [4]. Classical approaches such as label propagation operate in a relatively naïve manner, ignoring the rich structure of a network that extends beyond immediate neighborhoods. To address this limitation, representation learning methods have emerged that efficiently construct real-valued vector representations of individual nodes, suitable for downstream learning tasks such as classification.

Results

Discussion

Conclusion