Evaluating node embeddings of complex networks

Arash Dehghan-Kooshkghazi,François Théberge,Bogumił Kamiński,Paweł Prałat,Łukasz Kraiński,Ali Pinar

doi:10.1093/comnet/cnac030

Arash Dehghan-Kooshkghazi, François Théberge + Show 4 more

Open Access

https://doi.org/10.1093/comnet/cnac030

Copy DOI

Abstract

Abstract Graph embedding is a transformation of nodes of a graph into a set of vectors. A good embedding should capture the graph topology, node-to-node relationship and other relevant information about the graph, its subgraphs and nodes. If these objectives are achieved, an embedding is a meaningful, understandable, compressed representations of a network that can be used for other machine learning tools such as node classification, community detection or link prediction. In this article, we do a series of extensive experiments with selected graph embedding algorithms, both on real-world networks as well as artificially generated ones. Based on those experiments, we formulate the following general conclusions. First, we confirm the main problem of node embeddings that is rather well-known to practitioners but less documented in the literature. There exist many algorithms available to choose from which use different techniques and have various parameters that may be tuned, the dimension being one of them. One needs to ensure that embeddings describe the properties of the underlying graphs well but, as our experiments confirm, it highly depends on properties of the network at hand and the given application in mind. As a result, selecting the best embedding is a challenging task and very often requires domain experts. Since investigating embeddings in a supervised manner is computationally expensive, there is a need for an unsupervised tool that is able to select a handful of promising embeddings for future (supervised) investigation. A general framework, introduced recently in the literature and easily available on GitHub repository, provides one of the very first tools for an unsupervised graph embedding comparison by assigning the ‘divergence score’ to embeddings with a goal of distinguishing good from bad ones. We show that the divergence score strongly correlates with the quality of embeddings by investigating three main applications of node embeddings: node classification, community detection and link prediction.

Full Text