Node classifications with DjCaNE: Disjoint content and network embedding

Mohsen Fazaeli, Saeedeh Momtazi

doi:10.1177/01655515221111002

Abstract

Machine learning approaches have become a crucial tool in graph analysis. Despite the accurate results of the existing approaches, most of them are not scalable enough to be used in real-world problems. Networks provide two different kinds of information: node contents and node relations (network structure). Training deep graph neural networks (GNNs) over large-scale graphs is challenging due to the limitations of the message passing framework. Graph Convolutional Networks (GCNs) operate on all of a node's neighbours at once. Furthermore, it is common to transform node features with a deep neural network before the graph convolution (GC) operation. Therefore, the deep transform may be applied up to hundreds of times for each target node, which is computationally heavy and hard to batch. This paper presents an abstract framework with two embedding components: the first embeds node relations, and the second embeds node contents. The model makes predictions by aggregating these embeddings through a combination component. The presented approach limits the deep transform to the target node only and uses random walk-based embedding instead of the GC operator to reduce the cost. The main goal of the proposed approach is to provide a lightweight framework for the task. To this end, node relations are embedded based on node neighbourhood structure by a biased variant of the DeepWalk model, called GuidedWalk, and an autoencoder embeds node contents. The experimental results on three well-known datasets show the superiority of the proposed model over the state-of-the-art GraphSAGE and TADW models, with less computational complexity. On the Citeseer, Cora, and PubMed datasets, the model achieves improvements of 3.23%, 0.88%, and 7.63% in Macro-F1 and 3.25%, 0.7%, and 6.34% in Micro-F1, respectively. Although GNNs are state-of-the-art models, their main advantage is that they consider node content. This paper shows that even a simple integration of node content into available random walk-based methods raises their performance to the level of GCNs without increasing the complexity.
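The disjoint design described in the abstract can be illustrated with a minimal Python sketch. It rests on several assumptions: the walks here are uniform, DeepWalk-style walks, not the biased GuidedWalk variant, whose bias the abstract does not specify; concatenation stands in for the combination component; and the names `random_walks`, `combine`, and the toy graph `adj` are hypothetical and do not come from the paper.

```python
import random


def random_walks(adj, walk_length, walks_per_node, seed=0):
    """Generate uniform random walks from every node (DeepWalk-style).

    Assumption: uniform neighbour sampling. The paper's GuidedWalk
    biases this choice, but the abstract does not say how.
    """
    rng = random.Random(seed)
    walks = []
    for _ in range(walks_per_node):
        nodes = list(adj)
        rng.shuffle(nodes)  # visit start nodes in a random order each pass
        for start in nodes:
            walk = [start]
            while len(walk) < walk_length:
                neighbours = adj[walk[-1]]
                if not neighbours:  # dead end: stop this walk early
                    break
                walk.append(rng.choice(neighbours))
            walks.append(walk)
    return walks


def combine(structure_vec, content_vec):
    """Join the two disjoint embeddings of one node.

    Assumption: plain concatenation stands in for the combination
    component; a classifier would consume the joint vector.
    """
    return list(structure_vec) + list(content_vec)


# Hypothetical toy graph as an adjacency list.
adj = {0: [1, 2], 1: [0, 2], 2: [0, 1, 3], 3: [2]}
walks = random_walks(adj, walk_length=5, walks_per_node=2)

# Placeholder vectors: in practice the first would come from a skip-gram
# model trained on the walks, the second from an autoencoder's encoder.
joint = combine([0.1, -0.3], [0.7, 0.2, 0.0])
```

In this arrangement the walks would feed a skip-gram model to produce the structure embedding, while the content embedding is computed once per target node, so no deep transform is applied to a node's neighbours.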

Full Text