Network Representation Learning Guided by Partial Community Structure

Hanlin Sun, Sugang Ma, Zhongmin Wang, Wei Jie, Hai Wang

doi:10.1109/access.2020.2978517

Abstract

Network Representation Learning (NRL) is an effective way to analyse large-scale networks (graphs). In general, it maps network nodes, edges, subgraphs, etc. onto independent vectors in a low-dimensional space, thus facilitating network analysis tasks. As community structure is one of the most prominent mesoscopic structural properties of real networks, it is necessary to preserve the community structure of networks during NRL. In this paper, the concept of k-step partial community structure is defined, and two Partial Community structure Guided Network Embedding (PCGNE) methods for node representation learning are proposed, based respectively on two popular NRL algorithms, DeepWalk and node2vec. The idea behind them is that it is easier and more cost-effective to find a higher-quality 1-step partial community structure than a higher-quality whole community structure for a network; the extracted partial community information is then used to guide the random walks in DeepWalk or node2vec. As a result, the learned node representations can preserve the community structure of networks more effectively. The two proposed algorithms and six state-of-the-art NRL algorithms were examined through multi-label classification and (inner-community) link prediction on eight synthesized networks, whose community structure properties could be controlled, and on one real-world network. The results suggest that the two PCGNE methods significantly improve the performance of their respective base algorithms and are competitive for node representation learning. In particular, compared with the baseline algorithms used, the PCGNE methods capture overlapping community structure much better, and thus achieve better multi-label classification performance on networks that have more overlapping nodes and/or larger overlapping memberships.
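The abstract states that partial community information guides the random walks but does not give the bias rule. The sketch below is a hypothetical illustration, not the PCGNE transition rule: it assumes the partial community structure is available as a mapping from each covered node to a set of community labels, and it multiplies the weight of any neighbour that shares a community with the current node by a hypothetical factor `alpha`. Nodes outside the partial structure are treated as sharing no community.

```python
import random
from typing import Dict, List, Optional, Set


def guided_walk(
    adj: Dict[int, List[int]],
    communities: Dict[int, Set[str]],
    start: int,
    length: int,
    alpha: float = 4.0,
    rng: Optional[random.Random] = None,
) -> List[int]:
    """Community-biased random walk (hypothetical rule, not PCGNE's exact one).

    A neighbour that shares at least one community label with the current
    node gets weight `alpha`; every other neighbour gets weight 1.0.
    Nodes missing from `communities` are treated as having no label.
    """
    if rng is None:
        rng = random.Random()
    walk = [start]
    current = start
    for _ in range(length - 1):
        neighbours = adj.get(current, [])
        if not neighbours:  # dead end: stop the walk early
            break
        mine = communities.get(current, set())
        weights = [
            alpha if mine & communities.get(n, set()) else 1.0
            for n in neighbours
        ]
        # Sample one neighbour in proportion to its weight.
        current = rng.choices(neighbours, weights=weights, k=1)[0]
        walk.append(current)
    return walk


if __name__ == "__main__":
    adj = {0: [1, 2], 1: [0], 2: [0]}
    comms = {0: {"A"}, 1: {"A"}, 2: {"B"}}
    print(guided_walk(adj, comms, start=0, length=6, rng=random.Random(42)))
```

Walks generated this way from every node (several times per node) would form the corpus that a DeepWalk-style method feeds to a skip-gram model. node2vec's return and in-out parameters `p` and `q` are deliberately not modelled in this sketch.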

Highlights

Networks are a direct and natural way to organize data.
We examine the performance of the two Partial Community structure Guided Network Embedding (PCGNE) methods and compare them against six state-of-the-art network representation learning algorithms: DeepWalk [10], node2vec [11], LINE [15], GraRep [7], ComE [25] and CNRL [26].
For multi-label classification, we label each node of the synthesized networks according to its community identifier(s); nodes with the same label have more connections among them, so node labels are consistent with network topology.
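Labelling nodes by community identifier(s) amounts to building a multi-hot target matrix in which an overlapping node has several active labels. The sketch below shows that encoding under the assumption that memberships are given as a mapping from node to a set of integer community ids; the function name is hypothetical. scikit-learn's `MultiLabelBinarizer` performs the same transformation if that library is available.

```python
from typing import Dict, List, Set, Tuple


def multi_hot_labels(
    memberships: Dict[int, Set[int]],
) -> Tuple[List[int], List[int], List[List[int]]]:
    """Turn node -> set of community ids into a multi-hot label matrix.

    Returns (nodes, label_ids, matrix) where matrix[i][j] == 1 iff
    nodes[i] belongs to community label_ids[j]. An overlapping node
    simply has more than one 1 in its row.
    """
    nodes = sorted(memberships)
    label_ids = sorted({c for cs in memberships.values() for c in cs})
    col = {c: j for j, c in enumerate(label_ids)}  # community id -> column
    matrix = [[0] * len(label_ids) for _ in nodes]
    for i, v in enumerate(nodes):
        for c in memberships[v]:
            matrix[i][col[c]] = 1
    return nodes, label_ids, matrix


if __name__ == "__main__":
    # Node 1 overlaps communities 0 and 1.
    print(multi_hot_labels({0: {0}, 1: {0, 1}, 2: {1}}))
```

Rows of this matrix, paired with the learned node vectors, would serve as targets for a one-vs-rest multi-label classifier.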

Summary

INTRODUCTION

Networks (graphs) are a direct and natural way to organize data, and information network data is ubiquitous nowadays. We propose two NRL algorithms that preserve network community structure well in the learned node representations. The two methods first extract the information of a 1-step partial community structure of a network, and then use this information to guide the random walks of DeepWalk or node2vec for node representation learning. We also found that using real-world networks such as BlogCatalog, Flickr and Protein-Protein Interactions (PPI) to verify NRL algorithms through multi-label classification should be done with caution, since their node labels do not properly encode their network topology; that is, node labels are not consistent with the nodes' connection relationships.
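One simple way to probe whether node labels follow network topology is to count how many edges join nodes that share at least one label. The score below is an assumption for illustration (a multi-label variant of the edge homophily ratio), not the measure used in the paper; values near 1 indicate labels that track connections, as with community-derived labels, while low values suggest labels that do not encode connection relationships.

```python
from typing import Dict, Iterable, Set, Tuple


def label_edge_agreement(
    edges: Iterable[Tuple[int, int]],
    labels: Dict[int, Set[str]],
) -> float:
    """Fraction of edges whose endpoints share at least one label.

    Unlabelled endpoints are treated as sharing nothing.
    Returns 0.0 for an empty edge list.
    """
    total = 0
    agree = 0
    for u, v in edges:
        total += 1
        if labels.get(u, set()) & labels.get(v, set()):
            agree += 1
    return agree / total if total else 0.0


if __name__ == "__main__":
    edges = [(0, 1), (1, 2), (2, 3), (0, 3)]
    labels = {0: {"a"}, 1: {"a"}, 2: {"b"}, 3: {"b"}}
    print(label_edge_agreement(edges, labels))
```

A score far below what community-labelled synthetic graphs produce would be one warning sign that a benchmark's labels are a poor test of structure-preserving embeddings.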

RELATED WORKS

PARTIAL COMMUNITY STRUCTURE GUIDED NRL

CONNECTION STRENGTH OF A NODE TO A COMMUNITY

EVALUATION

METRICS OF LINK PREDICTION

CONCLUSION