Abstract

Network embedding aims to learn low-dimensional representations of the nodes in a network. It represents each node as a low-dimensional, dense, real-valued vector while preserving the structure and internal attributes of the network. These vectors serve as inputs to machine learning algorithms for network analysis tasks such as node clustering, node classification, link prediction, and network visualization. Network embedding algorithms that take community structure into account impose a higher-level constraint on node similarity, which makes the learned node embeddings more discriminative. However, most existing network representation learning algorithms are unsupervised; pairwise constraint information, which encodes community membership, is not used effectively to obtain node embeddings that are more consistent with prior knowledge. This paper proposes SMNMF, a semisupervised modularized nonnegative matrix factorization model for network embedding that preserves the community structure. The pairwise constraints (must-link and cannot-link) are fused with the adjacency matrix and the node similarity matrix of the network so that the node representations learned by the model are more interpretable. Experimental results on eight real network datasets show that, compared with representative network embedding methods, the node representations learned after incorporating the pairwise constraints achieve higher accuracy in the node clustering task. The results of the link prediction and network visualization tasks indicate that the semisupervised model SMNMF is more discriminative than unsupervised models.
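
To make the idea of fusing pairwise constraints with an adjacency matrix concrete, the following is a minimal sketch in Python. The abstract does not state the fusion rule used by SMNMF, so the rule here is an assumption for illustration only: a must-link pair is treated as connected (entry set to 1) and a cannot-link pair as disconnected (entry set to 0). The function name `fuse_constraints` and the toy graph are hypothetical.

```python
import numpy as np


def fuse_constraints(adjacency, must_link, cannot_link):
    """Return a copy of `adjacency` adjusted by pairwise constraints.

    Assumed rule (not taken from the paper): must-link pairs are set to 1,
    cannot-link pairs are set to 0, and the matrix is kept symmetric.
    """
    fused = np.array(adjacency, dtype=float)  # np.array copies by default
    for i, j in must_link:
        fused[i, j] = fused[j, i] = 1.0
    for i, j in cannot_link:
        fused[i, j] = fused[j, i] = 0.0
    return fused


# Toy undirected path graph 0 - 1 - 2 - 3 (hypothetical data).
A = np.array(
    [
        [0, 1, 0, 0],
        [1, 0, 1, 0],
        [0, 1, 0, 1],
        [0, 0, 1, 0],
    ],
    dtype=float,
)

# Nodes 0 and 2 are known to share a community; nodes 1 and 2 are not.
fused = fuse_constraints(A, must_link=[(0, 2)], cannot_link=[(1, 2)])
print(fused)
```

The original matrix is left unchanged, so the constrained and unconstrained versions can be compared side by side.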

Highlights

  • Most systems in the real world exist in the form of networks, such as protein networks in biological systems, logistics networks in transportation systems, and social networks such as Facebook and WeChat. Research on and analysis of the information in these complex networks have high application value [1,2,3]

  • To verify the effect of different amounts of prior information on the node representations, the ratio of prior information is set to 1%, 2%, 5%, and 10%; the resulting variants are named SMNMF (1), SMNMF (2), SMNMF (5), and SMNMF (10), respectively

  • The ratio of prior information in the DNR model is also set to 1%, 2%, 5%, and 10%, which makes comparison under the same parameters convenient. The reason this paper uses different proportions of pairwise constraints is to show that a semisupervised network representation learning model with pairwise constraints can learn more discriminative node representation vectors than an unsupervised one, and that the more pairwise constraints are used, the better the learned node representation vectors perform in subsequent network analysis tasks
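
The prior-information ratios above can be illustrated with a small sketch that draws must-link and cannot-link pairs from ground-truth community labels. The text does not define how the ratio is measured, so this sketch assumes it is the fraction of all node pairs that are sampled; the function name `sample_pairwise_constraints` and the toy labels are hypothetical.

```python
import itertools
import random


def sample_pairwise_constraints(labels, ratio, seed=0):
    """Sample a fraction `ratio` of all node pairs and label each pair.

    Assumption: `ratio` is the share of all unordered node pairs.
    Pairs with equal labels become must-link, others cannot-link.
    """
    # Enumerating all pairs is quadratic; acceptable for a small example.
    pairs = list(itertools.combinations(range(len(labels)), 2))
    n_sample = max(1, round(ratio * len(pairs)))
    rng = random.Random(seed)  # fixed seed for reproducibility
    chosen = rng.sample(pairs, n_sample)
    must_link = [(i, j) for i, j in chosen if labels[i] == labels[j]]
    cannot_link = [(i, j) for i, j in chosen if labels[i] != labels[j]]
    return must_link, cannot_link


# Toy ground truth: two communities of ten nodes each (hypothetical data).
labels = [0] * 10 + [1] * 10

for percent in (1, 2, 5, 10):
    ml, cl = sample_pairwise_constraints(labels, percent / 100)
    print(f"{percent}%: {len(ml)} must-link, {len(cl)} cannot-link")
```

Under this assumption, a larger ratio simply exposes more known pairs to the model, which matches the comparison across SMNMF (1) through SMNMF (10).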

Summary

Introduction

Most systems in the real world exist in the form of networks, such as protein networks in biological systems, logistics networks in transportation systems, and social networks such as Facebook and WeChat. Research on and analysis of the information in these complex networks have high application value [1,2,3]. Network analysis depends heavily on how the network data are represented. Most traditional representation methods are based on the adjacency matrix, but the adjacency matrix is high dimensional and sparse. This representation has limitations in statistical learning tasks, and processing large-scale data with it incurs a high computational load and long running time. With the development and application of representation learning in natural language processing, more and more scholars have begun to explore how to represent network nodes with low-dimensional dense vectors [4]. Network embedding methods learn effective low-dimensional node representation vectors while preserving the network structure and inherent attributes. Through vector-based machine learning algorithms, the node representation vectors can be used as node features in network analysis tasks such as community detection, node classification, link prediction, and network visualization. LINE [6] describes the first-order and second-order similarity of nodes with two different
