Learning low-dimensional vector representations of networks can effectively reduce the complexity of network analysis tasks such as link prediction, clustering and classification. However, most existing network representation learning (NRL) methods target homogeneous or static networks, whereas real-world networks are usually heterogeneous and change dynamically over time. Providing insight into the evolution of heterogeneous networks is therefore more practical and significant. Based on this consideration, we focus on the dynamic representation learning problem for heterogeneous information networks and propose a random walk based Dynamic Representation Learning method for Heterogeneous Information Networks (HIN_DRL), which learns representations of network nodes at different timestamps. Existing random walk based NRL methods generally consist of two steps: constructing node sequences through a random walk process, and then learning node representations by feeding the node sequences into a homogeneous or heterogeneous Skip-Gram model. We improve the first of these steps. To construct optimized node sequences for evolving heterogeneous networks, we propose a method for automatically extracting and extending meta-paths, and a new method for generating node sequences via dynamic random walks based on the meta-path and timestamp information of the network. We also propose two strategies for adjusting the quantity and length of node sequences during each random walk process, which make it more effective to construct node sequences for a heterogeneous information network at a specific timestamp and thus improve the effect of dynamic representation learning.
Extensive experimental results show that, compared with state-of-the-art algorithms, HIN_DRL achieves better Macro-F1, Micro-F1 and NMI scores for multi-label node classification, multi-class node classification and node clustering on several real-world network datasets. Furthermore, visualization and dynamics case studies on the Microsoft Academic dataset demonstrate that HIN_DRL can learn network representations dynamically and more effectively.
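
To make the first step more concrete, the following is a minimal illustrative sketch of a random walk that is guided by a meta-path and restricted to edges observed up to a given timestamp. It is not the HIN_DRL algorithm itself: the function names (`build_adjacency`, `temporal_meta_path_walk`), the edge format `(u, v, timestamp)`, the uniform choice among candidates, and the toy author/paper graph are all assumptions made for illustration. The automatic extraction and extension of meta-paths and the two strategies for adjusting the quantity and length of sequences are not modelled here.

```python
import random
from collections import defaultdict


def build_adjacency(edges):
    """Build an undirected adjacency list from (u, v, timestamp) triples."""
    adj = defaultdict(list)
    for u, v, t in edges:
        adj[u].append((v, t))
        adj[v].append((u, t))
    return adj


def temporal_meta_path_walk(adj, node_type, start, meta_path,
                            walk_length, max_time, rng):
    """Generate one node sequence that follows `meta_path` cyclically.

    Only edges with timestamp <= max_time are used, so the walk sees the
    network as it existed at that snapshot (an assumed, simplified rule).
    """
    if len(meta_path) < 2 or meta_path[0] != meta_path[-1]:
        raise ValueError("meta_path must start and end with the same type")
    if node_type[start] != meta_path[0]:
        raise ValueError("start node type must match the first meta-path type")

    period = len(meta_path) - 1  # e.g. ['A', 'P', 'A'] repeats every 2 steps
    walk = [start]
    while len(walk) < walk_length:
        wanted = meta_path[len(walk) % period]
        candidates = [
            v for v, t in adj.get(walk[-1], [])
            if t <= max_time and node_type[v] == wanted
        ]
        if not candidates:  # dead end at this snapshot: stop early
            break
        walk.append(rng.choice(candidates))
    return walk


# Hypothetical toy network: authors (A) linked to papers (P) by year.
node_type = {"a1": "A", "a2": "A", "p1": "P", "p2": "P"}
edges = [("a1", "p1", 2019), ("a2", "p1", 2019), ("a2", "p2", 2021)]
adj = build_adjacency(edges)

walk = temporal_meta_path_walk(
    adj, node_type, "a1", ["A", "P", "A"],
    walk_length=5, max_time=2020, rng=random.Random(0),
)
print(walk)
```

In this toy example the paper `p2` can never appear in a walk for the 2020 snapshot, because its only edge is dated 2021. Sequences produced this way for each timestamp would then be passed to a Skip-Gram model in the second step, which is left out of the sketch.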