Abstract

Link prediction can be used to extract missing information, identify spurious interactions, and forecast network evolution. Network embedding is a methodology to assign coordinates to nodes in a low-dimensional vector space. By embedding nodes into vectors, the link prediction problem can be converted into a similarity comparison task: nodes with similar embedding vectors are more likely to be connected. Classic network embedding algorithms are based on random walks. They sample trajectory paths via random walks and generate node pairs from these paths. The node pair set is then used as the input for a Skip-Gram model, a representative language model that embeds nodes (which are regarded as words) into vectors. In the present study, we propose to replace random walk processes with a spreading process, namely the susceptible-infected (SI) model, to sample paths. Specifically, we propose two SI-spreading-based algorithms: Susceptible-Infected Network Embedding (SINE) on static networks and Temporal Susceptible-Infected Network Embedding (TSINE) on temporal networks. The performance of our algorithms is evaluated on the missing link prediction task in comparison with state-of-the-art static and temporal network embedding algorithms. Results show that SINE and TSINE outperform the baselines across all six empirical datasets. We further find that SINE mostly performs better than TSINE, suggesting that temporal information does not necessarily improve the embedding for missing link prediction. Moreover, we study the effect of the sampling size, quantified as the total length of the trajectory paths, on the performance of the embedding algorithms. SINE and TSINE achieve their better performance with a smaller sampling size than the baseline algorithms. Hence, SI-spreading-based embedding tends to be more applicable to large-scale networks.
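Converting link prediction into a similarity comparison means scoring every unobserved node pair by how close the two embedding vectors are, then ranking the pairs. The sketch below assumes cosine similarity as the measure (the summary does not name one here); the function names and the 2-D embedding values are hypothetical and chosen only for illustration.

```python
import math

def cosine_similarity(u, v):
    """Cosine similarity between two embedding vectors given as lists of floats."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0  # a zero vector carries no direction; treat as dissimilar
    return dot / (norm_u * norm_v)

def rank_candidate_links(embedding, candidate_pairs):
    """Rank unobserved node pairs by embedding similarity, most likely link first."""
    scored = [(cosine_similarity(embedding[a], embedding[b]), (a, b))
              for a, b in candidate_pairs]
    scored.sort(key=lambda item: item[0], reverse=True)
    return [pair for _, pair in scored]

# Hypothetical 2-D embedding for four nodes.
embedding = {
    "A": [1.0, 0.1],
    "B": [0.9, 0.2],
    "C": [-0.2, 1.0],
    "D": [0.1, 0.9],
}
print(rank_candidate_links(embedding, [("A", "C"), ("A", "B"), ("C", "D")]))
# [('A', 'B'), ('C', 'D'), ('A', 'C')]
```

Any other vector similarity (for example the plain inner product) could be substituted in `cosine_similarity` without changing the ranking logic.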

Highlights

  • Real-world systems can be represented as networks, with nodes representing the components and links representing the connections between them [1, 2]

  • For the link prediction task in a static network, we remove a certain fraction of links from the given network and predict these missing links based on the remaining links

  • We use the area under the curve (AUC) score to evaluate the performance of the algorithms on the link prediction task
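The evaluation protocol in the highlights (hide a fraction of links, predict them from the rest, score with AUC) can be sketched as below. This assumes the common link-prediction reading of AUC: the probability that a randomly chosen missing link receives a higher score than a randomly chosen non-existent link, with ties counted as 0.5. Here every (missing, non-existent) pair is compared exhaustively; sampling a random subset of comparisons is a frequent alternative on large networks. All function names and the toy scores are hypothetical.

```python
import random
from itertools import combinations

def split_links(edges, test_fraction, seed=0):
    """Randomly hide a fraction of links; return (remaining_links, missing_links)."""
    rng = random.Random(seed)
    shuffled = list(edges)
    rng.shuffle(shuffled)
    n_missing = int(round(test_fraction * len(shuffled)))
    return shuffled[n_missing:], shuffled[:n_missing]

def non_links(nodes, edges):
    """All node pairs that are not connected in the original network."""
    existing = {frozenset(e) for e in edges}
    return [(u, v) for u, v in combinations(sorted(nodes), 2)
            if frozenset((u, v)) not in existing]

def auc_score(score, missing_links, absent_links):
    """Fraction of (missing, non-existent) comparisons won by the missing link.

    Every pair is compared exhaustively; ties count as 0.5.
    """
    total = 0.0
    for m in missing_links:
        for n in absent_links:
            s_m, s_n = score(*m), score(*n)
            if s_m > s_n:
                total += 1.0
            elif s_m == s_n:
                total += 0.5
    return total / (len(missing_links) * len(absent_links))

# Toy check with hypothetical similarity scores.
toy_scores = {("a", "b"): 0.9, ("c", "d"): 0.4, ("a", "c"): 0.5, ("b", "d"): 0.1}
auc = auc_score(lambda u, v: toy_scores[(u, v)],
                missing_links=[("a", "b"), ("c", "d")],
                absent_links=[("a", "c"), ("b", "d")])
print(auc)  # 0.75
```

In a full run, the embedding would be trained only on `remaining_links`, and `score` would be a similarity between the resulting node vectors.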


Summary

Introduction

Real-world systems can be represented as networks, with nodes representing the components and links representing the connections between them [1, 2]. We deploy the susceptible-infected (SI) spreading process on the given network, either static or temporal, and use the corresponding spreading trajectories to generate the node pair set, which is fed to the Skip-Gram model to derive the embedding vectors. We evaluate the efficiency of SI-spreading-based network embedding by examining how its performance on the link prediction task depends on the sampling size for the Skip-Gram model, quantified as the total length of the trajectory paths. In Section 2.1, we propose our SI-spreading-based sampling method for static networks and describe how the node pair set is generated from the trajectory paths. In Section 3, our embedding algorithms are evaluated on a missing link prediction task on real-world static and temporal social networks.
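The static-network pipeline described above (SI spreading, trajectory paths, node pairs, sampling size) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' SINE implementation: it assumes a discrete-time SI process in which each infected node infects each susceptible neighbour with probability `beta` per step, spreading sources drawn uniformly at random, and a trajectory for each infected node defined as the node itself followed by the nodes it infects, in infection order. Pairs are generated Skip-Gram style within a context `window`. All names are hypothetical, and the Skip-Gram training step itself is not shown.

```python
import random
from collections import defaultdict

def si_trajectories(adj, beta, source, max_steps, rng):
    """Run one discrete-time SI process on a static network from `source`.

    Assumption for illustration: the trajectory of an infected node is the
    node itself followed by the nodes it infects, in order of infection.
    """
    infected = {source}
    trajectories = defaultdict(list)
    trajectories[source].append(source)
    for _ in range(max_steps):
        newly_infected = {}  # maps each new node -> the node that infected it
        for u in sorted(infected):  # sorted for reproducible runs
            for v in adj[u]:
                if v not in infected and v not in newly_infected and rng.random() < beta:
                    newly_infected[v] = u
        for v, u in newly_infected.items():
            trajectories[u].append(v)  # extend the infector's trajectory
            trajectories[v].append(v)  # the new node starts its own trajectory
            infected.add(v)
        if len(infected) == len(adj):  # everyone infected: spreading is over
            break
    # Trajectories of length 1 carry no co-occurrence information.
    return [path for path in trajectories.values() if len(path) > 1]

def sample_paths(adj, beta, num_runs, max_steps, seed=0):
    """Collect trajectory paths from several SI runs with random sources."""
    rng = random.Random(seed)
    nodes = sorted(adj)
    paths = []
    for _ in range(num_runs):
        paths.extend(si_trajectories(adj, beta, rng.choice(nodes), max_steps, rng))
    return paths

def node_pairs(paths, window):
    """Skip-Gram style (center, context) pairs within `window` positions."""
    pairs = []
    for path in paths:
        for i, center in enumerate(path):
            lo, hi = max(0, i - window), min(len(path), i + window + 1)
            for j in range(lo, hi):
                if j != i:
                    pairs.append((center, path[j]))
    return pairs

# Toy static network: a path graph 0-1-2-3 as adjacency lists.
adj = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
paths = sample_paths(adj, beta=0.5, num_runs=10, max_steps=20, seed=42)
sampling_size = sum(len(p) for p in paths)  # total length of the trajectory paths
pairs = node_pairs(paths, window=2)
print(sampling_size, len(pairs))
```

Lowering `num_runs` shrinks `sampling_size`, which is the quantity the study varies when comparing the efficiency of SI-based and random-walk-based sampling.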

13: Add the trajectory D_{g_i} to D

Results

Conclusions
