A Node Embedding-Based Influential Spreaders Identification Approach

Dongming Chen, Xinyu Huang, Panpan Du, Bo Fang, Dongqi Wang

doi:10.3390/math8091554

Abstract

Node embedding is a representation learning technique that maps network nodes into a lower-dimensional vector space. Embedding nodes into a vector space can benefit network analysis tasks, such as community detection, link prediction, and influential node identification, both by simplifying computation and by widening the range of applications. In this paper, we propose a two-step node embedding-based solution for the social influence maximization problem (IMP). In the first step, the solution employs a revised network-embedding algorithm to map input nodes into a vector space. In the second step, it clusters the embedded nodes into subgroups and chooses the subgroups’ centers as the influential spreaders. The proposed approach is a simple but effective IMP solution because it takes the social reinforcement and homophily characteristics of the social network into consideration in the node embedding and seed spreader selection operations, respectively. Information propagation simulations with the single-point contact susceptible-infected-recovered (SIR) and full-contact SIR models on six different types of real network data sets showed that the proposed social influence maximization (SIM) solution exhibits significant propagation capability.
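The second step described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' implementation: the abstract does not name the clustering algorithm, so plain k-means with a deterministic farthest-point initialisation is assumed here, and the function name `pick_seeds` and the toy embeddings are hypothetical.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def pick_seeds(embeddings, k, iterations=20):
    """Cluster node vectors with k-means and return, for each cluster,
    the node closest to the cluster center.

    embeddings: dict mapping node -> list of floats (assumed to come from
    some node-embedding step, which is not shown here).
    """
    nodes = sorted(embeddings)
    # Assumption: farthest-point initialisation, chosen only to keep the
    # sketch deterministic.
    centers = [list(embeddings[nodes[0]])]
    while len(centers) < k:
        farthest = max(
            nodes,
            key=lambda n: min(squared_distance(embeddings[n], c) for c in centers),
        )
        centers.append(list(embeddings[farthest]))

    clusters = [[] for _ in range(k)]
    for _ in range(iterations):
        # Assignment step: attach every node to its nearest center.
        clusters = [[] for _ in range(k)]
        for n in nodes:
            nearest = min(
                range(k), key=lambda i: squared_distance(embeddings[n], centers[i])
            )
            clusters[nearest].append(n)
        # Update step: move each center to the mean of its members.
        for i, members in enumerate(clusters):
            if members:
                dim = len(centers[i])
                centers[i] = [
                    sum(embeddings[n][d] for n in members) / len(members)
                    for d in range(dim)
                ]

    # The seed of each non-empty cluster is the member nearest its center.
    return [
        min(members, key=lambda n: squared_distance(embeddings[n], centers[i]))
        for i, members in enumerate(clusters)
        if members
    ]


# Hypothetical two-dimensional embeddings forming two well-separated groups.
toy_embeddings = {
    "a": [0.0, 0.0], "b": [0.1, 0.0], "c": [0.0, 0.1],
    "x": [5.0, 5.0], "y": [5.1, 5.0], "z": [5.0, 5.1],
}
seeds = pick_seeds(toy_embeddings, k=2)
```

Picking an actual node nearest each center, rather than the center itself, guarantees that every seed is a real member of the network.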

Highlights

Through years of research on how network structure affects information diffusion, researchers believe that social reinforcement and homophily are the two factors that play essential roles in the process of information going viral [1,2,3].
We naturally extended the continuous bag of words (CBOW) algorithm of DeepWalk to a much faster and more accurate algorithm called centrality-weighted CBOW (IW-CBOW).
To compare how well different methods identify key node groups, we first select m nodes as propagation sources with each method and then simulate the propagation process with the single-point contact SIR and full-contact SIR models.
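The evaluation procedure in the last highlight can be sketched as follows. This is an illustrative simulation under assumptions, not the authors' experimental code: it assumes that in the single-point contact model each infected node contacts one randomly chosen neighbor per step, that in the full-contact model it contacts all neighbors, and that the spread is measured by the final number of recovered nodes. The names `simulate_sir`, `beta` (infection probability), and `gamma` (recovery probability) are hypothetical.

```python
import random


def simulate_sir(adjacency, seeds, beta, gamma, full_contact, rng):
    """Run one SIR simulation and return the number of recovered nodes.

    adjacency: dict mapping node -> list of neighbor nodes.
    seeds: the m nodes chosen as propagation sources.
    gamma must be greater than 0, otherwise the epidemic never ends.
    """
    infected = set(seeds)
    recovered = set()
    while infected:
        newly_infected = set()
        for node in sorted(infected):
            neighbors = adjacency[node]
            if not neighbors:
                continue
            # Full contact reaches every neighbor; single-point contact
            # reaches one neighbor picked uniformly at random.
            contacts = neighbors if full_contact else [rng.choice(neighbors)]
            for other in contacts:
                susceptible = other not in infected and other not in recovered
                if susceptible and rng.random() < beta:
                    newly_infected.add(other)
        # Nodes infected at the start of the step may now recover.
        newly_recovered = {n for n in sorted(infected) if rng.random() < gamma}
        infected = (infected | newly_infected) - newly_recovered
        recovered |= newly_recovered
    return len(recovered)


# Hypothetical toy network: a path 0 - 1 - 2 - 3.
path = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
rng = random.Random(0)
full_spread = simulate_sir(path, [0], beta=1.0, gamma=1.0, full_contact=True, rng=rng)
single_spread = simulate_sir(path, [0], beta=1.0, gamma=1.0, full_contact=False, rng=rng)
```

In practice the result would be averaged over many runs per seed set, since a single run is stochastic whenever `beta` or `gamma` lies strictly between 0 and 1.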

Summary

Introduction

Through years of research on how network structure affects information diffusion, researchers have come to believe that social reinforcement and homophily are the two factors that play essential roles in the process of information going viral [1,2,3]. Social reinforcement inside communities tends to trigger multiple exposures, and each additional exposure significantly increases the probability that an individual adopts a social behavior [1], which is the underlying assumption of classic information diffusion models such as the linear threshold model (LTM). If we posit that there is an underlying network over which information propagates, the social reinforcement and homophily factors imply that both local and global structural information of the network should be taken into consideration. Research on critical node set recognition originated from the work of Domingos and Richardson on “viral marketing” [5,6]. Domingos and Richardson proposed making use of customers’ “network value”, that is, putting more promotion effort into customers who may be influenced to buy by current customers or who may in turn influence other customers [5,7].
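The role of repeated exposure in the linear threshold model can be illustrated with a small sketch. It is a simplified variant under stated assumptions rather than a reference implementation: every neighbor is given equal weight, thresholds are supplied by the caller instead of being drawn at random, and the names `linear_threshold`, `thresholds`, and the toy graph are hypothetical.

```python
def linear_threshold(adjacency, thresholds, seeds):
    """Return the set of nodes active once the cascade stops.

    adjacency: dict mapping node -> list of neighbor nodes.
    thresholds: dict mapping node -> fraction of active neighbors
    required before the node itself becomes active.
    """
    active = set(seeds)
    changed = True
    while changed:
        changed = False
        for node, neighbors in adjacency.items():
            if node in active or not neighbors:
                continue
            # Assumption: equal weights, so exposure is simply the share
            # of neighbors that are already active.
            exposure = sum(1 for n in neighbors if n in active) / len(neighbors)
            if exposure >= thresholds[node]:
                active.add(node)
                changed = True
    return active


# Hypothetical toy network: node "c" needs two active neighbors out of
# three, so a single exposure is not enough to activate it.
graph = {
    "a": ["b", "c"],
    "b": ["a", "c"],
    "c": ["a", "b", "d"],
    "d": ["c"],
}
limits = {"a": 0.5, "b": 0.5, "c": 0.6, "d": 1.0}
from_a = linear_threshold(graph, limits, {"a"})
from_d = linear_threshold(graph, limits, {"d"})
```

Seeding at "a" lets the cascade reach "c" only after "b" has also become active, whereas seeding at "d" exposes "c" once and the cascade stalls.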

Objectives

Results

Conclusion