Fast and error-adaptive influence maximization based on Count-Distinct sketches

Gökhan Göktürk,Kamer Kaya

doi:10.1016/j.ins.2023.119875

Abstract

Influence maximization (IM) is the problem of finding a seed vertex set that maximizes the expected number of vertices influenced under a given diffusion model. Due to the NP-Hardness of finding an optimal seed set, high-quality yet expensive approximation algorithms have been frequently used. In addition, lightweight, sketch-based approaches, which do not step-by-step simulate the influence process, have been proposed in the literature to cope with the scale of today's networks. In this work, we describe a fast, error-adaptive approach that leverages Count-Distinct sketches and hash-based fused sampling to avoid step-by-step simulations and estimate the number of influenced vertices throughout a diffusion. Furthermore, for faster estimations, the sketches of a vertex for consecutive simulations are stored adjacently in memory. This allows the proposed algorithm to estimate the number of influenced vertices for multiple simulations at once. For faster processing, the proposed method automatically rebuilds the sketches after observing estimation errors above a given threshold. Our experimental results show that the proposed algorithm yields seed sets with comparable quality while being up to 3,337× faster than a state-of-the-art, high-quality influence maximization algorithm. In addition, it is up to 63× faster than a sketch-based approach while producing seed sets with 2%–10% better influence scores.

Full Text