Abstract

There are numerous real-world problems where a user must make decisions under uncertainty. In influence maximization on a social network, for example, the user must select a set of K influencers who will jointly exert a large influence on many users. In the absence of prior knowledge about the diffusion process, or even of topological information, this problem becomes quite challenging. It can be cast as a combinatorial bandit problem, where at each time step the user chooses a candidate set of K of the N arms, aiming to achieve an efficient trade-off between exploration and exploitation. In this work, we present the first combinatorial bandit algorithm for which the only feedback is a non-linear reward of the selected K arms. No other feedback is needed. In the context of influence maximization, this means no feedback about which nodes or edges were activated needs to be available, just the amount of influence. The novel algorithm we propose, CMAB-SM, is based on a divide-and-conquer strategy and is computationally and storage efficient. Over a time horizon T, the proposed algorithm achieves a regret bound of Õ(K^{1/2} N^{1/3} T^{2/3}). This bound is sub-linear in all of the parameters: T, N, and K. We empirically demonstrate our algorithm’s performance on the applications of influence maximization and product cross-selling. For influence maximization, we provide experiments on real-world social networks, showing that the proposed CMAB algorithm outperforms both bandit-specific and social-influence-domain-specific algorithms in terms of empirical run-time and expected influence. For product cross-selling, we also demonstrate that the proposed CMAB algorithm outperforms the considered baselines on synthetic data.
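The interaction protocol described above — pick a size-K subset of N arms, observe only the scalar (possibly non-linear) reward of that subset, with no per-arm or per-edge feedback — can be sketched as follows. The explore-then-commit policy below is a naive stand-in for illustration only, not the paper's CMAB-SM algorithm; the function name `full_bandit_loop` and the equal-credit arm estimates are assumptions for this sketch.

```python
import random

def full_bandit_loop(N, K, T, reward_fn, explore_frac=0.3):
    """Explore-then-commit sketch of the full-bandit CMAB setting:
    each round the learner plays a size-K subset and observes only the
    aggregate reward reward_fn(subset) -- no per-arm feedback.
    This naive baseline is NOT the paper's CMAB-SM algorithm."""
    sums = [0.0] * N    # total subset reward credited to each arm
    counts = [0] * N    # number of rounds each arm was played
    total = 0.0
    explore_rounds = int(explore_frac * T)
    for t in range(T):
        if t < explore_rounds:
            # exploration: play a uniformly random size-K subset
            action = random.sample(range(N), K)
        else:
            # exploitation: commit to the K arms with the highest
            # estimated (equally credited) contribution
            scores = [sums[i] / counts[i] if counts[i] else 0.0
                      for i in range(N)]
            action = sorted(range(N), key=lambda i: -scores[i])[:K]
        r = reward_fn(action)  # only the aggregate reward is observed
        total += r
        for i in action:       # credit the subset reward to its arms
            sums[i] += r
            counts[i] += 1
    return total
```

This baseline's exploration cost grows linearly with the explore phase; the paper's divide-and-conquer approach is what yields the sub-linear Õ(K^{1/2} N^{1/3} T^{2/3}) regret.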
