Multi-armed bandit (MAB) models have received considerable attention from multiple research communities owing to their broad range of applications. Optimal-selection problems whose rewards are unknown in advance, such as ad recommendation in social networks and spectrum access in cognitive radio, can be solved efficiently with MAB models. In an MAB model, given <inline-formula><tex-math notation="LaTeX">$N$</tex-math></inline-formula> arms whose rewards are unknown in advance, the player selects exactly one arm in each round with the goal of maximizing the cumulative reward over a fixed horizon. A more general model, the combinatorial MAB (CMAB), allows <inline-formula><tex-math notation="LaTeX">$K$</tex-math></inline-formula> arms to be played simultaneously in each round. However, existing CMAB models neglect the strategic behavior of the <inline-formula><tex-math notation="LaTeX">$N$</tex-math></inline-formula> arms: an arm might report false information to increase its own profit. In fact, in many applications, such as user selection in crowdsensing, the arms are not passive machines but rational, self-interested individuals. To this end, we combine the upper confidence bound (UCB) approach with auction theory to develop a new algorithm called auction-based UCB (AUCB). We divide the auction-based CMAB problem into two sub-problems: winning-arm selection and payment computation. For AUCB, we derive an upper bound on regret and prove truthfulness within a single round, individual rationality, and computational efficiency. In addition, we consider an extended setting in which some arms may be unavailable in some rounds and arms may bid differently across rounds; we devise another algorithm, called eAUCB, to solve this problem. Extensive simulations demonstrate the strong performance of the proposed algorithms.
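To make the two sub-problems concrete, the following is a minimal illustrative sketch, not the authors' AUCB algorithm: it ranks arms by a UCB-to-bid ratio, selects the top <inline-formula><tex-math notation="LaTeX">$K$</tex-math></inline-formula> as winners, and pays each winner its critical bid (the largest bid at which it would still win), a standard device for truthfulness and individual rationality in reverse auctions. All function names and the ratio-based selection rule are assumptions for illustration.

```python
def select_and_pay(ucb, bids, K):
    """Illustrative winner selection and payment computation.

    ucb:  per-arm upper confidence bounds on reward (assumed precomputed)
    bids: per-arm claimed costs
    K:    number of arms to play this round

    Returns the K winning arm indices (ranked by UCB-to-bid ratio)
    and a dict of critical payments, one per winner.
    """
    # Rank arms by marginal value per unit of claimed cost.
    order = sorted(range(len(ucb)), key=lambda i: ucb[i] / bids[i], reverse=True)
    winners = order[:K]

    if len(order) > K:
        # The first losing arm sets the threshold ratio; a winner's critical
        # payment is the bid that would make its ratio equal that threshold.
        loser = order[K]
        threshold = ucb[loser] / bids[loser]
        payments = {i: ucb[i] / threshold for i in winners}
    else:
        # No losing arm to price against: fall back to paying the bid.
        payments = {i: bids[i] for i in winners}
    return winners, payments
```

Because each winner's ratio is at least the threshold, its payment is at least its bid, so no truthful winner pays to participate; this is the individual-rationality property the abstract proves for AUCB (under the paper's own, different construction).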