Abstract

The multi-armed bandit (MAB) model has been extensively studied for many online learning problems, such as rate allocation in communication networks and ad recommendation in social networks. In an MAB model, given N arms whose rewards are unknown in advance, the player selects exactly one arm in each round, with the goal of maximizing the cumulative reward over a fixed horizon. In this paper, we study a budget-constrained, auction-based combinatorial multi-armed bandit mechanism with strategic arms, where the player can select K (< N) arms in each round and pulling each arm incurs a distinct cost. In addition, each arm may strategically report its cost in the auction. To this end, we combine the upper confidence bound (UCB) with the auction to define UCB-based rewards and then devise an auction-based UCB algorithm (called AUCB). In each round, AUCB selects the top K arms according to the ratios of UCB-based rewards to bids and then determines the critical payment for each arm. For AUCB, we derive an upper bound on regret and prove truthfulness, individual rationality, and computational efficiency. Extensive simulations show that the rewards achieved by AUCB are at least 12.49% higher than those of state-of-the-art algorithms.
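For intuition, the per-round selection and pricing step described above might look like the following minimal Python sketch. The function name aucb_round, the specific confidence radius sqrt(2 ln t / n_i), and the critical-value payment rule are illustrative assumptions made here, not the paper's exact formulas; it omits the budget check and the initialization phase.

```python
import math

def aucb_round(means, counts, bids, K, t):
    """One selection round of an AUCB-style mechanism (illustrative sketch).

    means[i]  : empirical mean reward of arm i so far
    counts[i] : number of times arm i has been pulled (assumed >= 1,
                e.g., after an initialization phase that pulls each arm once)
    bids[i]   : cost reported (bid) by arm i in the auction
    K         : number of arms to select, with K < len(bids)
    t         : current round index, t >= 1 (used in the confidence radius)

    Returns the selected arm indices and a critical payment for each winner.
    NOTE: the exploration bonus and payment rule below are generic
    placeholders, not necessarily the formulas used in the paper.
    """
    N = len(bids)
    # UCB-based reward: empirical mean plus an exploration bonus.
    ucb = [means[i] + math.sqrt(2.0 * math.log(t) / counts[i]) for i in range(N)]
    # Rank arms by the ratio of UCB-based reward to bid.
    order = sorted(range(N), key=lambda i: ucb[i] / bids[i], reverse=True)
    winners, runner_up = order[:K], order[K]
    # Critical payment: the largest bid arm i could have reported while
    # still beating the (K+1)-th ratio (standard critical-value pricing,
    # which is what typically yields truthfulness and individual rationality).
    pivot = ucb[runner_up] / bids[runner_up]
    payments = {i: ucb[i] / pivot for i in winners}
    return winners, payments
```

Because each winner is paid its critical value rather than its bid, a winner's payment never falls below its bid, which is the usual route to the individual-rationality and truthfulness properties the abstract claims.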
