Abstract

We propose the first budget-limited multi-armed bandit (BMAB) algorithm subject to a union of matroid constraints in arm pulling, while at the same time achieving differential privacy. Our model generalizes the arm-pulling models studied in prior BMAB schemes, and it can be used to address many practical problems such as network backbone construction and dynamic pricing in crowdsourcing. We handle the exploitation versus exploration tradeoff in our BMAB problem by exploiting the combinatorial structures of matroids, and reduce the searching complexity of arm selection based on a divide-and-conquer approach. Our algorithm achieves a uniform logarithmic regret bound with respect to B and ɛ-differential privacy, where B is the budget for pulling the arms with random costs. Without differential privacy, our algorithm achieves a uniform logarithmic regret bound with respect to B, which advances the asymptotic regret bounds achieved by prior BMAB algorithms. We performed side-by-side comparisons with prior schemes in our experiments. Experimental results show that our purely-combinatorial algorithm not only achieves significantly better regret performance, but also is more than 20 times faster than prior BMAB schemes, which use time-consuming LP-solving techniques.

Full Text
Published version (Free)

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call