A Unifying Theory of Thompson Sampling for Continuous Risk-Averse Bandits

Joel Q L Chang, Vincent Y F Tan

doi:10.1609/aaai.v36i6.20564

Abstract

This paper unifies the design and the analysis of risk-averse Thompson sampling algorithms for the multi-armed bandit problem for a class of risk functionals ρ that are continuous and dominant. We prove generalised concentration bounds for these continuous and dominant risk functionals and show that a wide class of popular risk functionals belong to this class. Using our newly developed analytical toolkits, we analyse the algorithm ρ-MTS (for multinomial distributions) and prove that it admits asymptotically optimal regret bounds of risk-averse algorithms under the CVaR, proportional hazard, and other ubiquitous risk measures. More generally, we prove the asymptotic optimality of ρ-MTS for Bernoulli distributions for a class of risk measures known as empirical distribution performance measures (EDPMs); this class includes the well-known mean-variance. Numerical simulations show that the regret bounds incurred by our algorithms are reasonably tight vis-à-vis algorithm-independent lower bounds.
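To make the idea of risk-averse Thompson sampling on multinomial arms concrete, here is a minimal Python sketch. It is an illustration under stated assumptions, not the paper's implementation: the function names `cvar` and `rho_mts`, the uniform Dirichlet(1, ..., 1) prior, the shared finite support, and the reward convention for CVaR (the mean of the worst α-fraction of rewards, to be maximised) are all assumptions made for this example.

```python
import numpy as np


def cvar(support, probs, alpha):
    """CVaR of a discrete reward distribution (assumed convention: mean of
    the lowest alpha-fraction of rewards). `support` must be sorted ascending."""
    support = np.asarray(support, dtype=float)
    probs = np.asarray(probs, dtype=float)
    # Probability mass strictly below each support point.
    cum_before = np.concatenate(([0.0], np.cumsum(probs)[:-1]))
    # Mass each point contributes to the lower alpha-tail.
    weights = np.clip(alpha - cum_before, 0.0, probs)
    return float(weights @ support) / alpha


def rho_mts(arm_probs, support, alpha, horizon, seed=0):
    """Sketch of Thompson sampling with a risk functional (here CVaR) on
    multinomial arms. Returns the number of pulls of each arm."""
    rng = np.random.default_rng(seed)
    arm_probs = np.asarray(arm_probs, dtype=float)
    n_arms, n_outcomes = arm_probs.shape
    # Assumed prior: Dirichlet(1, ..., 1) over each arm's outcome probabilities.
    counts = np.ones((n_arms, n_outcomes))
    pulls = np.zeros(n_arms, dtype=int)
    for _ in range(horizon):
        # Sample one plausible distribution per arm and score it by its risk.
        scores = [cvar(support, rng.dirichlet(counts[k]), alpha)
                  for k in range(n_arms)]
        k = int(np.argmax(scores))
        # Observe a reward index from the true (hidden) arm distribution.
        outcome = rng.choice(n_outcomes, p=arm_probs[k])
        counts[k, outcome] += 1
        pulls[k] += 1
    return pulls


if __name__ == "__main__":
    support = [0.0, 0.5, 1.0]
    # Both arms have mean 0.5, but arm 0 is far riskier than arm 1.
    arms = [[0.5, 0.0, 0.5], [0.0, 1.0, 0.0]]
    print(rho_mts(arms, support, alpha=0.5, horizon=2000))
```

In this toy setup both arms have the same mean reward, so a risk-neutral learner has no reason to prefer either one; a learner that ranks arms by CVaR should concentrate its pulls on the deterministic arm. Swapping `cvar` for another functional of `(support, probs)` gives the corresponding variant of the sketch.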


Journal: Proceedings of the AAAI Conference on Artificial Intelligence
Publication Date: Jun 28, 2022
Citations: 2

