Quantum Multi-Armed Bandits and Stochastic Linear Bandits Enjoy Logarithmic Regrets

Zongqi Wan, Jialin Zhang, Zhijie Zhang, Xiaoming Sun, Tongyang Li

doi:10.1609/aaai.v37i8.26202

Abstract

Multi-armed bandit (MAB) and stochastic linear bandit (SLB) are important models in reinforcement learning, and it is well known that classical algorithms for bandits with time horizon T incur regret of order at least √T. In this paper, we study MAB and SLB with quantum reward oracles and propose quantum algorithms for both models with regret of order poly(log T), exponentially improving the dependence on T. To the best of our knowledge, this is the first provable quantum speedup for the regret of bandit problems and, more generally, for exploitation in reinforcement learning. Compared to previous literature on quantum exploration algorithms for MAB and reinforcement learning, our quantum input model is simpler and only assumes quantum oracles for each individual arm.
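To make the notion of regret concrete, the following is a minimal sketch of a classical baseline, UCB1 on Bernoulli arms, that accumulates pseudo-regret over a horizon. It is not the quantum algorithm proposed in the paper; the function name `ucb1_regret`, the arm means, and the horizon are illustrative assumptions.

```python
import math
import random


def ucb1_regret(means, horizon, seed=0):
    """Run classical UCB1 on Bernoulli arms; return cumulative pseudo-regret."""
    rng = random.Random(seed)
    k = len(means)
    counts = [0] * k   # number of pulls per arm
    sums = [0.0] * k   # total observed reward per arm
    best = max(means)
    regret = 0.0
    for t in range(1, horizon + 1):
        if t <= k:
            arm = t - 1  # pull every arm once to initialise the estimates
        else:
            # choose the arm with the largest upper confidence bound
            arm = max(
                range(k),
                key=lambda i: sums[i] / counts[i]
                + math.sqrt(2.0 * math.log(t) / counts[i]),
            )
        reward = 1.0 if rng.random() < means[arm] else 0.0
        counts[arm] += 1
        sums[arm] += reward
        regret += best - means[arm]  # pseudo-regret is measured with the true means
    return regret


if __name__ == "__main__":
    # Hypothetical two-armed instance; the means are made up for illustration.
    for horizon in (100, 1000, 10000):
        print(horizon, ucb1_regret([0.2, 0.8], horizon))
```

Regret here is the gap between always playing the best arm and what the algorithm actually played, summed over T rounds. The abstract's claim concerns how this quantity grows with T when the rewards are accessed through quantum oracles instead of classical samples.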

Journal: Proceedings of the ... AAAI Conference on Artificial Intelligence
Publication Date: Jun 26, 2023
Citations: 1

