Approximate information for efficient exploration-exploitation strategies.

Alex Barbier-Chebbah, Christian L. Vestergaard, Jean-Baptiste Masson

Physical Review E, doi:10.1103/physreve.109.l052105

Abstract

This paper addresses the exploration-exploitation dilemma inherent in decision-making, focusing on multiarmed bandit problems. In these problems, an agent must decide whether to exploit current knowledge for immediate gains or to explore new options for potential long-term rewards. Here we introduce a class of algorithms, approximate information maximization (AIM), which uses a carefully chosen analytical approximation to the gradient of the entropy to decide which arm to pull at each time step. AIM matches the performance of Thompson sampling, which is known to be asymptotically optimal, as well as that of Infomax, from which it derives. AIM thus retains the advantages of Infomax while offering greater computational speed, tractability, and ease of implementation. In particular, we demonstrate how to apply it to a 50-armed bandit game. Its expression is tunable, which allows it to be optimized for specific settings and makes it possible to surpass the performance of Thompson sampling at short and intermediate times.
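For orientation, here is a minimal sketch of Thompson sampling, the asymptotically optimal baseline that AIM is compared against, together with the cumulative regret typically used to compare bandit strategies over time. It is not AIM itself: the AIM decision rule is defined in the paper and is not reproduced here. The Bernoulli reward model, the uniform Beta(1, 1) prior, the function name `thompson_sampling_bernoulli`, and the uniformly drawn arm means in the 50-armed usage example are all assumptions made for illustration.

```python
import numpy as np


def thompson_sampling_bernoulli(true_means, horizon, rng):
    """Beta-Bernoulli Thompson sampling (illustrative sketch).

    Assumes each arm pays 1 with probability true_means[i] and 0 otherwise,
    with an independent uniform Beta(1, 1) prior on each arm's mean.
    Returns the cumulative expected (pseudo-)regret after each pull.
    """
    true_means = np.asarray(true_means, dtype=float)
    k = len(true_means)
    successes = np.zeros(k)
    failures = np.zeros(k)
    best = true_means.max()
    regret = np.empty(horizon)
    total = 0.0
    for t in range(horizon):
        # Draw one sample per arm from its Beta(1 + successes, 1 + failures) posterior.
        samples = rng.beta(successes + 1.0, failures + 1.0)
        # Pull the arm whose sampled mean is largest.
        arm = int(np.argmax(samples))
        # Observe a Bernoulli reward and update that arm's posterior counts.
        if rng.random() < true_means[arm]:
            successes[arm] += 1
        else:
            failures[arm] += 1
        # Accumulate the gap between the best arm and the chosen arm.
        total += best - true_means[arm]
        regret[t] = total
    return regret


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Hypothetical 50-armed game: arm means drawn uniformly on [0, 1].
    means = rng.uniform(0.0, 1.0, size=50)
    curve = thompson_sampling_bernoulli(means, horizon=5000, rng=rng)
    print("cumulative regret after 5000 pulls:", curve[-1])
```

Plotting the regret curve of a candidate strategy against this baseline is one way to see the short- and intermediate-time differences the abstract refers to. The asymptotic optimality of Thompson sampling concerns only the long-time growth of this curve.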

Similar Papers

In-depth Exploration and Implementation of Multi-Armed Bandit Models Across Diverse Fields
Jiazhen Wu
Highlights in Science, Engineering and Technology, Vol. 94, 26 Apr 2024

A note on the advantage of context in Thompson sampling
Michael Byrd ... Ross Darrow
01 Jan 2023

A note on the advantage of context in Thompson sampling
Michael Byrd ... Ross Darrow
Journal of Revenue and Pricing Management, Vol. 20, 24 Mar 2021

Thompson Sampling for Dynamic Multi-armed Bandits
Neha Gupta ... Ole-Christoffer Granmo
01 Dec 2011
