Abstract
This paper presents an in-depth analysis of the Multi-Armed Bandit (MAB) problem, tracing its evolution from its origins in the gambling domain of the 1940s to its current prominence in machine learning and artificial intelligence. The analysis begins with a historical overview, noting key developments such as Herbert Robbins' probabilistic framework and the expansion of the problem into strategic decision-making in the 1970s. The emergence of algorithms such as the Upper Confidence Bound (UCB) and Thompson Sampling in the late 20th century is highlighted, demonstrating the MAB problem's transition to practical applications. The integration of MAB algorithms with machine learning, particularly in the era of reinforcement learning, is explored, emphasizing their application in domains such as online advertising, financial market trading, and clinical trials. The paper discusses the critical role of decision theory and probabilistic models in MAB problems, focusing on the balance between exploration and exploitation strategies. Recent advances in contextual bandits, non-stationary reward distributions, and multi-agent bandits are examined, showcasing the ongoing evolution and adaptability of MAB problems.
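To make the exploration-exploitation trade-off mentioned in the abstract concrete, the following is a minimal sketch of a UCB-style selection rule on Bernoulli-reward arms. It is an illustrative example only, not the paper's method: the function name `ucb1_select`, the exploration constant `c`, and the arm success probabilities are assumptions chosen for the demonstration.

```python
import math
import random

def ucb1_select(counts, values, total_pulls, c=2.0):
    """Pick the arm with the highest upper confidence bound.

    counts[i]  -- number of times arm i has been pulled
    values[i]  -- empirical mean reward of arm i
    """
    # Pull each arm once before the confidence bound is well defined.
    for arm, n in enumerate(counts):
        if n == 0:
            return arm
    # Exploitation term (empirical mean) plus exploration bonus that
    # shrinks as an arm is pulled more often.
    scores = [
        values[arm] + math.sqrt(c * math.log(total_pulls) / counts[arm])
        for arm in range(len(counts))
    ]
    return max(range(len(counts)), key=lambda arm: scores[arm])

# Toy simulation with three Bernoulli arms (probabilities are illustrative).
true_probs = [0.3, 0.5, 0.7]
counts = [0] * len(true_probs)
values = [0.0] * len(true_probs)

for t in range(1, 1001):
    arm = ucb1_select(counts, values, t)
    reward = 1.0 if random.random() < true_probs[arm] else 0.0
    counts[arm] += 1
    # Incremental update of the empirical mean reward for the pulled arm.
    values[arm] += (reward - values[arm]) / counts[arm]

print("pulls per arm:", counts)
```

Under these assumptions, the pull counts concentrate on the arm with the highest true success probability while the other arms continue to receive occasional exploratory pulls.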