Abstract

The Multi-Armed Bandit problem is becoming increasingly popular because it models real-world sequential decision making across application domains, including clinical trials, recommender systems, and online advertising. It is a classical instance of the exploration-exploitation dilemma in reinforcement learning: an optimal strategy must be chosen based on the rewards observed for each arm. However, existing Multi-Armed Bandit algorithms suffer from several shortcomings, such as blind exploration, weak generalization ability, and exploration that never terminates. To address these shortcomings, this paper proposes a Multi-Armed Bandit algorithm based on sensitivity to variance change. The algorithm takes the change in reward variance as its cue: it adjusts the exploration probability according to the average variance change over all actions and, when exploring, selects the action with the largest variance change. At the same time, to reduce wasted action selections and maximize cumulative reward, a parameter N_con is introduced to record the number of consecutive selections of the same action; exploration stops once N_con reaches a threshold. Experiments show that the proposed algorithm ultimately obtains higher reward and lower regret.
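The abstract describes the mechanism only at a high level. Below is a minimal Python sketch of how such a variance-change-sensitive policy might look. All names (VarianceChangeBandit, n_con_max, eps0) and the exact exploration-probability formula are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

class VarianceChangeBandit:
    """Illustrative sketch of a variance-change-sensitive bandit.

    Exploration probability is scaled by the average change in reward
    variance across arms (assumed formula); when exploring, the arm whose
    variance changed most is pulled; exploration halts once the same arm
    has been chosen n_con_max times in a row (the paper's N_con).
    """

    def __init__(self, n_arms, n_con_max=50, eps0=0.1):
        self.n_arms = n_arms
        self.n_con_max = n_con_max          # threshold on N_con
        self.eps0 = eps0                    # base exploration probability
        self.rewards = [[] for _ in range(n_arms)]
        self.prev_var = np.zeros(n_arms)
        self.var_change = np.zeros(n_arms)  # |new variance - old variance| per arm
        self.last_arm = None
        self.n_con = 0                      # consecutive picks of the same arm
        self.exploring = True

    def select(self, rng):
        # Warm start: try every arm once so variances are defined.
        for a, r in enumerate(self.rewards):
            if not r:
                return a
        # Explore with probability scaled by the mean variance change
        # (one plausible reading of the abstract, not the paper's formula).
        if self.exploring and rng.random() < self.eps0 * (1 + self.var_change.mean()):
            return int(np.argmax(self.var_change))   # arm with largest variance change
        means = [np.mean(r) for r in self.rewards]
        return int(np.argmax(means))                 # exploit best empirical mean

    def update(self, arm, reward):
        self.rewards[arm].append(reward)
        new_var = np.var(self.rewards[arm]) if len(self.rewards[arm]) > 1 else 0.0
        self.var_change[arm] = abs(new_var - self.prev_var[arm])
        self.prev_var[arm] = new_var
        # Track consecutive selections; stop exploring once one arm dominates.
        self.n_con = self.n_con + 1 if arm == self.last_arm else 1
        self.last_arm = arm
        if self.n_con >= self.n_con_max:
            self.exploring = False

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    bandit = VarianceChangeBandit(n_arms=5)
    true_means = rng.uniform(0.0, 1.0, 5)
    total = 0.0
    for t in range(1000):
        a = bandit.select(rng)
        r = rng.normal(true_means[a], 0.1)
        bandit.update(a, r)
        total += r
    print(f"cumulative reward: {total:.1f}")
```

Once N_con crosses the threshold, the policy becomes purely greedy, which is how the sketch realizes the paper's goal of avoiding exploration that never terminates.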
