Abstract

Current algorithms for the multi-armed bandit (MAB) problem often perform well under stationary observations. Although this performance may be acceptable when parameters are accurately tuned, most of these algorithms degrade under non-stationary observations. We set up an incremental ε-greedy model with a stochastic mean equation as its action-value function, which is more applicable to real-world problems. Unlike iterative algorithms that suffer from step-size dependency, we propose an adaptive step-size model (ASM) that yields an adaptive MAB algorithm. The proposed model employs the ε-greedy approach as its action selection policy. In addition, a dynamic exploration parameter ε is introduced whose influence fades as the decision maker becomes more knowledgeable. The proposed model is empirically evaluated and compared with existing algorithms, including the standard ε-greedy, Softmax, ε-decreasing, and UCB-Tuned models, under both stationary and non-stationary conditions. ASM not only addresses the parameter dependency problem but also performs comparably to or better than these algorithms. Applying these enhancements to the standard ε-greedy method reduces the learning time, which makes the approach attractive to a wide range of online sequential selection applications such as autonomous agents, adaptive control, industrial robots, and trend-forecasting problems in the management and economics domains.
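To make the setting concrete, the following is a minimal sketch of the incremental ε-greedy scheme the abstract builds on. It uses the standard sample-average (1/n) step size, which is exactly the piece an adaptive step size such as ASM's would replace; the class name, parameter values, and update rule shown here are illustrative assumptions, not the paper's actual ASM formulation.

import random

class EpsilonGreedyBandit:
    # Hypothetical incremental epsilon-greedy agent, for illustration only.
    def __init__(self, n_arms, epsilon=0.1):
        self.epsilon = epsilon       # exploration probability
        self.q = [0.0] * n_arms      # incremental action-value estimates
        self.counts = [0] * n_arms   # number of pulls per arm

    def select_action(self):
        # With probability epsilon explore a random arm;
        # otherwise exploit the current best estimate.
        if random.random() < self.epsilon:
            return random.randrange(len(self.q))
        return max(range(len(self.q)), key=lambda a: self.q[a])

    def update(self, arm, reward):
        self.counts[arm] += 1
        # Sample-average step size 1/n tracks a stationary mean.
        # An adaptive step size (the role ASM plays in the paper)
        # would replace this line so the estimate can also track
        # non-stationary reward distributions.
        alpha = 1.0 / self.counts[arm]
        self.q[arm] += alpha * (reward - self.q[arm])

With a fixed 1/n step size the estimate weights all past rewards equally, which is why such schemes lag behind drifting reward means; an adaptive step size shifts weight toward recent observations when the environment changes.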
