Abstract

The Multi-Armed Bandit (MAB) problem has been extensively studied in order to address real-world challenges related to sequential decision making. In this setting, an agent selects the best action to be performed at time step t, based on the past rewards received from the environment. This formulation implicitly assumes that the expected payoff of each action is kept stationary by the environment over time. Nevertheless, in many real-world applications this assumption does not hold and the agent has to face a non-stationary environment, that is, one with a changing reward distribution. Thus, we present a new MAB algorithm, named f-Discounted-Sliding-Window Thompson Sampling (f-dsw TS), for non-stationary environments, that is, settings where the data stream is affected by concept drift. The f-dsw TS algorithm is based on Thompson Sampling (TS) and exploits a discount factor on the reward history and an arm-related sliding window to counteract concept drift in non-stationary environments. We investigate how to combine these two sources of information, namely the discount factor and the sliding window, by means of an aggregation function f. In particular, we propose a pessimistic (f = min), an optimistic (f = max), as well as an averaged (f = mean) version of the f-dsw TS algorithm. A rich set of numerical experiments is performed to evaluate the f-dsw TS algorithm against both stationary and non-stationary state-of-the-art TS baselines. We exploit synthetic environments (both randomly generated and controlled) to test the MAB algorithms under different types of drift, that is, sudden/abrupt, incremental, gradual and increasing/decreasing drift. Furthermore, we adapt four real-world active learning tasks to our framework: a prediction task on crimes in the city of Baltimore, a classification task on insect species, a recommendation task on local web-news, and a time-series analysis on microbial organisms in the tropical air ecosystem. The f-dsw TS approach emerges as the best-performing MAB algorithm: at least one of its versions performs better than the baselines in the synthetic environments, proving the robustness of f-dsw TS under different concept drift types. Moreover, the pessimistic version (f = min) turns out to be the most effective in all real-world tasks.
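
To make the idea concrete, the sketch below shows one possible reading of this design for Bernoulli rewards: each arm keeps a discounted Beta posterior over its whole history and a second Beta posterior over a fixed-size window of its own recent pulls, and the two posterior samples are merged by f before the arm with the highest aggregated sample is played. The class name, the Bernoulli/Beta reward model, the decay rule and the default values of gamma and window are illustrative assumptions based only on the description above, not the paper's exact pseudocode.

```python
import numpy as np
from collections import deque

class FDSWThompsonSampling:
    """Sketch of an f-dsw TS-style agent for Bernoulli rewards.

    Per arm, it combines two Beta posteriors:
      - a discounted posterior, where past successes/failures decay by gamma,
      - a sliding-window posterior built from the last `window` pulls of that arm,
    and aggregates one sample from each with the function f (min, max, or mean).
    """

    def __init__(self, n_arms, gamma=0.99, window=50, f=min):
        self.n_arms = n_arms
        self.gamma = gamma                      # discount factor on the reward history
        self.f = f                              # aggregation function: min / max / np.mean
        self.succ = np.zeros(n_arms)            # discounted success counts
        self.fail = np.zeros(n_arms)            # discounted failure counts
        self.windows = [deque(maxlen=window) for _ in range(n_arms)]  # arm-related windows

    def select_arm(self, rng=np.random):
        scores = []
        for a in range(self.n_arms):
            # Sample from the discounted posterior (Beta(1, 1) prior).
            s_disc = rng.beta(self.succ[a] + 1, self.fail[a] + 1)
            # Sample from the sliding-window posterior of the same arm.
            w = self.windows[a]
            s_win = rng.beta(sum(w) + 1, len(w) - sum(w) + 1)
            # Merge the two sources of information with f.
            scores.append(self.f([s_disc, s_win]))
        return int(np.argmax(scores))

    def update(self, arm, reward):
        # Decay the whole history, then add the new 0/1 observation.
        self.succ *= self.gamma
        self.fail *= self.gamma
        self.succ[arm] += reward
        self.fail[arm] += 1 - reward
        self.windows[arm].append(reward)
```

Under this reading, f = min trusts an arm only when both its long-run (discounted) and its recent (windowed) evidence look good, which matches the pessimistic variant; f = max and a mean aggregation would give the optimistic and averaged variants.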

Highlights

  • In the context of sequential decision making, the Multi-Armed Bandit (MAB) problem has been extensively studied by researchers in the field of reinforcement learning, since it was first introduced in the middle of the last century by Robbins [1]

  • Afterwards, we describe the problem of concept drift in active learning environments

  • The performance of MAB algorithms is measured by the cumulative reward relative to the oracle, defined as the relative cumulative reward RCR(M) = CR(M) / CR(O), where CR(M) and CR(O) are the cumulative rewards collected by algorithm M and by the oracle O, respectively (a small worked example follows this list)

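Under the definition reconstructed above, RCR is simply a ratio of cumulative rewards. A minimal sketch, where the function name and the toy reward sequences are illustrative:

```python
def relative_cumulative_reward(model_rewards, oracle_rewards):
    """RCR(M) = CR(M) / CR(O): cumulative reward of algorithm M
    normalized by the cumulative reward of the oracle O."""
    return sum(model_rewards) / sum(oracle_rewards)

# Example with per-round 0/1 rewards over the same horizon.
model_rewards  = [1, 0, 1, 1, 0, 1]   # CR(M) = 4
oracle_rewards = [1, 1, 1, 1, 0, 1]   # CR(O) = 5
print(relative_cumulative_reward(model_rewards, oracle_rewards))  # 0.8
```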

Introduction

Multi-Armed Bandit (MAB) is a powerful framework that allows agents to solve sequential decision-making problems under uncertainty [16]. The MAB problem [2] is used to represent the exploration-exploitation dilemma in sequential decision problems, that is, how to acquire knowledge about the set of available actions while exploiting the most profitable ones. During each round of the sequential decision problem, an agent selects one among the K available actions (called arms) and receives a reward (or payoff) proportional to the goodness of its choice. The algorithm typically needs to explore, that is, try out different arms to acquire new information and find the optimal arm in the given environment. A trade-off between exploration (trying different arms) and exploitation (exploiting the acquired knowledge) is required in order to make optimal near-term decisions based on the available information.
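
A minimal, self-contained sketch of this interaction loop with standard (stationary) Bernoulli Thompson Sampling; the three arm means, the horizon and the seed are arbitrary choices for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)
true_means = [0.3, 0.5, 0.7]          # unknown Bernoulli payoff of each arm
succ = np.zeros(3)                    # per-arm success counts
fail = np.zeros(3)                    # per-arm failure counts

total = 0
for t in range(10_000):
    # Exploration and exploitation in one step: sample a plausible mean for
    # each arm from its Beta posterior and play the arm with the best sample.
    samples = rng.beta(succ + 1, fail + 1)
    arm = int(np.argmax(samples))
    reward = rng.binomial(1, true_means[arm])
    succ[arm] += reward
    fail[arm] += 1 - reward
    total += reward

print(total / 10_000)   # average reward per round, approaching the best arm's mean (0.7)
```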

