In real-time strategy games, players face persistent uncertainty about which strategy will lead to victory. This paper investigates multi-armed bandit (MAB) algorithms as a principled way to resolve that uncertainty. The theoretical foundations of MAB are reviewed, with particular attention to the trade-off between exploration and exploitation. An experimental comparison of the Explore-Then-Commit (ETC), Upper Confidence Bound (UCB), and Thompson Sampling (TS) algorithms illustrates their differing performance characteristics. Beyond gaming, the paper also surveys broader applications of MAB algorithms in fields such as healthcare, finance, and dynamic pricing in online retail, highlighting their versatility. A significant portion of the study is devoted to implementing the UCB and TS algorithms in StarCraft, a popular real-time strategy game, and evaluating them by their cumulative regret, a standard measure of the cost of suboptimal decisions accumulated over time. The results indicate that both UCB and TS substantially improve players' win rates in StarCraft. The study also acknowledges remaining challenges and the need for further research in this area. Applying MAB algorithms in complex, dynamic environments such as real-time strategy games remains a rich avenue for investigation and holds significant promise for improving decision-making in diverse domains. This work therefore not only advances the understanding of MAB algorithms in gaming but also underscores their potential in other sectors.
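As a minimal illustrative sketch of the two algorithms evaluated here (not the paper's actual StarCraft implementation), the following Python snippet simulates UCB1 and Thompson Sampling on a small set of strategies modeled as Bernoulli arms and reports their cumulative (pseudo-)regret; the arm win probabilities, horizon, and function names are assumptions made for this example.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical setup: three candidate strategies with unknown Bernoulli win probabilities.
TRUE_P = np.array([0.45, 0.55, 0.60])   # assumed values, for illustration only
HORIZON = 5000

def run_ucb1(true_p, horizon):
    """UCB1: play each arm once, then choose the arm with the largest upper confidence bound."""
    k = len(true_p)
    counts = np.zeros(k)
    means = np.zeros(k)
    regret = np.zeros(horizon)
    best = true_p.max()
    for t in range(horizon):
        if t < k:
            arm = t  # initial round-robin so every arm has one observation
        else:
            ucb = means + np.sqrt(2.0 * np.log(t + 1) / counts)
            arm = int(np.argmax(ucb))
        reward = rng.random() < true_p[arm]
        counts[arm] += 1
        means[arm] += (reward - means[arm]) / counts[arm]  # incremental mean update
        regret[t] = best - true_p[arm]                     # per-step pseudo-regret
    return np.cumsum(regret)

def run_thompson(true_p, horizon):
    """Thompson Sampling with Beta(1,1) priors: sample a win rate per arm, play the argmax."""
    k = len(true_p)
    alpha = np.ones(k)
    beta = np.ones(k)
    regret = np.zeros(horizon)
    best = true_p.max()
    for t in range(horizon):
        arm = int(np.argmax(rng.beta(alpha, beta)))
        reward = rng.random() < true_p[arm]
        alpha[arm] += reward          # posterior update on a win
        beta[arm] += 1 - reward       # posterior update on a loss
        regret[t] = best - true_p[arm]
    return np.cumsum(regret)

print("UCB1 final cumulative regret:", run_ucb1(TRUE_P, HORIZON)[-1])
print("TS   final cumulative regret:", run_thompson(TRUE_P, HORIZON)[-1])
```

In this toy setting, cumulative regret measures how much expected reward is lost by not always playing the best strategy, which is the same criterion the paper uses to compare the algorithms in StarCraft.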