Multi-armed Bandit Model Research Articles

Digital health programs provide individualized support to patients with chronic diseases. Their effectiveness is measured by the extent to which patients achieve target individual clinical outcomes and by the program's ability to sustain patient engagement. However, patient dropout and inequitable intervention delivery strategies, which may unintentionally penalize certain patient subgroups, make it hard to maximize effectiveness. Methodologies that equitably balance these success factors (achievement of target clinical outcomes and sustained engagement) are therefore desirable, particularly under resource constraints. Our objectives were to propose a model for digital health program resource management that jointly accounts for the interaction between individual clinical outcomes and patient engagement, ensures equitable allocation, and allows for capacity planning, and to conduct extensive simulations using publicly available data on type 2 diabetes, a chronic disease. We propose a restless multiarmed bandit (RMAB) model to plan interventions that jointly optimize long-term engagement and individual clinical outcomes (here measured as the achievement of target healthy glucose levels). To mitigate the tendency of RMAB to achieve good aggregate performance by exacerbating disparities between groups, we propose new equitable objectives for RMAB and apply bilevel optimization algorithms to solve them. We formulated a model for the joint evolution of patient engagement and individual clinical outcome trajectory to capture the key dynamics of interest in digital chronic disease management programs. In simulation exercises, our optimized intervention policies lead to up to 10% more patients reaching healthy glucose levels after 12 months, with a 10% reduction in dropout compared to standard-of-care baselines. Further, our new equitable policies reduce the mean absolute difference of engagement and health outcomes across 6 demographic groups by up to 85% compared to the state of the art. Planning digital health interventions with individual clinical outcome objectives and long-term engagement dynamics as considerations can be both feasible and effective. We propose using an RMAB sequential decision-making framework, which may also offer additional capabilities in capacity planning. Integrating an equitable RMAB algorithm further enhances the potential for reaching equitable solutions. This approach gives program designers the flexibility to switch between different priorities and to balance trade-offs across objectives according to their preferences.
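
To make the restless-bandit setting above concrete, here is a minimal sketch of a budgeted intervention loop. It is not the abstract's equitable RMAB or its bilevel optimization: it uses a simple myopic heuristic, and the two-state engagement model, the transition probabilities in `PATIENTS`, and the function names are all assumptions made up for illustration.

```python
import random

# Each hypothetical patient maps state -> (P(engaged next | no intervention),
#                                          P(engaged next | intervention)).
# State 1 = engaged, state 0 = disengaged. All numbers are made up.
PATIENTS = [
    {0: (0.10, 0.30), 1: (0.80, 0.90)},
    {0: (0.20, 0.70), 1: (0.60, 0.95)},
    {0: (0.05, 0.10), 1: (0.90, 0.90)},
]


def myopic_policy(states, patients, budget):
    """Pick at most `budget` patients with the largest positive one-step gain."""
    gains = sorted(
        ((p[s][1] - p[s][0], i) for i, (s, p) in enumerate(zip(states, patients))),
        reverse=True,
    )
    return {i for gain, i in gains[:budget] if gain > 0}


def step(states, patients, chosen, rng):
    """Advance every arm one period; arms evolve even when not chosen (restless)."""
    return [
        1 if rng.random() < p[s][1 if i in chosen else 0] else 0
        for i, (s, p) in enumerate(zip(states, patients))
    ]


def run(horizon=12, budget=1, seed=0):
    """Simulate `horizon` periods and return the final fraction of engaged patients."""
    rng = random.Random(seed)
    states = [1] * len(PATIENTS)  # everyone starts engaged
    for _ in range(horizon):
        chosen = myopic_policy(states, PATIENTS, budget)
        states = step(states, PATIENTS, chosen, rng)
    return sum(states) / len(states)
```

A myopic rule like this only looks one step ahead and ignores group-level disparities, which is the kind of gap the long-term and equitable objectives described above are meant to address.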

In a deregulated electricity market, a specific incarnation of cyber-physical-social systems, market gaming behaviors can significantly affect the cost of electricity delivered to the market. On the supply side in particular, the primary goal of power generating companies (PGCs) is to develop strategic bids that maximize their profits in long-term trading under intrinsic uncertainty. In such repeated and dynamic settings, one fundamental challenge is that a PGC neither has prior knowledge of its unknown opponents’ incentives nor observes their strategies and obtained profits. In the common setting, once the bidding auction has occurred, the PGC observes only the market clearing price (MCP) of each round and its own winning or losing status. While it is typical to assume some perfect or bounded rationality model of the PGCs, their real behaviors do not follow such assumptions because of incomplete information, computational intractability, or imperfect execution. We formulate the problem of sequentially optimizing any PGC's bids as an adversarial multi-armed bandit (MAB) model. Specifically, at each round, a PGC plays against all other opponents by choosing from an infinite set of possible strategies, which is split into continuous intervals by the sequentially occurring MCPs. At the end of each round, the PGC observes the outcome of the auction, updates its estimate of the expected fitness of bids in each interval (i.e., how much expected profit the interval could achieve), and selects the bid for the next round using the proposed algorithm Exp3C (i.e., exponential-weight for exploration and exploitation with continuous value). Experimental results on a real dataset demonstrate that Exp3C performs better than other heuristic schemes, including pure greedy, $\varepsilon$-greedy, and MCP-prediction-based bidding schemes. Moreover, we theoretically prove that the upper bound of the average Exp3C regret per round follows $O(2/\sqrt{T})$, where $T$ is the total number of rounds. In summary, the proposed Exp3C has two distinguishing advantages. First, it is distributed, since its decisions depend only on its own past decisions and profits. Second, it is rational, since a PGC is given guarantees on its own accumulated profit regardless of other PGCs’ behaviors.
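
The exponential-weight idea behind this kind of bidding scheme can be sketched with the standard Exp3 algorithm over a fixed, finite grid of candidate bids. This is not the Exp3C algorithm from the abstract, which works over continuous intervals defined by past MCPs. The bid grid, the cost, the uniform MCP draw, and the reward rule in `simulate` are hypothetical choices made only so the example runs.

```python
import math
import random


class Exp3:
    """Standard Exp3 over a finite set of arms (not the abstract's Exp3C)."""

    def __init__(self, n_arms, gamma, rng=None):
        if not 0.0 < gamma <= 1.0:
            raise ValueError("gamma must be in (0, 1]")
        self.n_arms = n_arms
        self.gamma = gamma
        self.weights = [1.0] * n_arms
        self.rng = rng if rng is not None else random.Random()

    def probabilities(self):
        # Mix the normalized weights with uniform exploration.
        total = sum(self.weights)
        return [(1.0 - self.gamma) * w / total + self.gamma / self.n_arms
                for w in self.weights]

    def select(self):
        probs = self.probabilities()
        arm = self.rng.choices(range(self.n_arms), weights=probs, k=1)[0]
        return arm, probs[arm]

    def update(self, arm, reward, prob):
        # reward must lie in [0, 1]; dividing by prob gives an unbiased estimate.
        estimate = reward / prob
        self.weights[arm] *= math.exp(self.gamma * estimate / self.n_arms)
        # Rescale so the largest weight is 1, which avoids overflow.
        top = max(self.weights)
        self.weights = [w / top for w in self.weights]


def simulate(rounds=2000, seed=0):
    rng = random.Random(seed)
    bids = [30.0, 40.0, 50.0, 60.0]  # hypothetical candidate bid prices
    cost = 25.0                      # hypothetical marginal cost
    scale = max(bids) - cost         # maps profits into [0, 1]
    learner = Exp3(len(bids), gamma=0.1, rng=rng)
    total_profit = 0.0
    for _ in range(rounds):
        arm, prob = learner.select()
        mcp = rng.uniform(30.0, 70.0)  # toy market clearing price
        # Toy rule: the bid wins if it does not exceed the MCP.
        profit = bids[arm] - cost if bids[arm] <= mcp else 0.0
        learner.update(arm, profit / scale, prob)
        total_profit += profit
    return total_profit / rounds, learner.probabilities()
```

Calling `simulate()` returns the average profit per round and the learner's final bid distribution. The learner only ever sees the reward of the bid it played, which mirrors the limited feedback described above.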

Articles published on Multi-armed Bandit Model

A Strategy for Advertisement Placement based on the Multi-Armed Tiger Problem

Combinatorial-restless-bandit-based transmitter–receiver online selection of distributed MIMO radar with non-stationary channels

A survey of the application and technical improvement of the multi-armed bandit

Multi-Armed Bandit-Based User Network Node Selection.

Optimizing video click-through rates with bandit algorithms

Selecting workers like expert for crowdsourcing by integration evaluation of individual and collaborative abilities

In-depth Exploration and Implementation of Multi-Armed Bandit Models Across Diverse Fields

Multiobjective Lipschitz Bandits under Lexicographic Ordering

New Approach to Equitable Intervention Planning to Improve Engagement and Outcomes in a Digital Health Program: Simulation Study.

Investigation of frontier Multi-Armed Bandit algorithms and applications

EMS: Erasure-Coded Multi-Source Streaming for UHD Videos Within Cloud Native 5G Networks

Tracking people across ultra populated indoor spaces by matching unreliable Wi-Fi signals with disconnected video feeds

Truthful User Recruitment for Cooperative Crowdsensing Task: A Combinatorial Multi-Armed Bandit Approach

Decentralized Stochastic Multi-Player Multi-Armed Walking Bandits

Understanding the stochastic dynamics of sequential decision-making processes: A path-integral analysis of multi-armed bandits.

A novel combinatorial multi-armed bandit game to identify online the changing top-[formula omitted] flows in software-defined networks

Multi-armed bandit problem with online clustering as side information

Automated Quantum Circuit Design With Nested Monte Carlo Tree Search

Earning While Learning: An Adversarial Multi-Armed Bandit Based Real-Time Bidding Scheme in Deregulated Electricity Market

Minimax Off-Policy Evaluation for Multi-Armed Bandits
