On-policy Reinforcement Learning Research Articles

Conventional intensity-modulated radiation therapy (IMRT), which typically uses 5-20 fixed beams, often does not provide the angular sampling required for conformal dose shaping. Current volumetric modulated arc therapy (VMAT), in contrast, discretizes the angular space into equally spaced control points without considering how much intensity modulation each angle needs, which leads to undersampling at some angles and oversampling at others. Our goal is to develop a node, or station parameter, optimized radiation therapy (SPORT) strategy that simultaneously optimizes angular sampling and beam modulation by leveraging state-of-the-art reinforcement learning and the unique capability of modern digital LINACs to deliver dose through programmable nodal points. We developed a SPORT optimization framework in which the process of programming control points (or station parameters) was formulated as a stochastic dynamic programming problem and solved by a reinforcement learning-based algorithm. An on-policy reinforcement learning method, state-action-reward-state-action (SARSA), was integrated with a deep convolutional neural network to predict station parameters from the patient's anatomical structures while accounting for the delivery capability of a typical digital LINAC. The deep convolutional neural network estimated the state-action value from the quality of the plan with the current station parameters when a candidate next station parameter was selected. The state-action value was then updated by SARSA learning. Plan quality was quantified by dosimetric constraints. The model was assessed in a retrospective study of a cohort of patients who underwent head-and-neck radiation therapy. Dosimetric analysis and delivery efficiency comparisons were used to evaluate the performance of the proposed framework. Our model was used to generate 16 plans unseen in the original training set. All the plans predicted by our model achieved better dose distributions without violating clinical planning constraints. Moreover, instead of the 4 full standard arcs used in the original clinical plans obtained via manual optimization, the predicted plans used only one full standard arc (about 178 control points) plus a boost from a few sub-arcs (less than 30 degrees of gantry angle), which significantly improved the efficiency of beam delivery. We are in the process of integrating the sub-arcs into the full arc by exploiting the programmable capability of modern LINACs. We demonstrated a machine learning-based SPORT framework capable of simultaneously optimizing spatial sampling and beam modulation for modern radiation therapy. The framework not only significantly improves the quality and efficiency of beam delivery, but also has the potential to be incorporated into the current clinical workflow to improve the efficiency of dose planning and delivery.
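The abstract above names SARSA as the on-policy update rule but does not spell it out. As a rough illustration only, the following is a minimal tabular sketch of the generic SARSA update, Q(s, a) ← Q(s, a) + α [r + γ Q(s', a') − Q(s, a)], where a' is the action the current policy actually takes next. It is not the authors' method: their framework estimates the state-action value with a deep convolutional neural network, whereas this sketch uses a lookup table. The toy chain environment, the function names (epsilon_greedy, sarsa_update, train_chain) and all hyperparameter values are assumptions made for the example.

```python
import random
from collections import defaultdict


def epsilon_greedy(q, state, actions, epsilon, rng):
    """Pick a random action with probability epsilon, else a greedy one."""
    if rng.random() < epsilon:
        return rng.choice(actions)
    best = max(q[(state, a)] for a in actions)
    # Break ties at random so an untrained agent does not get stuck.
    return rng.choice([a for a in actions if q[(state, a)] == best])


def sarsa_update(q, s, a, r, s_next, a_next, alpha, gamma, terminal):
    """One SARSA step: move Q(s, a) toward r + gamma * Q(s', a').

    a_next is the action the behaviour policy will actually take in
    s_next, which is what makes the update on-policy.
    """
    target = r if terminal else r + gamma * q[(s_next, a_next)]
    q[(s, a)] += alpha * (target - q[(s, a)])


def train_chain(n_states=5, episodes=500, alpha=0.1, gamma=0.9,
                epsilon=0.1, seed=0):
    """Hypothetical toy task: walk along a chain 0..n_states-1.

    Actions are -1 (left) and +1 (right); reaching the last state
    ends the episode with reward 1, every other step gives reward 0.
    """
    rng = random.Random(seed)
    actions = [-1, 1]
    q = defaultdict(float)
    for _ in range(episodes):
        s = 0
        a = epsilon_greedy(q, s, actions, epsilon, rng)
        while True:
            s_next = min(max(s + a, 0), n_states - 1)
            terminal = s_next == n_states - 1
            r = 1.0 if terminal else 0.0
            # Choose the next action first, then update with it.
            a_next = epsilon_greedy(q, s_next, actions, epsilon, rng)
            sarsa_update(q, s, a, r, s_next, a_next, alpha, gamma, terminal)
            if terminal:
                break
            s, a = s_next, a_next
    return q


if __name__ == "__main__":
    q = train_chain()
    for state in range(4):
        print(state, round(q[(state, -1)], 3), round(q[(state, 1)], 3))
```

The difference from an off-policy method such as Q-learning is confined to the target: SARSA bootstraps from the value of the action that was actually selected next, while Q-learning would use the maximum over next actions regardless of what the agent does.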

Effective treatment of Parkinson’s disease (PD) is a continual challenge for healthcare providers, and providers can benefit from leveraging emerging technologies to supplement traditional clinic care. We develop a data-driven reinforcement learning (RL) framework to optimize PD medication regimens through wearable sensors. We leverage a data set of n = 26 PD patients who wore wrist-mounted movement trackers for two separate six-day periods. Using these data, we first build and validate a simulation model of how individual patients’ movement symptoms respond to medication administration. We then pair this simulation model with an on-policy RL algorithm that recommends optimal medication types, timing, and dosages during the day while incorporating human-in-the-loop considerations on medication administration. The results show that the RL-prescribed medication regimens outperform physicians’ medication regimens, despite physicians having access to the same data as the RL agent. To validate our results, we assess our wearable-based RL medication regimens using n = 399 PD patients from the Parkinson’s Progression Markers Initiative data set. We show that the wearable-based RL medication regimens would lead to significant symptom improvement for these patients, even more so than training RL policies directly from this data set. In doing so, we show that RL models from even small data sets of wearable data can offer novel, generalizable clinical insights and medication strategies, which may outperform those derived from larger data sets without wearable data. This paper was accepted by Carri Chan, healthcare management. Funding: This research is partially supported by the Science Alliance, University of Tennessee and by the Laboratory Directed Research and Development Program, Oak Ridge National Laboratory managed by UT-Battelle, LLC for the U.S. Department of Energy. 
Data used in this article were obtained from the Parkinson Progression Markers Initiative (PPMI) database, which is sponsored by the Michael J. Fox Foundation for Parkinson’s Research (MJFF). Supplemental Material: The data files and online appendix are available at https://doi.org/10.1287/mnsc.2023.4747.

Articles published on On-policy Reinforcement Learning

Integrating human learning and reinforcement learning: A novel approach to agent training

Reinforcement Learning Powered Station Parameter Optimized Radiation Therapy (SPORT): A Novel Treatment Planning and Beam Delivery Technique

A Multi-Scaling Reinforcement Learning Trading System Based on Multi-Scaling Convolutional Neural Networks

An Actor-Critic Algorithm for the Stochastic Cutting Stock Problem

Optimizing Patient-Specific Medication Regimen Policies Using Wearable Sensors in Parkinson’s Disease

Delay and energy aware task scheduling mechanism for fog-enabled IoT applications: A reinforcement learning approach

Off-policy and on-policy reinforcement learning with the Tsetlin machine

Content-Adaptive Auto-Occlusion Network for Occluded Person Re-Identification

Continuous control actions learning and adaptation for robotic manipulation through reinforcement learning

A Novel Adaptive Sampling Strategy for Deep Reinforcement Learning

Foresee then Evaluate: Decomposing Value Estimation with Latent Future Prediction

Few-Shot Model-Based Adaptation in Noisy Conditions

A Relearning Approach to Reinforcement Learning for control of Smart Buildings

Actor-critic learning for optimal building energy management with phase change materials

Output Feedback H∞ Control for Linear Discrete-Time Multi-Player Systems With Multi-Source Disturbances Using Off-Policy Q-Learning

Fault-Tolerant Control of Degrading Systems with On-Policy Reinforcement Learning

Stock Market Trading Agent Using On-Policy Reinforcement Learning Algorithms

Event-triggered resilient control for cyber-physical system under denial-of-service attacks

An adaptive obstacle avoidance algorithm for unmanned surface vehicle in complicated marine environments

On-policy concurrent reinforcement learning
