Off-policy Evaluation Research Articles

One significant change in intensive care medicine over the past 2 decades is the acknowledgment that improper mechanical ventilation settings contribute substantially to pulmonary injury in critically ill patients. Artificial intelligence (AI) solutions can optimize mechanical ventilation settings in intensive care units (ICUs) and improve patient outcomes. Specifically, machine learning algorithms can be trained on large datasets of patient information and mechanical ventilation settings; these algorithms can then predict patient responses to different ventilation strategies and suggest personalized settings for individual patients. In this study, we aimed to design and evaluate an AI solution that could tailor an optimal ventilator strategy for each critically ill patient who requires mechanical ventilation. We proposed a reinforcement learning-based AI solution using observational data from multiple ICUs in the United States. The primary outcome was hospital mortality. Secondary outcomes were the proportion of optimal oxygen saturation and the proportion of optimal mean arterial blood pressure. We trained our AI agent to recommend low, medium, or high levels of 3 ventilator settings (positive end-expiratory pressure, fraction of inspired oxygen, and ideal body weight-adjusted tidal volume) according to patients' health conditions. We defined a policy as a set of rules guiding ventilator setting changes in specific clinical scenarios. Off-policy evaluation metrics were applied to evaluate the AI policy. We studied 21,595 and 5105 patients' ICU stays from the e-Intensive Care Unit Collaborative Research (eICU) and Medical Information Mart for Intensive Care IV (MIMIC-IV) databases, respectively.
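The action space described above can be made concrete with a short sketch. Assuming each of the three settings is binned into the same three levels and that every combination is a permitted joint action (the abstract does not say how the levels are combined), the agent chooses among 3^3 = 27 discrete actions. The setting names and the `encode`/`decode` helpers below are hypothetical.

```python
from itertools import product

# Hypothetical names for the three ventilator settings named in the abstract.
SETTINGS = ("peep", "fio2", "tidal_volume_per_ibw")
LEVELS = ("low", "medium", "high")

# Assumption: every combination of levels is a valid joint action.
# product() enumerates them in a fixed order, so list positions serve as indices.
ACTIONS = list(product(LEVELS, repeat=len(SETTINGS)))
ACTION_INDEX = {action: i for i, action in enumerate(ACTIONS)}


def encode(peep: str, fio2: str, tidal_volume: str) -> int:
    """Map a (PEEP, FiO2, tidal volume/IBW) level triple to a discrete action index."""
    return ACTION_INDEX[(peep, fio2, tidal_volume)]


def decode(index: int) -> dict:
    """Map a discrete action index back to named setting levels."""
    return dict(zip(SETTINGS, ACTIONS[index]))


print(len(ACTIONS))                     # 27
print(encode("medium", "low", "high"))  # 11
print(decode(11))
```

A flat index like this is what a Q-network with one output per action would typically use; the mapping back to named levels is what a clinician would see.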
Using the learned AI policy, we estimated the hospital mortality rate (eICU 12.1%, SD 3.1%; MIMIC-IV 29.1%, SD 0.9%), the proportion of optimal oxygen saturation (eICU 58.7%, SD 4.7%; MIMIC-IV 49%, SD 1%), and the proportion of optimal mean arterial blood pressure (eICU 31.1%, SD 4.5%; MIMIC-IV 41.2%, SD 1%). Based on multiple quantitative and qualitative evaluation metrics, our proposed AI solution outperformed observed clinical practice. Our study found that customizing ventilation settings for individual patients yielded lower estimated hospital mortality rates than the observed rates. This highlights the potential of reinforcement learning to develop AI models that analyze complex clinical data to optimize treatment parameters. Additionally, our findings support integrating this model into a clinical decision support system for refining ventilation settings and underscore the need for prospective validation trials.
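The abstract does not name the off-policy estimators behind these figures. As an illustration only, weighted (self-normalized) importance sampling (WIS) is one common way to estimate an outcome rate under a learned policy from logged clinician trajectories: each trajectory's outcome is reweighted by how much more or less likely the evaluated policy is than the clinicians to have taken the logged actions. The sketch assumes a reward of 1 at the final step for in-hospital death and 0 otherwise, so the estimate is a mortality rate; the data layout and the function `wis_estimate` are hypothetical.

```python
import numpy as np


def wis_estimate(trajectories, target_prob, gamma=1.0):
    """Weighted importance sampling estimate of a target policy's expected return.

    trajectories: list of episodes, each a list of
        (state, action, reward, behavior_prob) tuples, where behavior_prob is the
        estimated probability that clinicians chose `action` in `state`.
    target_prob: callable (state, action) -> probability under the evaluated policy.
    """
    weights, returns = [], []
    for episode in trajectories:
        rho, ret = 1.0, 0.0
        for t, (state, action, reward, behavior_prob) in enumerate(episode):
            # Cumulative likelihood ratio between evaluated and behavior policies.
            rho *= target_prob(state, action) / behavior_prob
            ret += (gamma ** t) * reward
        weights.append(rho)
        returns.append(ret)
    weights = np.asarray(weights)
    if weights.sum() == 0:
        raise ValueError("evaluated policy never agrees with any logged trajectory")
    # Self-normalization: divide by the sum of weights, not the number of episodes.
    return float(np.dot(weights, returns) / weights.sum())


# Toy log: clinicians pick each of two actions with probability 0.5.
# Reward 1.0 at the last step marks in-hospital death.
logged = [
    [(0, 1, 0.0, 0.5), (1, 1, 1.0, 0.5)],  # died
    [(0, 1, 0.0, 0.5), (1, 1, 0.0, 0.5)],  # survived
    [(0, 0, 0.0, 0.5), (1, 0, 1.0, 0.5)],  # died
]


def always_action_1(state, action):
    """Deterministic toy policy that always chooses action 1."""
    return 1.0 if action == 1 else 0.0


print(wis_estimate(logged, always_action_1))  # 0.5
```

The observed mortality in the toy log is 2/3, while the estimate of 0.5 uses only trajectories consistent with the evaluated policy. In practice `behavior_prob` must itself be estimated from the data, and WIS trades a small bias for lower variance than ordinary importance sampling.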

Background: Reinforcement learning (RL) holds great promise for intensive care medicine given the abundant availability of data and the frequency of sequential decision-making. However, despite the emergence of promising algorithms, RL-driven bedside clinical decision support is still far from reality. Major challenges include trust and safety. To help address these issues, we introduce cross off-policy evaluation and policy restriction and show how detailed policy analysis may increase clinical interpretability. As an example, we apply these in the setting of RL to optimise ventilator settings in intubated COVID-19 patients.

Methods: With data from the Dutch ICU Data Warehouse and using an exhaustive hyperparameter grid search, we identified an optimal set of Dueling Double Deep Q-Network RL models. The state space comprised ventilator, medication, and clinical data. The action space focused on positive end-expiratory pressure (PEEP) and fraction of inspired oxygen (FiO2). We used gas exchange indices as interim rewards, and mortality and state duration as terminal rewards. We designed a novel evaluation method called cross off-policy evaluation (cross-OPE) to assess the efficacy of models under varying weightings between the interim and terminal reward components. In addition, we implemented policy restriction to prevent potentially hazardous model actions. We introduce delta-Q to compare the quality of physician versus policy actions, together with in-depth policy inspection using visualisations.

Results: We created trajectories for 1118 intensive care unit (ICU) admissions and trained 69,120 models using 8 model architectures with 128 hyperparameter combinations. Policy restrictions were applied to each model. In the first evaluation step, 17,182/138,240 policies had good performance, but cross-OPE revealed suboptimal performance for 44% of those when the reward function used for evaluation was varied.
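The cross-OPE idea can be sketched as follows, with the caveat that the abstract gives only its outline. The sketch assumes that the reward mixes w times the interim (gas exchange) reward with (1 - w) times the terminal (mortality/duration) reward added at the last step, that each candidate policy is scored with weighted importance sampling under several evaluation weightings, and that a policy passes only if it beats the clinicians' observed average return under every weighting. The weighting scheme, the pass rule, and all function names are hypothetical.

```python
import numpy as np


def mixed_rewards(interim, terminal, w):
    """Hypothetical reward mix: w * interim rewards, plus (1 - w) * terminal at the last step."""
    rewards = w * np.asarray(interim, dtype=float)
    rewards[-1] += (1.0 - w) * terminal
    return rewards


def wis_value(episodes, target_prob, w):
    """Weighted importance sampling value of a policy under reward weighting w."""
    weights, returns = [], []
    for ep in episodes:
        ratios = [target_prob(s, a) / b
                  for s, a, b in zip(ep["states"], ep["actions"], ep["behavior_probs"])]
        weights.append(np.prod(ratios))
        returns.append(mixed_rewards(ep["interim"], ep["terminal"], w).sum())
    weights = np.asarray(weights)
    if weights.sum() == 0:
        return float("-inf")  # policy unsupported by the data: treat as failing
    return float(np.dot(weights, returns) / weights.sum())


def clinician_value(episodes, w):
    """Average observed return of the logged clinician decisions under weighting w."""
    return float(np.mean([mixed_rewards(ep["interim"], ep["terminal"], w).sum()
                          for ep in episodes]))


def cross_ope(policies, episodes, eval_weights):
    """Score every policy under every evaluation weighting; a policy passes only if
    it beats the clinician baseline for all weightings (assumed pass rule)."""
    baseline = {w: clinician_value(episodes, w) for w in eval_weights}
    scores, passed = {}, []
    for name, target_prob in policies.items():
        scores[name] = {w: wis_value(episodes, target_prob, w) for w in eval_weights}
        if all(scores[name][w] > baseline[w] for w in eval_weights):
            passed.append(name)
    return scores, baseline, passed


# Toy log: good gas exchange but death, versus poor gas exchange but survival.
episodes = [
    {"states": [0], "actions": [1], "behavior_probs": [0.5], "interim": [1.0], "terminal": -1.0},
    {"states": [0], "actions": [0], "behavior_probs": [0.5], "interim": [0.0], "terminal": 1.0},
]
policies = {
    "aggressive": lambda s, a: 1.0 if a == 1 else 0.0,
    "conservative": lambda s, a: 1.0 if a == 0 else 0.0,
}

print(cross_ope(policies, episodes, [0.9])[2])            # ['aggressive']
print(cross_ope(policies, episodes, [0.2, 0.5, 0.9])[2])  # []
```

In this toy log the aggressive policy looks better than the clinicians when evaluation favours interim gas exchange (w = 0.9) but not when the terminal outcome dominates, so evaluating across weightings rejects it. This mirrors, in miniature, how a policy that performs well under one reward function can fail under another.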
Clinical policy inspection facilitated assessment of action decisions for individual patients, including identification of action space regions that may benefit most from optimisation.

Conclusion: Cross-OPE can serve as a robust evaluation framework for safe RL model implementation by identifying policies with good generalisability. Policy restriction helps prevent potentially unsafe model recommendations. Finally, the novel delta-Q metric can be used to operationalise RL models in clinical practice. Our findings offer a promising pathway towards application of RL in intensive care medicine and beyond.
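Policy restriction and delta-Q are only named in the abstract, so the following is a minimal sketch under explicit assumptions. Restriction is modelled as masking disallowed actions before the greedy argmax over Q-values, using a hypothetical rule that limits each decision to one level change from the current setting. Delta-Q is assumed to be Q(s, a_policy) - Q(s, a_physician), so a positive value means the model rates its own recommendation above the physician's choice. The Q-values, the rule, and the function names are illustrative, not the authors' definitions.

```python
import numpy as np


def step_limit_mask(n_actions, current_level, max_step=1):
    """Hypothetical restriction: allow only actions within `max_step` levels of the current setting."""
    levels = np.arange(n_actions)
    return np.abs(levels - current_level) <= max_step


def restricted_greedy_action(q_values, allowed):
    """Return the highest-Q action among those the restriction allows."""
    if not np.any(allowed):
        raise ValueError("policy restriction removed every action")
    masked = np.where(allowed, q_values, -np.inf)
    return int(np.argmax(masked))


def delta_q(q_values, policy_action, physician_action):
    """Assumed delta-Q: Q(s, a_policy) - Q(s, a_physician)."""
    return float(q_values[policy_action] - q_values[physician_action])


# Toy Q-values for five PEEP levels in one patient state (illustrative numbers only).
q = np.array([0.1, 0.4, 0.6, 0.3, 0.95])
current_level, physician_action = 1, 3

unrestricted = int(np.argmax(q))
restricted = restricted_greedy_action(q, step_limit_mask(len(q), current_level))
print(unrestricted, restricted, round(delta_q(q, restricted, physician_action), 2))  # 4 2 0.3
```

Because the mask is applied only at action-selection time, the same trained Q-network can be paired with stricter or looser rules without retraining; whether the authors applied restriction this way is not stated in the abstract.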

Articles published on Off-policy Evaluation

Reinforcement Learning to Optimize Ventilator Settings for Patients on Invasive Mechanical Ventilation: Retrospective Study

Off-Policy Evaluation in Doubly Inhomogeneous Environments

Anytime-valid off-policy Inference for Contextual Bandits

Reinforcement learning for intensive care medicine: actionable clinical insights from novel approaches to reward shaping and off-policy model evaluation

Get a Head Start: On-Demand Pedagogical Policy Selection in Intelligent Tutoring

Probabilistic Offline Policy Ranking with Approximate Bayesian Computation

Distributional Off-Policy Evaluation for Slate Recommendations

A multiagent reinforcement learning framework for off-policy evaluation in two-sided markets

Off-policy evaluation for tabular reinforcement learning with synthetic trajectories

Offline Deep Reinforcement Learning and Off-Policy Evaluation for Personalized Basal Insulin Control in Type 1 Diabetes

Efficient evaluation of natural stochastic policies in off-line reinforcement learning

Proximal Reinforcement Learning: Efficient Off-Policy Evaluation in Partially Observed Markov Decision Processes

Off-policy evaluation in partially observed Markov decision processes under sequential ignorability

Optimal discharge of patients from intensive care via a data-driven policy learning framework

Scaling Marginalized Importance Sampling to High-Dimensional State-Spaces via State Abstraction

Policy-Adaptive Estimator Selection for Off-Policy Evaluation

Catoni-style confidence sequences for heavy-tailed mean estimation

Off-Policy Evaluation With Online Adaptation for Robot Exploration in Challenging Environments

Temporal-difference emphasis learning with regularized correction for off-policy evaluation and control

Towards more efficient and robust evaluation of sepsis treatment with deep reinforcement learning
