Abstract

Reinforcement learning (RL) is increasingly applied to recommending ventilator parameters, yet existing methods prioritize therapeutic effect over patient safety, which leads to excessive exploration by the RL agent and poses risks to patients. To address this, we propose a novel offline RL approach that leverages existing clinical data for exploration and employs fitted Q evaluation (FQE) for policy evaluation, minimizing patient risk compared with online evaluation. Our method introduces a variational auto-encoder with Gumbel-Softmax (VAE-GS) model that captures the hidden relationship between a patient's physiological status and ventilator parameters and uses it to constrain the agent's exploration space. In addition, a noise network helps the agent fully explore the reachable space to find optimal ventilator parameters. Our approach significantly enhances safety, as evidenced by experiments on the Medical Information Mart for Intensive Care III (MIMIC-III) dataset. It outperforms existing algorithms, including deep deterministic policy gradient (DDPG), soft actor-critic (SAC), batch-constrained deep Q-learning (BCQ), conservative Q-learning (CQL), and closed-form policy improvement operators (CFPI), showing improvements of 76.9%, 82.8%, 23.5%, 49.1% and 23.5%, respectively, while maintaining therapeutic effect.
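To make the constraint idea concrete, the following is a minimal illustrative sketch (PyTorch), not the authors' implementation: a state-conditioned VAE over discretized ventilator settings that uses the Gumbel-Softmax relaxation for differentiable sampling, plus a routine that draws clinically plausible candidate actions to restrict the agent's search space. All names (state_dim, n_bins, latent_dim, candidate_actions) and network sizes are assumptions made for illustration.

import torch
import torch.nn as nn
import torch.nn.functional as F

class StateConditionedActionVAE(nn.Module):
    """Sketch of a VAE with Gumbel-Softmax over discretized ventilator settings."""
    def __init__(self, state_dim=32, n_bins=10, latent_dim=8):
        super().__init__()
        self.latent_dim = latent_dim
        # Encoder: (patient state, one-hot setting) -> Gaussian latent parameters.
        self.encoder = nn.Sequential(
            nn.Linear(state_dim + n_bins, 128), nn.ReLU(),
            nn.Linear(128, 2 * latent_dim),
        )
        # Decoder: (patient state, latent) -> logits over discretized settings.
        self.decoder = nn.Sequential(
            nn.Linear(state_dim + latent_dim, 128), nn.ReLU(),
            nn.Linear(128, n_bins),
        )

    def forward(self, state, action_onehot, tau=1.0):
        mu, log_var = self.encoder(torch.cat([state, action_onehot], dim=-1)).chunk(2, dim=-1)
        z = mu + torch.randn_like(mu) * (0.5 * log_var).exp()  # reparameterized Gaussian latent
        logits = self.decoder(torch.cat([state, z], dim=-1))
        # Gumbel-Softmax keeps the discrete action sample differentiable during training.
        action_sample = F.gumbel_softmax(logits, tau=tau, hard=True)
        recon = F.cross_entropy(logits, action_onehot.argmax(dim=-1))
        kl = -0.5 * (1 + log_var - mu.pow(2) - log_var.exp()).sum(dim=-1).mean()
        return action_sample, recon + kl

    @torch.no_grad()
    def candidate_actions(self, state, n_samples=10, tau=0.5):
        # Sample plausible settings for one state (shape (1, state_dim)) so the RL
        # agent only evaluates actions supported by the clinical data distribution.
        states = state.repeat(n_samples, 1)
        z = torch.randn(n_samples, self.latent_dim)
        logits = self.decoder(torch.cat([states, z], dim=-1))
        return F.gumbel_softmax(logits, tau=tau, hard=True)

In such a setup, the offline agent would score only the sampled candidates with its Q-network (in the spirit of batch-constrained methods such as BCQ), which is one way the "constrained exploration space" described in the abstract could be realized.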
