Abstract

Reinforcement learning (RL) is increasingly applied to recommending ventilator parameters, yet existing methods prioritize therapeutic effect over patient safety, which leads to excessive exploration by the RL agent and poses risks to patients. To address this, we propose a novel offline RL approach that leverages existing clinical data for exploration and employs fitted Q evaluation (FQE) for policy evaluation, minimizing patient risk compared with online evaluation. Our method introduces a variational auto-encoder with Gumbel-Softmax (VAE-GS) model that captures the hidden relationship between a patient's physiological status and ventilator parameters and uses it to constrain the agent's exploration space. In addition, a noise network helps the agent fully explore the reachable space to find optimal ventilator parameters. Our approach significantly enhances safety, as evidenced by experiments on the Medical Information Mart for Intensive Care III (MIMIC-III) dataset. It outperforms existing algorithms, including deep deterministic policy gradient (DDPG), soft actor-critic (SAC), batch-constrained deep Q-learning (BCQ), conservative Q-learning (CQL), and closed-form policy improvement operators (CFPI), showing improvements of 76.9%, 82.8%, 23.5%, 49.1% and 23.5%, respectively, while maintaining therapeutic effect.
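To make the constraint idea concrete, the following is a minimal illustrative sketch (PyTorch), not the authors' implementation: a state-conditioned VAE over discretized ventilator settings that uses the Gumbel-Softmax relaxation for differentiable sampling, plus a routine that draws clinically plausible candidate actions to restrict the agent's search space. All names (state_dim, n_bins, latent_dim, candidate_actions) and network sizes are assumptions made for illustration.

import torch
import torch.nn as nn
import torch.nn.functional as F

class StateConditionedActionVAE(nn.Module):
    """Sketch of a VAE with Gumbel-Softmax over discretized ventilator settings."""
    def __init__(self, state_dim=32, n_bins=10, latent_dim=8):
        super().__init__()
        self.latent_dim = latent_dim
        # Encoder: (patient state, one-hot setting) -> Gaussian latent parameters.
        self.encoder = nn.Sequential(
            nn.Linear(state_dim + n_bins, 128), nn.ReLU(),
            nn.Linear(128, 2 * latent_dim),
        )
        # Decoder: (patient state, latent) -> logits over discretized settings.
        self.decoder = nn.Sequential(
            nn.Linear(state_dim + latent_dim, 128), nn.ReLU(),
            nn.Linear(128, n_bins),
        )

    def forward(self, state, action_onehot, tau=1.0):
        mu, log_var = self.encoder(torch.cat([state, action_onehot], dim=-1)).chunk(2, dim=-1)
        z = mu + torch.randn_like(mu) * (0.5 * log_var).exp()  # reparameterized Gaussian latent
        logits = self.decoder(torch.cat([state, z], dim=-1))
        # Gumbel-Softmax keeps the discrete action sample differentiable during training.
        action_sample = F.gumbel_softmax(logits, tau=tau, hard=True)
        recon = F.cross_entropy(logits, action_onehot.argmax(dim=-1))
        kl = -0.5 * (1 + log_var - mu.pow(2) - log_var.exp()).sum(dim=-1).mean()
        return action_sample, recon + kl

    @torch.no_grad()
    def candidate_actions(self, state, n_samples=10, tau=0.5):
        # Sample plausible settings for one state (shape (1, state_dim)) so the RL
        # agent only evaluates actions supported by the clinical data distribution.
        states = state.repeat(n_samples, 1)
        z = torch.randn(n_samples, self.latent_dim)
        logits = self.decoder(torch.cat([states, z], dim=-1))
        return F.gumbel_softmax(logits, tau=tau, hard=True)

In such a setup, the offline agent would score only the sampled candidates with its Q-network (in the spirit of batch-constrained methods such as BCQ), which is one way the "constrained exploration space" described in the abstract could be realized.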
