Abstract

Medical treatment decisions inherently involve a series of sequential choices, each informed by the outcomes of preceding decisions. This process closely aligns with the principles of reinforcement learning (RL), which also focuses on sequential decisions aimed at maximizing cumulative rewards. Consequently, RL holds significant promise for developing data-driven treatment plans. However, a major challenge in applying RL within medical contexts lies in the sparse nature of the rewards, which are primarily based on mortality outcomes. This sparsity can reduce the stability of offline estimates, posing a significant hurdle in fully utilizing RL for medical decision-making. In this work, we introduce a deep Q-learning approach that yields more reliable critical care policies. This method integrates relevant but noisy intermediate biomarker signals into the reward specification without compromising the optimization of the main outcome of interest (e.g., patient survival). We achieve this by first pruning the action space based on all available rewards, and then training a final model on the (sparse) main reward while choosing only among actions retained in the pruned action space. By disentangling sparse rewards and frequently measured reward proxies through action pruning, potential distortions of the main objective are minimized, while valuable information from intermediate signals can still guide the learning process. We evaluate our method in both off-policy and offline settings using simulated environments and real health records of patients in intensive care units. Our empirical results indicate that our method outperforms common offline RL methods such as conservative Q-learning and batch-constrained deep Q-learning. Our work is a step towards developing reliable policies by effectively harnessing the wealth of available information in data-intensive critical care environments.
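To make the two-stage idea concrete, the sketch below illustrates one plausible instantiation in Python/PyTorch: a Q-network is first trained on a dense proxy reward (e.g., intermediate biomarker signals) and used to keep only the top-k actions per state; the final Q-network is then trained on the sparse main reward, with the bootstrap maximum restricted to the pruned action set. The network architecture, the top-k pruning rule, and all names (`QNet`, `td_update`, `prune_actions`, `k`) are illustrative assumptions, not the authors' exact implementation.

```python
# Illustrative sketch of reward-proxy action pruning followed by sparse-reward Q-learning.
# All design choices here (top-k pruning, network sizes, Huber loss) are assumptions.
import torch
import torch.nn as nn


class QNet(nn.Module):
    """Simple state-action value network over a discrete action space."""

    def __init__(self, state_dim: int, n_actions: int, hidden: int = 128):
        super().__init__()
        self.n_actions = n_actions
        self.net = nn.Sequential(
            nn.Linear(state_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, n_actions),
        )

    def forward(self, s: torch.Tensor) -> torch.Tensor:
        return self.net(s)


def td_update(q, q_target, batch, optimizer, gamma, allowed_mask=None):
    """One DQN-style update. If allowed_mask (B, n_actions) is given, the
    bootstrap max only ranges over actions kept by the pruning stage."""
    s, a, r, s_next, done = batch
    with torch.no_grad():
        q_next = q_target(s_next)                                  # (B, n_actions)
        if allowed_mask is not None:
            q_next = q_next.masked_fill(~allowed_mask, float("-inf"))
        target = r + gamma * (1.0 - done) * q_next.max(dim=1).values
    pred = q(s).gather(1, a.unsqueeze(1)).squeeze(1)
    loss = nn.functional.smooth_l1_loss(pred, target)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()


def prune_actions(q_proxy: QNet, states: torch.Tensor, k: int) -> torch.Tensor:
    """Stage 1 output: boolean mask keeping the k highest-value actions per
    state under the proxy-reward Q-function."""
    with torch.no_grad():
        topk = q_proxy(states).topk(k, dim=1).indices              # (B, k)
    mask = torch.zeros(states.shape[0], q_proxy.n_actions, dtype=torch.bool)
    mask.scatter_(1, topk, True)
    return mask
```

In this reading, stage 1 calls `td_update` on transitions whose reward includes the dense biomarker-based proxies to obtain `q_proxy`; stage 2 trains a fresh Q-network on the sparse survival-based reward, passing `allowed_mask = prune_actions(q_proxy, s_next, k)` so that only pruned actions are considered during bootstrapping and at deployment.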
