Abstract

Medical treatment decisions inherently involve a series of sequential choices, each informed by the outcomes of preceding decisions. This process closely aligns with the principles of reinforcement learning (RL), which also focuses on sequential decisions aimed at maximizing cumulative rewards. Consequently, RL holds significant promise for developing data-driven treatment plans. However, a major challenge in applying RL within medical contexts lies in the sparse nature of the rewards, which are primarily based on mortality outcomes. This sparsity can reduce the stability of offline estimates, posing a significant hurdle in fully utilizing RL for medical decision-making. In this work, we introduce a deep Q-learning approach that yields more reliable critical care policies. This method integrates relevant but noisy intermediate biomarker signals into the reward specification without compromising the optimization of the main outcome of interest (e.g., patient survival). We achieve this by first pruning the action space based on all available rewards, and then training a final model on the (sparse) main reward while choosing only among actions retained in the pruned action space. By disentangling sparse rewards and frequently measured reward proxies through action pruning, potential distortions of the main objective are minimized, while valuable information from intermediate signals can still guide the learning process. We evaluate our method in both off-policy and offline settings using simulated environments and real health records of patients in intensive care units. Our empirical results indicate that our method outperforms common offline RL methods such as conservative Q-learning and batch-constrained deep Q-learning. Our work is a step towards developing reliable policies by effectively harnessing the wealth of available information in data-intensive critical care environments.
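To make the two-stage idea concrete, the sketch below illustrates one plausible instantiation in Python/PyTorch: a Q-network is first trained on a dense proxy reward (e.g., intermediate biomarker signals) and used to keep only the top-k actions per state; the final Q-network is then trained on the sparse main reward, with the bootstrap maximum restricted to the pruned action set. The network architecture, the top-k pruning rule, and all names (`QNet`, `td_update`, `prune_actions`, `k`) are illustrative assumptions, not the authors' exact implementation.

```python
# Illustrative sketch of reward-proxy action pruning followed by sparse-reward Q-learning.
# All design choices here (top-k pruning, network sizes, Huber loss) are assumptions.
import torch
import torch.nn as nn


class QNet(nn.Module):
    """Simple state-action value network over a discrete action space."""

    def __init__(self, state_dim: int, n_actions: int, hidden: int = 128):
        super().__init__()
        self.n_actions = n_actions
        self.net = nn.Sequential(
            nn.Linear(state_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, n_actions),
        )

    def forward(self, s: torch.Tensor) -> torch.Tensor:
        return self.net(s)


def td_update(q, q_target, batch, optimizer, gamma, allowed_mask=None):
    """One DQN-style update. If allowed_mask (B, n_actions) is given, the
    bootstrap max only ranges over actions kept by the pruning stage."""
    s, a, r, s_next, done = batch
    with torch.no_grad():
        q_next = q_target(s_next)                                  # (B, n_actions)
        if allowed_mask is not None:
            q_next = q_next.masked_fill(~allowed_mask, float("-inf"))
        target = r + gamma * (1.0 - done) * q_next.max(dim=1).values
    pred = q(s).gather(1, a.unsqueeze(1)).squeeze(1)
    loss = nn.functional.smooth_l1_loss(pred, target)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()


def prune_actions(q_proxy: QNet, states: torch.Tensor, k: int) -> torch.Tensor:
    """Stage 1 output: boolean mask keeping the k highest-value actions per
    state under the proxy-reward Q-function."""
    with torch.no_grad():
        topk = q_proxy(states).topk(k, dim=1).indices              # (B, k)
    mask = torch.zeros(states.shape[0], q_proxy.n_actions, dtype=torch.bool)
    mask.scatter_(1, topk, True)
    return mask
```

In this reading, stage 1 calls `td_update` on transitions whose reward includes the dense biomarker-based proxies to obtain `q_proxy`; stage 2 trains a fresh Q-network on the sparse survival-based reward, passing `allowed_mask = prune_actions(q_proxy, s_next, k)` so that only pruned actions are considered during bootstrapping and at deployment.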
