Abstract

This paper presents the deep reinforcement learning (DRL) framework to estimate the optimal Dynamic Treatment Regimes from observational medical data. This framework is more flexible and adaptive for high dimensional action and state spaces than existing reinforcement learning methods to model real-life complexity in heterogeneous disease progression and treatment choices, with the goal of providing doctors and patients the data-driven personalized decision recommendations. The proposed DRL framework comprises (i) a supervised learning step to predict expert actions, and (ii) a deep reinforcement learning step to estimate the long-term value function of Dynamic Treatment Regimes. Both steps depend on deep neural networks. As a key motivational example, we have implemented the proposed framework on a data set from the Center for International Bone Marrow Transplant Research (CIBMTR) registry database, focusing on the sequence of prevention and treatments for acute and chronic graft versus host disease after transplantation. In the experimental results, we have demonstrated promising accuracy in predicting human experts’ decisions, as well as the high expected reward function in the DRL-based dynamic treatment regimes.

Highlights

  • When the straightforward rule-based treatment guidelines are difficult to be established, the statistical learning method provides a data-driven tool to explore and examine the best strategies

  • We present the performance of the Deep Reinforcement Learning (DRL) framework for optimizing the sequence of treatments in prevention and treatment of both acute and chronic Graft Versus Host Disease (GVHD)

  • We present the accuracy of the feed-forward deep neural networks (DNNs) in predicting the initial conditioning after the transplantation

Read more

Summary

Introduction

When the straightforward rule-based treatment guidelines are difficult to be established, the statistical learning method provides a data-driven tool to explore and examine the best strategies. To make reinforcement learning accessible for more general DTR problems using observational datasets, we need a new framework which (i) automatically extracts and organizes the discriminative information from the data, and (ii) can explore high-dimensional action and state spaces and make personalized treatment recommendations. We incorporate the state-of-the-art DRL/DQN into the DTR methodology and propose a data-driven framework that is scalable and adaptable to optimizing DTR with high-dimensional treatment options, and heterogeneous decision stages. Preprint[21] proposed to use dueling double-DQN with a prioritized experience replay to learn Sepsis Treatment, where the motivating data is from the EHR database This working paper[21] was brought to our attention by one reviewer. Our work is the first and unique general framework proposed for registry databases with similar structure of the motivating Bone Marrow Transplant registry database, which collect long-term follow-up of each patient national wide, throughout the disease course with standard forms for disease-related patient status and treatments

Methods
Results
Conclusion
Full Text
Published version (Free)

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call