Abstract

In this paper, we test and evaluate policy gradient reinforcement learning for automated blood glucose control in patients with Type 1 Diabetes Mellitus. Recent research has shown that reinforcement learning is a promising approach to accommodate the need for individualized blood glucose level control algorithms. The motivation for using policy gradient algorithms comes from the fact that adaptively administering insulin is an inherently continuous task. Policy gradient algorithms are known to be superior in continuous high-dimensional control tasks. Previously, most approaches to automated blood glucose control using reinforcement learning have used a finite set of actions. In this work we use the Trust-Region Policy Optimization algorithm, which represents the state of the art for deep policy gradient algorithms. The experiments are carried out in-silico using the Hovorka model, and stochastic behavior is modeled through simulated carbohydrate counting errors to illustrate the full potential of the framework. Furthermore, we use a model-free approach where no prior information about the patient is given to the algorithm. Our experiments show that the reinforcement learning agent is able to compete with and sometimes outperform state-of-the-art model predictive control in blood glucose regulation.

Highlights

  • Type 1 Diabetes Mellitus (T1DM) is a metabolic disease caused by the autoimmune destruction of insulin-producing beta cells in the pancreas [1]

  • Trust-region policy optimization (TRPO) is an algorithm that is based on the fact that if the policy gradient update is constrained by the total variation divergence, $D_{TV}(\pi_1, \pi_2) = \max_{s \in S} |\pi_1(\cdot|s) - \pi_2(\cdot|s)|$

  • Random skipped boluses: With the extended action space (TRPOe), the results after 100 policy gradient iterations are inferior to the other results
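
To make the total variation divergence in the TRPO highlight concrete, the following is a minimal sketch that evaluates it for two small tabular policies. It rests on assumptions that are not in the paper: the policies are discrete (the paper's insulin actions are continuous), the per-state distance is taken as the standard total variation distance (half the L1 distance between the action distributions), and the function name and dictionary layout are hypothetical.

```python
def max_tv_divergence(pi1, pi2):
    """Largest per-state total variation distance between two tabular policies.

    pi1, pi2: dicts mapping each state to a list of action probabilities
    (hypothetical layout, chosen only for illustration).
    """
    worst = 0.0
    for state, probs1 in pi1.items():
        probs2 = pi2[state]
        # Total variation distance for discrete distributions:
        # half the sum of absolute probability differences.
        tv = 0.5 * sum(abs(p - q) for p, q in zip(probs1, probs2))
        worst = max(worst, tv)
    return worst


# Two policies that agree in state "s0" and differ in state "s1".
old_policy = {"s0": [0.5, 0.5], "s1": [0.9, 0.1]}
new_policy = {"s0": [0.5, 0.5], "s1": [0.6, 0.4]}
divergence = max_tv_divergence(old_policy, new_policy)
```

A trust-region method would accept an update such as `new_policy` only if this quantity stays below a chosen threshold; in practice TRPO works with a KL-divergence constraint rather than evaluating the maximum over states directly.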


Summary

Introduction

Type 1 Diabetes Mellitus (T1DM) is a metabolic disease caused by the autoimmune destruction of insulin-producing beta cells in the pancreas [1]. Continuous subcutaneous insulin infusion (CSII) is a treatment strategy in which the patient has an insulin pump that continuously infuses insulin. The pump delivers both basal and bolus doses: the basal rate consists of regularly infused short-acting insulin doses, while the boluses are activated by the user together with meal intakes and to account for hyperglycemia. With the improvement of modern treatment equipment, the combination of an insulin pump and a continuous glucose monitor (CGM) invites the addition of a third element, namely a control algorithm to substitute for the operation of beta cells in the healthy pancreas. These three elements constitute the artificial pancreas [8,9]. Performance is measured through time-in-range (time spent at healthy blood glucose levels), time in hypo-/hyperglycemia, and blood glucose level plots for visual inspection.
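
The performance measures named above can be sketched as a short computation over a simulated glucose trace. This is an illustrative sketch, not the paper's evaluation code: the function name is hypothetical, the samples are assumed to be equally spaced in time, and the 70 and 180 mg/dL bounds are assumed defaults that the summary does not state.

```python
def glucose_time_fractions(readings_mg_dl, low=70.0, high=180.0):
    """Fractions of samples in range, in hypoglycemia, and in hyperglycemia.

    readings_mg_dl: equally spaced blood glucose samples in mg/dL.
    low, high: assumed bounds of the healthy range (not taken from the paper).
    """
    if not readings_mg_dl:
        raise ValueError("at least one glucose reading is required")
    n = len(readings_mg_dl)
    hypo = sum(1 for g in readings_mg_dl if g < low)
    hyper = sum(1 for g in readings_mg_dl if g > high)
    in_range = n - hypo - hyper
    return {
        "time_in_range": in_range / n,
        "time_hypo": hypo / n,
        "time_hyper": hyper / n,
    }


# One low, two in-range, and one high reading.
fractions = glucose_time_fractions([60.0, 90.0, 120.0, 200.0])
```

Because the samples are equally spaced, each fraction of samples equals the fraction of time spent in the corresponding band, so the three values sum to one.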

Related Work
Reinforcement Learning
Policy Gradient Methods
Parameterized Policies
Model Predictive Control
In-Silico Simulation
Simulator
Experiment Setup
Results
Virtual Population Experiment
Conclusions and Future Work
