Abstract

Recent hybrid closed-loop systems, also known as the artificial pancreas (AP), have been shown to improve glucose control and reduce the self-management burden for people living with type 1 diabetes (T1D). AP systems adjust the basal infusion rates of insulin pumps using real-time readings from continuous glucose monitors. Deep reinforcement learning (DRL) has introduced new paradigms for basal insulin control algorithms. However, existing DRL-based AP controllers require extensive random online interaction between the agent and the environment. While such interaction is feasible in T1D simulators, it is impractical in real-world clinical settings. To address this, we propose a DRL framework that develops and validates models for basal insulin control entirely offline. It comprises a DRL model based on twin delayed deep deterministic policy gradient (TD3) with behavior cloning (BC), together with off-policy evaluation (OPE) using fitted Q evaluation (FQE). We evaluated the proposed framework on an in silico dataset generated by the UVA/Padova T1D simulator and on OhioT1DM, a real clinical dataset. On the in silico dataset, the offline DRL algorithm significantly increased time in range while reducing time below range and time above range for both the adult and adolescent groups. We then used OPE to estimate model performance on the clinical dataset and observed a notable increase in estimated policy value for each subject. These results demonstrate that the proposed framework is a viable and safe method for improving personalized basal insulin control in T1D.
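
For readers unfamiliar with the two components named above, the sketch below illustrates the standard TD3+BC actor objective (Fujimoto and Gu, 2021) and the FQE bootstrapped regression target. This is a minimal illustration of the general techniques the abstract references, not the authors' implementation; all function and parameter names (actor, critic, q_eval, alpha, and so on) are hypothetical.

```python
# Minimal PyTorch sketch of the two components named in the abstract:
# (1) the standard TD3+BC actor objective, which adds a behavior-cloning
#     penalty to the deterministic policy gradient so the learned policy
#     stays close to the actions logged in the offline dataset, and
# (2) the fitted Q evaluation (FQE) regression target used to estimate a
#     policy's value from logged data alone, with no online interaction.
# All names are illustrative; the paper's exact formulation may differ.
import torch
import torch.nn.functional as F

def td3_bc_actor_loss(actor, critic, states, actions, alpha=2.5):
    """TD3+BC actor loss for one minibatch of logged (state, action) pairs."""
    pi = actor(states)                     # policy's proposed actions
    q = critic(states, pi)                 # critic's value of those actions
    lam = alpha / q.abs().mean().detach()  # scale factor balancing RL vs. BC
    # Maximize lam * Q while penalizing deviation from the logged actions.
    return -(lam * q).mean() + F.mse_loss(pi, actions)

def fqe_target(q_eval, policy, rewards, next_states, dones, gamma=0.99):
    """Bootstrapped FQE target r + gamma * Q(s', pi(s')). Regressing an
    evaluation critic onto this target, iterated to convergence, yields an
    offline estimate of the target policy's value."""
    with torch.no_grad():
        next_q = q_eval(next_states, policy(next_states))
        return rewards + gamma * (1.0 - dones) * next_q
```

In the setting the abstract describes, the states would encode recent glucose and insulin history and the actions would be basal rate adjustments, though the paper's exact state and action design is not specified here.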
