Continuous action deep reinforcement learning for propofol dosing during general anesthesia
Purpose: Anesthesiologists simultaneously manage several aspects of patient care during general anesthesia. Automating the administration of hypnotic agents could enable more precise control of a patient's level of unconsciousness and free anesthesiologists to focus on the most critical aspects of patient care. Reinforcement learning (RL) algorithms can be used to fit a mapping from patient state to a medication regimen. These algorithms can learn complex control policies that, when paired with modern techniques for promoting model interpretability, offer a promising approach for developing a clinically viable system for automated anesthetic drug delivery.

Methods: We expand on our prior work applying deep RL to automated anesthetic dosing by now using a continuous-action model based on the actor-critic RL paradigm. The proposed RL agent is composed of a policy network that maps observed anesthetic states to a continuous probability density over propofol-infusion rates and a value network that estimates the favorability of observed states. We train and test three versions of the RL agent, each using a different reward function. The agent is trained on simulated pharmacokinetic/pharmacodynamic models with randomized parameters to ensure robustness to patient variability. The model is tested on simulations and retrospectively on nine general anesthesia cases collected in the operating room. We use Shapley additive explanations to identify the factors with the greatest influence on the agent's decision-making.

Results: The deep RL agent significantly outperformed a proportional-integral-derivative (PID) controller (median episode median absolute performance error 1.9% ± 1.8% vs. 3.1% ± 1.1%). The model rewarded for minimizing total doses performed the best across simulated patient demographics (median episode median performance error 1.1% ± 0.5%). When run on real-world clinical datasets, the agent recommended doses consistent with those administered by the anesthesiologist.

Conclusions: The proposed approach marks the first fully continuous deep RL algorithm for automating anesthetic drug dosing. The reward function used by the RL training algorithm can be flexibly designed to encourage desirable practices (e.g., using less anesthetic) and to improve performance. Through careful analysis of the learned policies, techniques for interpreting dosing decisions, and testing on clinical data, we confirm that the agent's anesthetic dosing is consistent with our understanding of best practices in anesthesia care.
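For context on the reported numbers: these are the Varvel-style performance-error statistics used throughout the closed-loop anesthesia literature. The abstract does not restate the formulas, so the following is a conventional reconstruction, with $y_i$ the measured depth-of-hypnosis index at sample $i$ and $y^{*}$ its setpoint (both symbols are assumptions here, not notation from the paper):

\[
\mathrm{PE}_i = \frac{y_i - y^{*}}{y^{*}} \times 100\%, \qquad
\mathrm{MDPE} = \operatorname{median}_i \, \mathrm{PE}_i, \qquad
\mathrm{MDAPE} = \operatorname{median}_i \, \lvert \mathrm{PE}_i \rvert .
\]

The actor-critic structure described in the Methods can be sketched as below. This is a minimal illustration under stated assumptions, not the authors' implementation: the state dimension, hidden-layer sizes, the Gaussian parameterization of the infusion-rate density, and the dose units are all illustrative.

```python
# Minimal actor-critic sketch of the described agent: a policy network
# mapping an observed anesthetic state to a continuous probability
# density over propofol infusion rates, plus a value network estimating
# the favorability of states. Sizes and units are illustrative assumptions.
import torch
import torch.nn as nn

STATE_DIM = 4  # assumed state features (e.g., depth-of-hypnosis error terms)

class PolicyNetwork(nn.Module):
    """State -> Gaussian density over a single continuous infusion rate."""
    def __init__(self, state_dim: int = STATE_DIM, hidden: int = 64):
        super().__init__()
        self.body = nn.Sequential(
            nn.Linear(state_dim, hidden), nn.Tanh(),
            nn.Linear(hidden, hidden), nn.Tanh(),
        )
        self.mean_head = nn.Linear(hidden, 1)        # mean infusion rate
        self.log_std = nn.Parameter(torch.zeros(1))  # learned spread

    def forward(self, state: torch.Tensor) -> torch.distributions.Normal:
        h = self.body(state)
        return torch.distributions.Normal(self.mean_head(h), self.log_std.exp())

class ValueNetwork(nn.Module):
    """State -> scalar estimate of how favorable the state is."""
    def __init__(self, state_dim: int = STATE_DIM, hidden: int = 64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim, hidden), nn.Tanh(),
            nn.Linear(hidden, hidden), nn.Tanh(),
            nn.Linear(hidden, 1),
        )

    def forward(self, state: torch.Tensor) -> torch.Tensor:
        return self.net(state)

# Dosing step: sample a continuous action from the policy's density and
# clamp it to a physically valid (non-negative) infusion rate.
policy, value = PolicyNetwork(), ValueNetwork()
state = torch.zeros(1, STATE_DIM)      # placeholder observed state
dist = policy(state)
dose = dist.sample().clamp(min=0.0)    # infusion rate; units illustrative
baseline = value(state)                # critic's favorability estimate
```

A Gaussian head is one common way to realize a "continuous probability density over propofol-infusion rates"; during training, the log-probability of sampled doses drives a policy-gradient update while the value network supplies the baseline, which is what distinguishes this continuous-action agent from discrete dose-selection approaches.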