Continuous action deep reinforcement learning for propofol dosing during general anesthesia

Abstract

Purpose: Anesthesiologists simultaneously manage several aspects of patient care during general anesthesia. Automating the administration of hypnotic agents could enable more precise control of a patient's level of unconsciousness and free anesthesiologists to focus on the most critical aspects of patient care. Reinforcement learning (RL) algorithms can be used to fit a mapping from patient state to a medication regimen. These algorithms can learn complex control policies that, when paired with modern techniques for promoting model interpretability, offer a promising approach to developing a clinically viable system for automated anesthetic drug delivery.

Methods: We expand on our prior work applying deep RL to automated anesthetic dosing by using a continuous-action model based on the actor-critic RL paradigm. The proposed RL agent comprises a policy network that maps observed anesthetic states to a continuous probability density over propofol-infusion rates and a value network that estimates the favorability of observed states. We train and test three versions of the RL agent using varied reward functions. The agent is trained on simulated pharmacokinetic/pharmacodynamic models with randomized parameters to ensure robustness to patient variability. The model is tested on simulations and retrospectively on nine general anesthesia cases collected in the operating room. We use Shapley additive explanations to identify the factors with the greatest influence over the agent's decision-making.

Results: The deep RL agent significantly outperformed a proportional-integral-derivative (PID) controller (median episode median absolute performance error 1.9% ± 1.8 vs. 3.1% ± 1.1). The model rewarded for minimizing total dose performed best across simulated patient demographics (median episode median performance error 1.1% ± 0.5). When run on real-world clinical datasets, the agent recommended doses consistent with those administered by the anesthesiologist.

Conclusions: The proposed approach marks the first fully continuous deep RL algorithm for automated anesthetic drug dosing. The reward function used by the RL training algorithm can be flexibly designed to encourage desirable practices (e.g., using less anesthetic) and to bolster performance. Through careful analysis of the learned policies, techniques for interpreting dosing decisions, and testing on clinical data, we confirm that the agent's anesthetic dosing is consistent with our understanding of best practices in anesthesia care.
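As a concrete sketch of the architecture described in the Methods, the snippet below pairs a Gaussian policy network over propofol-infusion rates with a state-value network, the two halves of an actor-critic agent. It is written in PyTorch; the state dimension, layer sizes, and normalized dose range are illustrative assumptions, not the authors' published configuration.

    import torch
    import torch.nn as nn

    STATE_DIM = 4   # e.g. anesthetic-depth error and effect-site estimates (assumed)
    MAX_RATE = 1.0  # normalized maximum infusion rate (assumed)

    class PolicyNet(nn.Module):
        """Actor: maps an observed anesthetic state to N(mu, sigma^2) over doses."""
        def __init__(self):
            super().__init__()
            self.body = nn.Sequential(nn.Linear(STATE_DIM, 64), nn.Tanh(),
                                      nn.Linear(64, 64), nn.Tanh())
            self.mu = nn.Linear(64, 1)
            self.log_sigma = nn.Parameter(torch.zeros(1))

        def forward(self, state):
            return self.mu(self.body(state)), self.log_sigma.exp()

    class ValueNet(nn.Module):
        """Critic: estimates the favorability of an observed state."""
        def __init__(self):
            super().__init__()
            self.v = nn.Sequential(nn.Linear(STATE_DIM, 64), nn.Tanh(),
                                   nn.Linear(64, 1))

        def forward(self, state):
            return self.v(state)

    def sample_dose(policy, state):
        """Draw an infusion rate and clamp it to the physically valid range."""
        mu, sigma = policy(state)
        dist = torch.distributions.Normal(mu, sigma)
        action = dist.sample()
        return action.clamp(0.0, MAX_RATE), dist.log_prob(action)

During training, the sampled rate would be applied to the simulated pharmacokinetic/pharmacodynamic patient and the stored log-probability reused in the policy-gradient update, with the value network's estimate serving as the baseline that reduces the variance of that update.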

References (showing 10 of 33 papers)

Increasing isoflurane concentration may cause paradoxical increases in the EEG bispectral index in surgical patients
  • British Journal of Anaesthesia, Jan 1, 2000. O Detsch + 4 more. doi:10.1093/oxfordjournals.bja.a013378. Cited by 105.

Multimodal General Anesthesia: Theory and Practice
  • Anesthesia & Analgesia, Sep 24, 2018. Emery N Brown + 2 more. doi:10.1213/ane.0000000000003668. Cited by 369.

Optimal adaptive control of drug dosing using integral reinforcement learning
  • Mathematical Biosciences, Feb 5, 2019. Regina Padmanabhan + 2 more. doi:10.1016/j.mbs.2019.01.012. Cited by 37.

A Multicenter Evaluation of a Closed-Loop Anesthesia Delivery System: A Randomized Controlled Trial
  • Anesthesia & Analgesia, Jan 1, 2016. Goverdhan D Puri + 13 more. doi:10.1213/ane.0000000000000769. Cited by 87.

The influence of method of administration and covariates on the pharmacokinetics of propofol in adult volunteers
  • Anesthesiology, May 1, 1998. Thomas W Schnider + 6 more. doi:10.1097/00000542-199805000-00006. Cited by 1117.

Explainable machine-learning predictions for the prevention of hypoxaemia during surgery
  • Nature Biomedical Engineering, Oct 1, 2018. Scott M Lundberg + 10 more. doi:10.1038/s41551-018-0304-0. Cited by 1474.

Robust PID control of propofol anaesthesia: Uncertainty limits performance, not PID structure
  • Computer Methods and Programs in Biomedicine, Oct 5, 2020. Jose M Gonzalez-Cava + 6 more. doi:10.1016/j.cmpb.2020.105783. Cited by 18.

General anesthesia and altered states of arousal: a systems neuroscience analysis
  • Annual Review of Neuroscience, Jul 21, 2011. Emery N Brown + 2 more. doi:10.1146/annurev-neuro-060909-153200. Cited by 481.

Quantifying Cortical Activity During General Anesthesia Using Wavelet Analysis
  • IEEE Transactions on Biomedical Engineering, Apr 1, 2006. T Zikov + 4 more. doi:10.1109/tbme.2006.870255. Cited by 166.

A primer for EEG signal processing in anesthesia
  • Anesthesiology, Oct 1, 1998. Ira J Rampil. doi:10.1097/00000542-199810000-00023. Cited by 1449.

Citations (showing 10 of 38 papers)

Optimized Intelligent PID Controller for Propofol Dosing in General Anesthesia Using Coati Optimization Algorithm
  • Book Chapter, Jan 1, 2024. Ammar T Namel + 1 more. doi:10.1007/978-3-031-62814-6_16.

The future of target-controlled infusion and new pharmacokinetic models
  • Research Article. Current Opinion in Anaesthesiology, May 26, 2025. Anthony R Absalom + 1 more. doi:10.1097/aco.0000000000001529.

Purpose of review: To summarize recent developments in the understanding of the pharmacology of the hypnotic and opioid drugs, with relevance to target-controlled infusions and newer pharmacokinetic models. Recent findings: General-purpose models have been developed for propofol, remifentanil, and dexmedetomidine, suitable for use in a wide variety of patients but still not universally applicable. A validation study of the predictive performance of the Eleveld propofol model showed reasonable performance in children, healthy adults, and obese adults but poorer performance in elderly patients. Observational studies show that complications during total intravenous anesthesia often arise from omission of basic safety checks and inadequate knowledge rather than from model misspecification. Specifically, there is a lack of understanding of the influence of the clinical situation on the pharmacodynamics of hypnotic drugs. Artificial intelligence is likely to produce useful drug infusion rate advisory systems, or even closed-loop control systems that could potentially provide better patient-individualized titration of anesthetic drugs. Summary: Further efforts to develop new models are unlikely to be clinically beneficial. Efforts should rather be made to ensure better education and a better appreciation of variability in pharmacodynamics and the need for better ways of tailoring drug doses to individual patient needs.

Artificial Intelligence in Perioperative Medication-Related Clinical Decision Support
  • Research Article. Anesthesiology Clinics, Sep 1, 2025. Maya Patel + 1 more. doi:10.1016/j.anclin.2025.05.009.

Application of Machine Learning in Predicting Perioperative Outcomes in Patients with Cancer: A Narrative Review for Clinicians
  • Research Article. Current Oncology, May 11, 2024. Garry Brydges + 2 more. doi:10.3390/curroncol31050207. Cited by 3.

This narrative review explores the utilization of machine learning (ML) and artificial intelligence (AI) models to enhance perioperative cancer care. ML and AI models offer significant potential to improve perioperative cancer care by predicting outcomes and supporting clinical decision-making. Tailored for perioperative professionals including anesthesiologists, surgeons, critical care physicians, nurse anesthetists, and perioperative nurses, this review provides a comprehensive framework for the integration of ML and AI models to enhance patient care delivery throughout the perioperative continuum.

Machine Learning-Guided Anesthesiology: A Review of Recent Advances and Clinical Applications
  • Research Article. Journal of Cellular & Molecular Anesthesia, Feb 3, 2024. Sana Hashemi + 5 more. doi:10.5812/jcma-145369. Cited by 7.

Anesthesia is the process of inducing conditions such as painlessness, immobility, and amnesia to facilitate surgeries and other medical procedures. During the administration of anesthesia, anesthesiologists face critical decision-making moments, considering the significance of the procedure and the potential complications resulting from anesthesia-related choices. In recent years, artificial intelligence (AI) has emerged as a supportive tool for anesthesia decisions, given its potential to assist with control and management tasks. This study conducts a comprehensive review of articles at the intersection of AI and anesthesia. PubMed was searched for peer-reviewed articles published between 2020 and early 2022 using keywords related to anesthesia and AI. The articles were categorized into nine distinct groups: "Depth of anesthesia", "Control of anesthesia delivery", "Control of mechanical ventilation and weaning", "Event prediction", "Ultrasound guidance", "Pain management", "Operating room logistics", "Monitoring", and "Neuro-critical care". Four reviewers examined the selected articles to extract relevant information. The studies within each category were reviewed with respect to the purpose and type of anesthesia, AI algorithms, dataset, data accessibility, and evaluation criteria. To enhance clarity, each category was analyzed at a higher resolution than in previous review articles, providing readers with key points, limitations, and potential areas for future research. Advances in AI techniques hold promise for significantly enhancing anesthesia practice and improving the overall experience for anesthesiologists.

Enhancing Safety During Surgical Procedures with Computer Vision, Artificial Intelligence, and Natural Language Processing
  • Book Chapter, Jan 1, 2024. Okeke Stephen + 1 more. doi:10.1007/978-981-97-0376-0_31.

OOCL-DDQN: Online Evaluation and Offline Training-Based Clipped Double DQN for Automated Anesthesia Control
  • Conference Article, Dec 17, 2023. Huijie Li + 3 more. doi:10.1109/icpads60453.2023.00234. Cited by 2.

Predicting anesthetic infusion events using machine learning
  • Research Article. Scientific Reports, Dec 1, 2021. Naoki Miyaguchi + 4 more. doi:10.1038/s41598-021-03112-2. Cited by 28.

Recently, research has been conducted to automatically control anesthesia using machine learning, with the aim of alleviating the shortage of anesthesiologists. In this study, we address the problem of predicting decisions made by anesthesiologists during surgery using machine learning; specifically, we formulate a decision making problem by increasing the flow rate at each time point in the continuous administration of analgesic remifentanil as a supervised binary classification problem. The experiments were conducted to evaluate the prediction performance using six machine learning models: logistic regression, support vector machine, random forest, LightGBM, artificial neural network, and long short-term memory (LSTM), using 210 case data collected during actual surgeries. The results demonstrated that when predicting the future increase in flow rate of remifentanil after 1 min, the model using LSTM was able to predict with scores of 0.659 for sensitivity, 0.732 for specificity, and 0.753 for ROC-AUC; this demonstrates the potential to predict the decisions made by anesthesiologists using machine learning. Furthermore, we examined the importance and contribution of the features of each model using Shapley additive explanations—a method for interpreting predictions made by machine learning models. The trends indicated by the results were partially consistent with known clinical findings.
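As a concrete illustration of the formulation described above, the sketch below wires an LSTM to a window of intraoperative time-series features and outputs the probability that the remifentanil flow rate will be increased one minute later. It uses PyTorch; the feature count, window length, and hidden size are illustrative assumptions, not values from the study.

    import torch
    import torch.nn as nn

    class FlowRateIncreaseLSTM(nn.Module):
        """Binary classifier: will the remifentanil flow rate be raised in 1 min?"""
        def __init__(self, n_features=8, hidden=32):
            super().__init__()
            self.lstm = nn.LSTM(n_features, hidden, batch_first=True)
            self.head = nn.Linear(hidden, 1)

        def forward(self, x):            # x: (batch, time, n_features)
            _, (h_n, _) = self.lstm(x)   # final hidden state per sequence
            return torch.sigmoid(self.head(h_n[-1]))

    model = FlowRateIncreaseLSTM()
    window = torch.randn(16, 60, 8)      # 16 cases, 60 time steps, 8 vitals (assumed)
    p_increase = model(window)           # shape (16, 1), probabilities in (0, 1)

Thresholding these probabilities yields the sensitivity/specificity trade-off the authors report, and sweeping the threshold traces the ROC curve behind the 0.753 ROC-AUC.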

AReS: A patient simulator to facilitate testing of automated anesthesia
  • Research Article. Computer Methods and Programs in Biomedicine, Sep 1, 2025. Sara Hosseinirad + 5 more. doi:10.1016/j.cmpb.2025.108901.

Artificial intelligence in perioperative medicine: a narrative review
  • Research Article. Korean Journal of Anesthesiology, Mar 29, 2022. Hyun-Kyu Yoon + 3 more. doi:10.4097/kja.22157. Cited by 22.

Recent advancements in artificial intelligence (AI) techniques have enabled the development of accurate prediction models using clinical big data. AI models for perioperative risk stratification, intraoperative event prediction, biosignal analyses, and intensive care medicine have been developed in the field of perioperative medicine. Some of these models have been validated using external datasets and randomized controlled trials. Once these models are implemented in electronic health record systems or software medical devices, they could help anesthesiologists improve clinical outcomes by accurately predicting complications and suggesting optimal treatment strategies in real-time. This review provides an overview of the AI techniques used in perioperative medicine and a summary of the studies that have been published using these techniques. Understanding these techniques will aid in their appropriate application in clinical practice.

Similar Papers
Sample efficient deep reinforcement learning for control
  • Research Article, Dec 15, 2019. Tim De Bruin. doi:10.4233/uuid:f8faacb0-9a55-453d-97fd-0388a3c848ee.

Deep Interactive Reinforcement Learning for Path Following of Autonomous Underwater Vehicle
  • Research Article. IEEE Access, Jan 1, 2020. Qilei Zhang + 4 more. doi:10.1109/access.2020.2970433. Cited by 90.

The autonomous underwater vehicle (AUV) plays an increasingly important role in ocean exploration. Existing AUVs are usually not fully autonomous and are generally limited to pre-planned or pre-programmed tasks. Reinforcement learning (RL) and deep reinforcement learning have been introduced into AUV design and research to improve autonomy. However, these methods are still difficult to apply directly to an actual AUV system because of sparse rewards and low learning efficiency. In this paper, we propose a deep interactive reinforcement learning method for path following of an AUV by combining the advantages of deep reinforcement learning and interactive RL. In addition, since a human trainer cannot provide human rewards for the AUV when it is running in the ocean, and the AUV needs to adapt to a changing environment, we further propose a deep reinforcement learning method that learns from both human rewards and environmental rewards at the same time. We test our methods in two path-following tasks, straight-line and sinusoid-curve following, by simulating in the Gazebo platform. Our experimental results show that with our proposed deep interactive RL method, the AUV can converge faster than a DQN learner trained on environmental reward alone. Moreover, an AUV learning with our deep RL from both human and environmental rewards can achieve similar or even better performance than with deep interactive RL and can adapt to the actual environment by further learning from environmental rewards.
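The paper's combined-signal idea, learning from human and environmental rewards at the same time, can be illustrated with a single tabular Q-learning step whose learning signal sums the two rewards (the human term is zero when no trainer feedback arrives). This is a toy sketch, not the paper's actual deep RL update:

    import numpy as np

    def q_update(Q, s, a, r_env, s_next, r_human=0.0, alpha=0.1, gamma=0.99):
        """One tabular Q-learning step driven by environmental plus human reward."""
        target = (r_env + r_human) + gamma * np.max(Q[s_next])
        Q[s, a] += alpha * (target - Q[s, a])  # move the estimate toward the target
        return Q

    Q = np.zeros((5, 2))  # 5 states, 2 actions (toy sizes)
    Q = q_update(Q, s=0, a=1, r_env=1.0, s_next=2, r_human=0.5)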

Deep Reinforcement Learning
  • Book, Jan 1, 2022. Aske Plaat. doi:10.1007/978-981-19-0638-1. Cited by 34.

Deep reinforcement learning has gathered much attention recently. Impressive results were achieved in activities as diverse as autonomous driving, game playing, molecular recombination, and robotics. In all these fields, computer programs have taught themselves to solve difficult problems. They have learned to fly model helicopters and perform aerobatic manoeuvers such as loops and rolls. In some applications they have even become better than the best humans, such as in Atari, Go, poker, and StarCraft. The way in which deep reinforcement learning explores complex environments reminds us of how children learn: by playfully trying out things, getting feedback, and trying again. The computer seems to truly possess aspects of human learning; this goes to the heart of the dream of artificial intelligence. The successes in research have not gone unnoticed by educators, and universities have started to offer courses on the subject. The aim of this book is to provide a comprehensive overview of the field of deep reinforcement learning. The book is written for graduate students of artificial intelligence and for researchers and practitioners who wish to better understand deep reinforcement learning methods and their challenges. We assume an undergraduate-level understanding of computer science and artificial intelligence; the programming language of this book is Python. We describe the foundations, the algorithms, and the applications of deep reinforcement learning. We cover the established model-free and model-based methods that form the basis of the field. Developments go quickly, and we also cover advanced topics: deep multi-agent reinforcement learning, deep hierarchical reinforcement learning, and deep meta learning.

Autonomous Driving Decision-making Based on the Combination of Deep Reinforcement Learning and Rule-based Controller
  • Conference Article, Sep 30, 2021. Jinzhu Wang + 3 more. doi:10.46720/f2021-acm-108.

As autonomous vehicles begin to drive on the road, rational decision making is essential for driving safety and efficiency. Decision-making for autonomous vehicles is a difficult problem because it depends on the surrounding dynamic environment constraints and on the vehicle's own motion constraints. As the combination of deep learning (DL) and reinforcement learning (RL), deep reinforcement learning (DRL) integrates DL's strong understanding of perception problems, such as visual and semantic text, with the decision-making ability of RL, so DRL can be used to solve complex problems in real scenarios. However, as an end-to-end method, DRL is inefficient to train, and the resulting policies tend to be poorly robust. Considering the usefulness of existing domain knowledge for autonomous vehicle decision-making, this paper uses domain knowledge to establish behavioral rules and combines rule-based behavior strategies with DRL methods, achieving efficient training of autonomous vehicle decision-making models and ensuring that the vehicle chooses safe actions under unknown circumstances. First, the continuous decision-making problem of autonomous vehicles is modeled as a Markov decision process (MDP). Taking into account the influence of the unknown intentions of other road vehicles on driving decisions, a recognition model of the behavioral intentions of other vehicles is established. Then, a linear dynamic model of a conventional vehicle is used to establish the relationship between decision-making behavior and motion trajectory. Finally, by designing the reward function of the MDP and combining RL with a rule-based controller, the expected driving behavior of the autonomous vehicle is obtained. Simulation environments for urban intersections and highways are established, and each situation is formalized as an RL problem. A large number of numerical simulations are carried out, and the method is compared with the end-to-end form of DRL.

Tracking the Race Between Deep Reinforcement Learning and Imitation Learning
  • Book Chapter, Jan 1, 2020. Timo P Gros + 3 more. doi:10.1007/978-3-030-59854-9_2. Cited by 10.

Learning-based approaches for solving large sequential decision-making problems have become popular in recent years. The resulting agents perform differently, and their characteristics depend on those of the underlying learning approach. Here, we consider a benchmark planning problem from the reinforcement learning domain, the Racetrack, to investigate the properties of agents derived from different deep (reinforcement) learning approaches. We compare the performance of deep supervised learning, in particular imitation learning, to reinforcement learning for the Racetrack model. We find that imitation learning yields agents that follow more risky paths. In contrast, the decisions of deep reinforcement learning are more foresighted, i.e., they avoid states in which fatal decisions are more likely. Our evaluations show that for this sequential decision-making problem, deep reinforcement learning performs best in many aspects, even though imitation learning is trained on decisions considered optimal.

Deep reinforcement learning in computer vision: a comprehensive survey
  • Research Article. Artificial Intelligence Review, Sep 29, 2021. Ngan Le + 4 more. doi:10.1007/s10462-021-10061-9. Cited by 132.

Deep reinforcement learning augments the reinforcement learning framework and utilizes the powerful representation of deep neural networks. Recent works have demonstrated the remarkable successes of deep reinforcement learning in various domains including finance, medicine, healthcare, video games, robotics, and computer vision. In this work, we provide a detailed review of recent and state-of-the-art research advances of deep reinforcement learning in computer vision. We start by comprehending the theories of deep learning, reinforcement learning, and deep reinforcement learning. We then propose a categorization of deep reinforcement learning methodologies and discuss their advantages and limitations. In particular, we divide deep reinforcement learning into seven main categories according to their applications in computer vision: (i) landmark localization; (ii) object detection; (iii) object tracking; (iv) registration of both 2D images and 3D volumetric data; (v) image segmentation; (vi) video analysis; and (vii) other applications. Each of these categories is further analyzed with respect to reinforcement learning techniques, network design, and performance. Moreover, we provide a comprehensive analysis of the existing publicly available datasets and examine source code availability. Finally, we present some open issues and discuss future research directions on deep reinforcement learning in computer vision.

Closed-loop control of anesthesia using Bispectral index: performance assessment in patients undergoing major orthopedic surgery under combined general and regional anesthesia
  • Research Article. Anesthesiology, Jan 1, 2002. Anthony R Absalom + 2 more. doi:10.1097/00000542-200201000-00017. Cited by 213.

The Bispectral Index (BIS) is an electroencephalogram-derived measure of anesthetic depth. A closed-loop anesthesia system was built using BIS as the control variable, a proportional-integral-differential control algorithm, and a propofol target-controlled infusion system as the control actuator. Closed-loop performance was assessed in 10 adult patients. Ten adult patients scheduled to undergo elective hip or knee surgery were enrolled. An epidural cannula was inserted, and 0.5% bupivacaine was used to provide anesthesia to T8 before general anesthesia was induced using the propofol target-controlled infusion system under manual control. After the start of surgery, when anesthesia was clinically adequate, automatic control of anesthesia was commenced using the BIS as the control variable. Adequacy of anesthesia during closed-loop control was assessed clinically and by calculating the median performance error, the median absolute performance error, and the mean offset of the control variable. The median performance error and the median absolute performance error were 2.2 and 8.0%, respectively. Mean offset of the BIS from the set point was 0.9. Cardiovascular parameters were stable during closed-loop control. Operating conditions were adequate in all patients but one, who began moving after 45 min of stable anesthesia. No patients reported awareness or recall of intraoperative events. In three patients, there was oscillation of the measured BIS around the set point. The system was able to provide clinically adequate anesthesia in 9 of 10 patients. Further studies are required to determine whether control performance can be improved by alterations to the gain factors or by using an effect site-targeted, target-controlled infusion propofol system.
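The control law in this study follows the textbook PID recipe: the deviation of the measured BIS from the set point drives proportional, integral, and derivative terms whose sum adjusts the propofol target. Below is a minimal discrete-time sketch in Python; the gains, sampling interval, and clamping are illustrative assumptions rather than the study's tuning.

    class BISPIDController:
        """Minimal discrete-time PID loop for BIS-guided propofol targeting."""
        def __init__(self, kp, ki, kd, set_point=50.0, dt=5.0):
            self.kp, self.ki, self.kd = kp, ki, kd
            self.set_point, self.dt = set_point, dt
            self.integral = 0.0
            self.prev_error = 0.0

        def update(self, bis_measured):
            # BIS falls as hypnosis deepens, so a BIS above target (patient too
            # light) gives a positive error and a higher propofol target.
            error = bis_measured - self.set_point
            self.integral += error * self.dt
            derivative = (error - self.prev_error) / self.dt
            self.prev_error = error
            u = self.kp * error + self.ki * self.integral + self.kd * derivative
            return max(u, 0.0)  # an infusion target cannot be negative

    controller = BISPIDController(kp=0.05, ki=0.002, kd=0.1)  # assumed gains
    propofol_target = controller.update(bis_measured=62.0)

In loops of this kind, the integral term is what removes steady-state offset, consistent with the small mean BIS offset (0.9) reported above.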

Break through the limits of learning by machines
  • Research Article. Chinese Science Bulletin, Sep 20, 2016. Zhongzhi Shi. doi:10.1360/n972016-00741. Cited by 6.

Learning ability is a basic characteristic of human intelligence. The July 1, 2005 issue of Science published a list of 125 important questions in science, among them question 94, "What are the limits of learning by machines?", with the annotation "Computers can already beat the world's best chess players, and they have a wealth of information on the Web to draw on. But abstract reasoning is still beyond any machine". Artificial intelligence has recently made great progress. In 1997, IBM's supercomputer Deep Blue defeated the chess master Garry Kasparov. On February 14, 2011, IBM's Watson supercomputer won a practice round against Jeopardy champions Ken Jennings and Brad Rutter. In March 2016, Google DeepMind's AlphaGo sealed a 4-1 victory over the South Korean Go grandmaster Lee Se-dol. This paper focuses on the machine learning methods behind AlphaGo, including reinforcement learning, deep learning, and deep reinforcement learning, and analyzes existing problems and the latest research progress. Deep reinforcement learning combines deep learning and reinforcement learning to realize a learning algorithm that runs from perception to action: much as in human behavior, sensory information such as vision is the input, and actions are output directly through a deep neural network. Deep reinforcement learning has the potential to give robots a variety of skills and full autonomy. Although reinforcement learning has been practiced successfully, state features traditionally had to be set manually, which is difficult for complex scenes, prone to the curse of dimensionality, and limited in expressiveness. In 2010, Sascha Lange and Martin Riedmiller proposed deep auto-encoder neural networks in reinforcement learning to extract features for visual control. In 2013, DeepMind proposed the deep Q-network (DQN) at NIPS 2013, using a convolutional neural network to extract features for reinforcement learning; an improved version of DQN, published in Nature in 2015, aroused widespread attention. To break through the limits of learning by machines, cognitive machine learning is proposed: the combination of machine learning and brain cognition, so that machine intelligence constantly evolves and gradually reaches a human level of artificial intelligence. The author proposes a cognitive model entitled Consciousness And Memory (CAM), which consists of memory, consciousness, high-level cognitive functions, perception, and motor. High-level cognitive functions of the brain include learning, language, thinking, decision making, emotion, and so on. Learning is the process of accepting stimuli through the nervous system and acquiring new behaviors, habits, and accumulated experience. Given the current research progress of brain science and cognitive science, cognitive machine learning may focus on learning emergence, procedural-memory knowledge learning, learning evolution, and so on. For intelligence, evolution refers to learning how to learn while the underlying structure also changes; it is important to record learning results through structural change and thereby improve the learning method.

Artificial Intelligence and the Common Sense of Animals
  • Research Article. Trends in Cognitive Sciences, Oct 8, 2020. Murray Shanahan + 3 more. doi:10.1016/j.tics.2020.09.002. Cited by 24.

Deep reinforcement learning and its applications in medical imaging and radiation therapy: a survey
  • Research Article. Physics in Medicine & Biology, Nov 11, 2022. Lanyu Xu + 2 more. doi:10.1088/1361-6560/ac9cb3. Cited by 18.

Reinforcement learning takes sequential decision-making approaches by learning the policy through trial and error based on interaction with the environment. Combining deep learning and reinforcement learning can empower the agent to learn the interactions and the distribution of rewards from state-action pairs to achieve effective and efficient solutions in more complex and dynamic environments. Deep reinforcement learning (DRL) has demonstrated astonishing performance in surpassing the human-level performance in the game domain and many other simulated environments. This paper introduces the basics of reinforcement learning and reviews various categories of DRL algorithms and DRL models developed for medical image analysis and radiation treatment planning optimization. We will also discuss the current challenges of DRL and approaches proposed to make DRL more generalizable and robust in a real-world environment. DRL algorithms, by fostering the designs of the reward function, agents interactions and environment models, can resolve the challenges from scarce and heterogeneous annotated medical image data, which has been a major obstacle to implementing deep learning models in the clinic. DRL is an active research area with enormous potential to improve deep learning applications in medical imaging and radiation therapy planning.

Nitty-Gritty of Deep Reinforcement Learning for the Healthcare Sector
  • Book Chapter, Oct 18, 2023. Vaishnavi Kumari + 5 more. doi:10.4018/979-8-3693-0876-9.ch016. Cited by 4.

Deep reinforcement learning (DRL) is one of the emerging areas of machine learning that focuses on maximizing rewards. DRL combines reinforcement learning and deep learning, using a series of algorithms to enable an agent to learn how to make decisions in a complex environment. It is a subset of artificial intelligence that focuses on making decisions based on the environment and the rewards associated with each action. The goal of DRL is to maximize the long-term reward of an agent; to do this, the agent must use a combination of deep learning, reinforcement learning, and other AI techniques to learn which actions lead to the highest reward. DRL is used to solve a variety of problems, from playing video games to controlling robots, and is also applied in autonomous driving, robotics, and financial trading. It is a powerful tool for solving complex problems, has been used in a variety of research projects, and has the potential to revolutionize the way we interact with machines and the environment.

Comfortable Driving by using Deep Inverse Reinforcement Learning
  • Conference Article, Oct 1, 2019. Daiko Kishikawa + 1 more. doi:10.1109/agents.2019.8929214. Cited by 4.

Passenger comfort and safety are prerequisites to realizing autonomous driving vehicles. Herein, we define "comfortable driving" in terms of "comfortability", meaning less physical and mental burden for passengers. Deep reinforcement learning, which has several applications in the autonomous driving domain, is an effective approach to achieving comfortable driving. Generally, the reward function in deep reinforcement learning is expressed quantitatively; however, because obtaining a quantitative expression for comfortable driving is difficult, there is no guarantee that a hand-specified reward function satisfies the "comfortable driving" conditions. Therefore, we propose an approach to identify a reward function that can realize comfortable driving using LogReg-IRL, a deep inverse reinforcement learning method for linearly solvable Markov decision processes. With the constraint that the maximum lateral acceleration does not exceed a certain threshold value, we could experimentally achieve "comfortable driving". Additionally, by calculating the gradient of the state-dependent reward function with respect to the state input, we could analyze which states are important.

Deep Reinforcement Learning: A New Frontier in Computer Vision Research
  • Book Chapter, Jan 1, 2021. Sejuti Rahman + 3 more. doi:10.1007/978-3-030-75490-7_2. Cited by 1.

Computer vision has advanced so far that machines now can think and see as we humans do. Especially deep learning has raised the bar of excellence in computer vision. However, the recent emergence of deep reinforcement learning is threatening to soar even greater heights as it combines deep neural networks with reinforcement learning along with numerous added advantages over both. This, being a relatively recent technique, has not yet seen many works, and so its true potential is yet to be unveiled. Thus, this chapter focuses on shedding light on the fundamentals of deep reinforcement learning, starting with the preliminaries followed by the theory and basic algorithms and some of its variations, namely, attention aware deep reinforcement learning, deep progressive reinforcement learning, and multi-agent deep reinforcement learning. This chapter also discusses some existing deep reinforcement learning works regarding computer vision such as image processing and understanding, video captioning and summarization, visual search and tracking, action detection, recognition and prediction, and robotics. This work further aims to elucidate the existing challenges and research prospects of deep reinforcement learning in computer vision. This chapter might be considered a starting point for aspiring researchers looking to apply deep reinforcement learning in computer vision to reach the pinnacle of performance in the field by tapping into the immense potential that deep reinforcement learning is showing.

Reinforcement Learning Versus Proportional-Integral-Derivative Control of Hypnosis in a Simulated Intraoperative Patient
  • Research Article. Anesthesia & Analgesia, Dec 14, 2010. Brett L Moore + 2 more. doi:10.1213/ane.0b013e318202cb7c. Cited by 27.

Research has demonstrated the efficacy of closed-loop control of anesthesia using bispectral index (BIS) as the controlled variable. Model-based and proportional-integral-derivative (PID) controllers outperform manual control. We investigated the application of reinforcement learning (RL), an intelligent systems control method, to closed-loop BIS-guided, propofol-induced hypnosis in simulated intraoperative patients. We also compared the performance of the RL agent against that of a conventional PID controller. The RL and PID controllers were evaluated during propofol induction and maintenance of hypnosis. The patient-hypnotic episodes were designed to challenge both controllers with varying degrees of interindividual variation and noxious surgical stimulation. Each controller was tested in 1000 simulated patients, and control performance was assessed by calculating the median performance error (MDPE), median absolute performance error (MDAPE), Wobble, and Divergence for each controller group. A separate analysis was performed for the induction and maintenance phases of hypnosis. During maintenance, RL control demonstrated an MDPE of -1% and an MDAPE of 3.75%, with 80% of the time at BIS(target) ± 5. The PID controller yielded a MDPE of -8.5% and an MDAPE of 8.6%, with 57% of the time at BIS(target) ± 5. In comparison, the MDAPE in the worst-controlled patient of the RL group was observed to be almost half that of the worst-controlled patient in the PID group. When compared with the PID controller, RL control resulted in slower induction but less overshoot and faster attainment of steady state. No difference in interindividual patient variation and noxious destabilizing challenge on control performance was observed between the 2 patient groups.
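The metrics in this comparison are the standard Varvel-style measures: at each sample the performance error is PE_t = 100 * (measured_t - target) / target, MDPE is the median PE (bias), MDAPE the median |PE| (inaccuracy), and Wobble the median absolute deviation of PE from MDPE (intra-episode variability); Divergence, the slope of |PE| against time, is omitted from this sketch. A small illustration of these calculations, with made-up BIS samples rather than study data:

    import numpy as np

    def performance_metrics(measured, target):
        """Varvel-style summary of one controlled episode."""
        pe = 100.0 * (np.asarray(measured, dtype=float) - target) / target
        mdpe = np.median(pe)                   # bias
        mdape = np.median(np.abs(pe))          # inaccuracy
        wobble = np.median(np.abs(pe - mdpe))  # intra-episode variability
        return mdpe, mdape, wobble

    # Illustrative BIS samples around a target of 50 (not study data)
    mdpe, mdape, wobble = performance_metrics([52, 49, 55, 47, 50], target=50)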

An Arbitration Control for an Ensemble of Diversified DQN Variants in Continual Reinforcement Learning
  • Book Chapter, Oct 21, 2025. Wonseo Jang + 1 more. doi:10.3233/faia250817.

Deep reinforcement learning (RL) models, despite their efficiency in learning an optimal policy in static environments, easily lose previously learned knowledge (i.e., catastrophic forgetting). This leads RL models to poor performance in continual reinforcement learning (CRL) scenarios. To address this, we present an arbitration control mechanism over an ensemble of RL agents. It is motivated by and closely aligned with how humans make decisions in a CRL context, using arbitration control over multiple RL agents in parallel, as observed in the prefrontal cortex. We integrate two key ideas into our model: (1) an ensemble of RL agents (DQN variants) explicitly trained to have diverse value functions, and (2) an arbitration control that prioritizes agents with higher reliability (i.e., less error) in recent trials. We propose a framework for CRL: Arbitration Control for an Ensemble of Diversified DQN variants (ACED-DQN). We demonstrate significant performance improvements in both static and continual environments, supported by empirical evidence showing the effectiveness of arbitration control over diversified DQNs during training. In this work, we introduce a framework that enables RL agents to continuously learn, inspired by the human brain.
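To make the arbitration mechanism concrete, here is a toy sketch: each ensemble member's recent value-prediction error is mapped to a reliability score, and the greedy action of the most reliable member is executed. The exponential error-to-reliability mapping and the array shapes are illustrative assumptions, not the paper's exact mechanism.

    import numpy as np

    def arbitrate(q_values, recent_errors, temperature=1.0):
        """q_values: (n_agents, n_actions) Q-estimates from each DQN variant.
        recent_errors: (n_agents,) mean recent prediction error per agent."""
        reliability = np.exp(-np.asarray(recent_errors) / temperature)
        trusted = int(np.argmax(reliability))     # prioritize the least-erring agent
        return int(np.argmax(q_values[trusted]))  # execute its greedy action

    # Example: three agents, four actions; agent 1 has been most reliable lately
    action = arbitrate(np.random.rand(3, 4), recent_errors=[0.30, 0.05, 0.40])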

More from: Artificial Intelligence in Medicine

EvidenceMap: Learning evidence analysis to unleash the power of small language models for biomedical question answering
  • Research Article. Artificial Intelligence in Medicine, Nov 1, 2025. Chang Zong + 3 more. doi:10.1016/j.artmed.2025.103246.

TIPs: Tooth instance and pulp segmentation based on hierarchical extraction and fusion of anatomical priors from cone-beam CT
  • Research Article. Artificial Intelligence in Medicine, Nov 1, 2025. Tao Zhong + 6 more. doi:10.1016/j.artmed.2025.103247.

Physical foundations for trustworthy medical imaging: A survey for artificial intelligence researchers
  • Research Article. Artificial Intelligence in Medicine, Nov 1, 2025. Miriam Cobo + 3 more. doi:10.1016/j.artmed.2025.103251.

Leveraging explainable artificial intelligence for transparent and trustworthy cancer detection systems
  • Research Article. Artificial Intelligence in Medicine, Nov 1, 2025. Shiva Toumaj + 2 more. doi:10.1016/j.artmed.2025.103243.

Privacy-preserving federated transfer learning for enhanced liver lesion segmentation in PET-CT imaging
  • Research Article. Artificial Intelligence in Medicine, Nov 1, 2025. Rajesh Kumar + 4 more. doi:10.1016/j.artmed.2025.103245.

Diagnostic performance of artificial intelligence in detecting and subtyping pediatric medulloblastoma from histopathological images: A systematic review
  • Research Article. Artificial Intelligence in Medicine, Nov 1, 2025. Hiba Alzoubi + 13 more. doi:10.1016/j.artmed.2025.103237.

Pandemic transition: A review of social media text mining for pandemic transition in the post-vaccination era
  • Research Article. Artificial Intelligence in Medicine, Nov 1, 2025. Kiarash Bakhshaei + 3 more. doi:10.1016/j.artmed.2025.103242.

Unprepared and overwhelmed: A case for clinician-focused AI education
  • Research Article. Artificial Intelligence in Medicine, Nov 1, 2025. Nadia Siddiqui + 5 more. doi:10.1016/j.artmed.2025.103252.

Multiplex aggregation combining sample reweight composite network for pathology image segmentation
  • Research Article. Artificial Intelligence in Medicine, Nov 1, 2025. Dawei Fan + 8 more. doi:10.1016/j.artmed.2025.103239.

BIGPN: Biologically informed graph propagational network for plasma proteomic profiling of neurodegenerative biomarkers
  • Research Article. Artificial Intelligence in Medicine, Nov 1, 2025. Sunghong Park + 5 more. doi:10.1016/j.artmed.2025.103241.
