An introduction to reinforcement learning for neuroscience

Abstract

Reinforcement learning has a rich history in neuroscience, from early work on dopamine as a reward prediction error signal for temporal difference learning (Schultz et al., 1997) to recent work suggesting that dopamine could implement a form of ‘distributional reinforcement learning’ popularized in deep learning (Dabney et al., 2020). Throughout this literature, there has been a tight link between theoretical advances in reinforcement learning and neuroscientific experiments and findings. As a result, the theories describing our experimental data have become increasingly complex and difficult to navigate. In this review, we cover the basic theory underlying classical work in reinforcement learning and build up to an introductory overview of methods in modern deep reinforcement learning that have found applications in systems neuroscience. We start with an overview of the reinforcement learning problem and classical temporal difference algorithms, followed by a discussion of ‘model-free’ and ‘model-based’ reinforcement learning together with methods such as DYNA and successor representations that fall in between these two extremes. Throughout these sections, we highlight the close parallels between such machine learning methods and related work in both experimental and theoretical neuroscience. We then provide an introduction to deep reinforcement learning with examples of how these methods have been used to model different learning phenomena in systems neuroscience, such as meta-reinforcement learning (Wang et al., 2018) and distributional reinforcement learning (Dabney et al., 2020). Code that implements the methods discussed in this work and generates the figures is also provided.
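The reward prediction error at the heart of the classical work the abstract describes can be illustrated in a few lines. The following is a minimal sketch (not the paper's released code): a tabular TD(0) learner on a toy three-state chain, where `delta` plays the role of the dopaminergic prediction-error signal.

```python
import numpy as np

def td0_update(V, s, r, s_next, alpha=0.1, gamma=0.9):
    """One TD(0) update; delta is the reward prediction error."""
    delta = r + gamma * V[s_next] - V[s]
    V[s] += alpha * delta
    return delta

# Toy 3-state chain: a reward of 1.0 is delivered on the transition
# from state 1 to the terminal state 2 (whose value stays 0).
V = np.zeros(3)
for _ in range(200):
    td0_update(V, 0, 0.0, 1)
    td0_update(V, 1, 1.0, 2)

print(V.round(2))  # values propagate backwards through the chain via the TD error
```

With repeated experience, V[1] approaches the reward of 1.0 and V[0] approaches the discounted value 0.9, mirroring how dopaminergic prediction-error responses shift from the reward to its earliest reliable predictor.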

Similar Papers
  • Conference Article
  • Citations: 9
  • 10.1109/incet51464.2021.9456321
Evaluating the Performance of Various Deep Reinforcement Learning Algorithms for a Conversational Chatbot
  • May 21, 2021
  • R Rajamalli Keerthana + 2 more

Conversational agents are the most popular AI technology in IT trends. Domain-specific chatbots are now used by almost every industry to upgrade their customer service. The proposed paper shows the modelling and performance of one such conversational agent created using deep learning. The proposed model utilizes NMT (Neural Machine Translation) from the TensorFlow software libraries. A BiRNN (Bidirectional Recurrent Neural Network) is used to process input sentences that contain a large number of tokens (20-40 words). An attention model is used along with the BiRNN to capture the context of the input sentence. Conversational models usually have one drawback: they sometimes provide irrelevant answers to the input. This happens quite often in conversational chatbots, as the chatbot does not realize that it is answering without context. This drawback is solved in the proposed system using deep reinforcement learning. Deep reinforcement learning follows a reward system that enables the bot to differentiate between right and wrong answers, and allows the chatbot to understand the sentiment of the query and reply accordingly. The deep reinforcement learning algorithms used in the proposed system are Q-Learning, Deep Q Neural Network (DQN), and Distributional Reinforcement Learning with Quantile Regression (QR-DQN). The performance of each algorithm is evaluated and compared in this paper in order to find the best DRL algorithm. The datasets used in the proposed system are the Cornell Movie-Dialogs corpus and CoQA (A Conversational Question Answering Challenge). CoQA is a large dataset containing data collected from 8000+ conversations in the form of questions and answers. The main goal of the proposed work is to increase the relevancy of the chatbot responses and to increase the perplexity of the conversational chatbot.
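As context for the algorithms this abstract names, tabular Q-learning (the simplest of the three) reduces to a single bootstrapped update toward the greedy target. A minimal toy sketch, illustrative only and not the paper's chatbot code:

```python
import numpy as np

def q_learning_step(Q, s, a, r, s_next, alpha=0.5, gamma=0.9):
    """Off-policy Q-learning: move Q[s, a] toward the greedy bootstrap target."""
    target = r + gamma * Q[s_next].max()
    Q[s, a] += alpha * (target - Q[s, a])

# Two-state, two-action toy problem: action 1 in state 0 yields reward 1
# and leads to the terminal state 1 (whose Q row stays zero).
Q = np.zeros((2, 2))
for _ in range(50):
    q_learning_step(Q, 0, 0, 0.0, 0)   # action 0: stay, no reward
    q_learning_step(Q, 0, 1, 1.0, 1)   # action 1: move to terminal, reward 1
print(Q[0].argmax())  # prints 1: the rewarded action wins
```

DQN replaces the table `Q` with a neural network and this exact update with a regression loss on the same target; QR-DQN additionally replaces the scalar Q-value with a set of quantiles of the return.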

  • Research Article
  • 10.4233/uuid:f8faacb0-9a55-453d-97fd-0388a3c848ee
Sample efficient deep reinforcement learning for control
  • Dec 15, 2019
  • Tim De Bruin

The arrival of intelligent, general-purpose robots that can learn to perform new tasks autonomously has been promised for a long time now. Deep reinforcement learning, which combines reinforcement learning with deep neural network function approximation, has the potential to enable robots to learn to perform a wide range of new tasks while requiring very little prior knowledge or human help. This framework might therefore help to finally make general-purpose robots a reality. However, the biggest successes of deep reinforcement learning have so far been in simulated game settings. To translate these successes to the real world, significant improvements are needed in the ability of these methods to learn quickly and safely. This thesis investigates what is needed to make this possible and makes contributions towards this goal.

Before deep reinforcement learning methods can be successfully applied in the robotics domain, an understanding is needed of how, when, and why deep learning and reinforcement learning work well together. This thesis therefore starts with a literature review, which is presented in Chapter 2. While the field is still in some regards in its infancy, it can already be noted that there are important components shared by successful algorithms. These components help to reconcile the differences between classical reinforcement learning methods and the training procedures used to successfully train deep neural networks. The main challenges in combining deep learning with reinforcement learning center around the interdependencies of the policy, the training data, and the training targets. Commonly used tools for managing the detrimental effects caused by these interdependencies include target networks, trust region updates, and experience replay buffers. Besides reviewing these components, a number of the more popular and historically relevant deep reinforcement learning methods are discussed.

Reinforcement learning involves learning through trial and error. However, robots (and their surroundings) are fragile, which makes these trials, and especially errors, very costly. Therefore, the amount of exploration that is performed will often need to be drastically reduced over time, especially once a reasonable behavior has already been found. We demonstrate how, using common experience replay techniques, this can quickly lead to forgetting previously learned successful behaviors. This problem is investigated in Chapter 3. Experiments are conducted to investigate what distribution of the experiences over the state-action space leads to desirable learning behavior and what distributions can cause problems. It is shown how actor-critic algorithms are especially sensitive to the lack of diversity in the action space that can result from reducing the amount of exploration over time. Further relations between the properties of the control problem at hand and the required data distributions are also shown. These include a larger need for diversity in the action space when control frequencies are high, and a reduced importance of data diversity for problems where generalizing the control strategy across the state space is more difficult.

While Chapter 3 investigates what data distributions are most beneficial, Chapter 4 instead proposes practical algorithms to select useful experiences from a stream of experiences. We do not assume to have any control over the stream of experiences, which makes it possible to learn from additional sources of experience such as other robots, experiences obtained while learning different tasks, and experiences obtained using predefined controllers. We make two separate judgments on the utility of individual experiences. The first judgment is on the long-term utility of experiences, which is used to determine which experiences to keep in memory once the experience buffer is full. The second judgment is on the instantaneous utility of the experience to the learning agent; it is used to determine which experiences should be sampled from the buffer to be learned from. To estimate the short- and long-term utility of the experiences, we propose proxies based on the age, surprise, and exploration intensity associated with the experiences. It is shown how prior knowledge of the control problem at hand can be used to decide which proxies to use. We additionally show how knowledge of the control problem can be used to estimate the optimal size of the experience buffer, and whether or not to use importance sampling to compensate for the bias introduced by the selection procedure. Together, these choices can lead to a more stable learning procedure and better-performing controllers.

In Chapter 5 we look at what to learn from the collected data. The high price of data in the robotics domain makes it crucial to extract as much knowledge as possible from each and every datum. Reinforcement learning, by default, does not do so. We therefore supplement reinforcement learning with explicit state representation learning objectives. These objectives are based on the assumption that the neural network controller to be learned can be seen as consisting of two consecutive parts. The first part (referred to as the state encoder) maps the observed sensor data to a compact and concise representation of the state of the robot and its environment. The second part determines which actions to take based on this state representation. As the representation of the state of the world is useful for more than just completing the task at hand, it can also be trained with more general (state representation learning) objectives than just the reinforcement learning objective associated with the current task. We show how including these additional training objectives allows for learning a much more general state representation, which in turn makes it possible to learn broadly applicable control strategies more quickly. We also introduce a training method that ensures that the added learning objectives further the goal of reinforcement learning, without destabilizing the learning process through their changes to the state encoder.

The final contribution of this thesis, presented in Chapter 6, focuses on the optimization procedure used to train the second part of the policy: the mapping from the state representation to the actions. While we show that the state encoder can be efficiently trained with standard gradient-based optimization techniques, perfecting this second mapping is more difficult. Obtaining high-quality estimates of the gradients of the policy performance with respect to the parameters of this part of the neural network is usually not feasible. This means that while a reasonable policy can be obtained relatively quickly using gradient-based optimization approaches, this speed comes at the cost of the stability of the learning process as well as the final performance of the controller. Additionally, the unstable nature of this learning process brings with it an extreme sensitivity to the values of the hyper-parameters of the training method, which places an unfortunate emphasis on hyper-parameter tuning for getting deep reinforcement learning algorithms to work well. Gradient-free optimization algorithms can be simpler and more stable, but tend to be much less sample efficient. We show how the desirable aspects of both methods can be combined by first training the entire network through gradient-based optimization and subsequently fine-tuning the final part of the network in a gradient-free manner. We demonstrate how this enables the policy to improve in a stable manner to a performance level not obtained by gradient-based optimization alone, using many fewer trials than methods using only gradient-free optimization.
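The experience replay buffers this thesis studies can be sketched minimally. The example below is an illustration, not the thesis code: a FIFO buffer in which retention is governed purely by the age proxy; surprise- or exploration-based proxies would replace the eviction rule.

```python
import random
from collections import deque

class ReplayBuffer:
    """Minimal FIFO experience replay: the oldest experience is evicted
    first, i.e. long-term utility is judged by the age proxy alone."""

    def __init__(self, capacity):
        self.buffer = deque(maxlen=capacity)

    def add(self, experience):
        self.buffer.append(experience)  # silently evicts the oldest when full

    def sample(self, batch_size):
        # Uniform sampling; a surprise (TD-error) proxy would weight this.
        return random.sample(self.buffer, min(batch_size, len(self.buffer)))

buf = ReplayBuffer(capacity=100)
for t in range(150):
    buf.add((t, "state", "action", 0.0, "next_state"))
print(len(buf.buffer))                 # prints 100: capped at capacity
print(min(e[0] for e in buf.buffer))   # prints 50: the 50 oldest were evicted
```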

  • Research Article
  • Citations: 17
  • 10.1007/s11633-023-1454-4
Distributed Deep Reinforcement Learning: A Survey and a Multi-player Multi-agent Learning Toolbox
  • Jan 11, 2024
  • Machine Intelligence Research
  • Qiyue Yin + 8 more

With the breakthrough of AlphaGo, deep reinforcement learning has become a recognized technique for solving sequential decision-making problems. Despite its reputation, the data inefficiency caused by its trial-and-error learning mechanism makes deep reinforcement learning difficult to apply in a wide range of areas. Many methods have been developed for sample-efficient deep reinforcement learning, such as environment modelling, experience transfer, and distributed modifications, among which distributed deep reinforcement learning has shown its potential in various applications, such as human-computer gaming and intelligent transportation. In this paper, we summarize the state of this exciting field by comparing the classical distributed deep reinforcement learning methods and studying the important components for achieving efficient distributed learning, covering everything from single-player single-agent distributed deep reinforcement learning to the most complex multiple-player multiple-agent case. Furthermore, we review recently released toolboxes that help to realize distributed deep reinforcement learning without many modifications of their non-distributed versions. After analysing their strengths and weaknesses, a multi-player multi-agent distributed deep reinforcement learning toolbox is developed and released, and is further validated on Wargame, a complex environment, showing the usability of the proposed toolbox for multi-player multi-agent distributed deep reinforcement learning in complex games. Finally, we point out challenges and future trends, hoping that this brief review can provide a guide or a spark for researchers interested in distributed deep reinforcement learning.

  • Research Article
  • Citations: 24
  • 10.1016/j.tics.2020.09.002
Artificial Intelligence and the Common Sense of Animals.
  • Oct 8, 2020
  • Trends in Cognitive Sciences
  • Murray Shanahan + 3 more


  • Book Chapter
  • Citations: 1
  • 10.1007/978-3-030-75490-7_2
Deep Reinforcement Learning: A New Frontier in Computer Vision Research
  • Jan 1, 2021
  • Sejuti Rahman + 3 more

Computer vision has advanced so far that machines can now see and, in some respects, reason as humans do. Deep learning in particular has raised the bar of excellence in computer vision. However, the recent emergence of deep reinforcement learning promises to reach even greater heights, as it combines deep neural networks with reinforcement learning and offers numerous advantages over both. Being a relatively recent technique, it has not yet seen many works, and so its true potential is yet to be unveiled. Thus, this chapter focuses on shedding light on the fundamentals of deep reinforcement learning, starting with the preliminaries, followed by the theory and basic algorithms and some of their variations, namely attention-aware deep reinforcement learning, deep progressive reinforcement learning, and multi-agent deep reinforcement learning. This chapter also discusses existing deep reinforcement learning works in computer vision, such as image processing and understanding, video captioning and summarization, visual search and tracking, action detection, recognition and prediction, and robotics. The work further aims to elucidate the existing challenges and research prospects of deep reinforcement learning in computer vision. This chapter may be considered a starting point for aspiring researchers looking to apply deep reinforcement learning in computer vision to reach the pinnacle of performance in the field by tapping into its immense potential.

  • Research Article
  • Citations: 6
  • 10.1360/n972016-00741
Break through the limits of learning by machines
  • Sep 20, 2016
  • Chinese Science Bulletin
  • Zhongzhi Shi

Learning ability is a basic characteristic of human intelligence. The July 1, 2005 issue of Science published a list of 125 important questions in science; among them, question 94 asked "What are the limits of learning by machines?", with the annotation: "Computers can already beat the world's best chess players, and they have a wealth of information on the Web to draw on. But abstract reasoning is still beyond any machine." In recent years, artificial intelligence has made great progress. In 1997, IBM's supercomputer Deep Blue defeated chess master Garry Kasparov. On February 14, 2011, IBM's Watson supercomputer won a practice round against Jeopardy champions Ken Jennings and Brad Rutter. In March 2016, Google DeepMind's AlphaGo sealed a 4-1 victory over South Korean Go grandmaster Lee Se-dol. This paper focuses on the machine learning methods behind AlphaGo, including reinforcement learning, deep learning, and deep reinforcement learning, and analyses existing problems and the latest research progress. Deep reinforcement learning is the combination of deep learning and reinforcement learning, which can realize learning from perception to action: much as humans do, the system takes in sensory information such as vision and directly outputs actions through a deep neural network. Deep reinforcement learning has the potential to learn a variety of skills that would allow a robot to achieve full autonomy. Even though reinforcement learning has been practiced successfully, feature states need to be set manually, which is difficult for complex scenes, easily causes the curse of dimensionality, and yields poor representations. In 2010, Sascha Lange and Martin Riedmiller proposed deep auto-encoder neural networks in reinforcement learning to extract features for visual control tasks.
In 2013, DeepMind proposed the deep Q-network (DQN) at NIPS 2013, using a convolutional neural network to extract features for reinforcement learning. They continued to improve the method and published an improved version of DQN in Nature in 2015, which aroused widespread interest. In order to break through the limits of learning by machines, cognitive machine learning is proposed, which combines machine learning with brain cognition so that machine intelligence constantly evolves and gradually reaches the human level of intelligence. A cognitive model entitled Consciousness And Memory (CAM) is proposed by the author, which consists of memory, consciousness, high-level cognitive functions, perception, and motor components. High-level cognitive functions of the brain include learning, language, thinking, decision making, emotion, and so on. Learning is the process of accepting stimuli through the nervous system and acquiring new behaviors, habits, and accumulated experience. According to the current research progress of brain science and cognitive science, cognitive machine learning may concern itself with learning emergence, procedural memory knowledge learning, learning evolution, and so on. For intelligence, so-called evolution refers to the learning of learning, with structure changing accordingly. It is important to record the learning result through structural change and thereby improve the learning method.

  • Supplementary Content
  • Citations: 1
  • 10.1016/j.neuron.2021.01.021
What can classic Atari video games tell us about the human brain?
  • Feb 1, 2021
  • Neuron
  • Raphael Köster + 1 more


  • Research Article
  • Citations: 20
  • 10.1088/1361-6560/ac9cb3
Deep reinforcement learning and its applications in medical imaging and radiation therapy: a survey
  • Nov 11, 2022
  • Physics in Medicine & Biology
  • Lanyu Xu + 2 more

Reinforcement learning takes sequential decision-making approaches by learning a policy through trial and error based on interaction with the environment. Combining deep learning and reinforcement learning empowers the agent to learn the interactions and the distribution of rewards over state-action pairs, achieving effective and efficient solutions in more complex and dynamic environments. Deep reinforcement learning (DRL) has demonstrated astonishing performance, surpassing human-level performance in the game domain and many other simulated environments. This paper introduces the basics of reinforcement learning and reviews various categories of DRL algorithms and DRL models developed for medical image analysis and radiation treatment planning optimization. We also discuss the current challenges of DRL and approaches proposed to make DRL more generalizable and robust in real-world environments. By fostering the design of reward functions, agent interactions, and environment models, DRL algorithms can address the challenges posed by scarce and heterogeneous annotated medical image data, which has been a major obstacle to implementing deep learning models in the clinic. DRL is an active research area with enormous potential to improve deep learning applications in medical imaging and radiation therapy planning.

  • Research Article
  • Citations: 5
  • 10.1155/2021/9372803
Optimization Method of Power Equipment Maintenance Plan Decision-Making Based on Deep Reinforcement Learning
  • Mar 15, 2021
  • Mathematical Problems in Engineering
  • Yanhua Yang + 1 more

The safe and reliable operation of power grid equipment is the basis for ensuring the safe operation of the power system. At present, traditional periodic maintenance has exposed problems such as under-maintenance and over-maintenance. Based on a multi-agent deep reinforcement learning decision-making optimization algorithm, a method for the decision-making and optimization of power grid equipment maintenance plans is proposed. In this paper, an optimization model of the power grid equipment maintenance plan that takes into account the reliability and economics of power grid operation is constructed, with maintenance constraints and power grid safety constraints as its constraints. Deep distributed recurrent Q-networks, a multi-agent deep reinforcement learning method, is adopted to solve the optimization model; it uses the high-dimensional feature extraction capabilities of deep learning and the decision-making capabilities of reinforcement learning to solve the multi-objective decision-making problem of power grid maintenance planning. Through case analysis, comparative results show that the proposed algorithm has better optimization and decision-making ability as well as lower maintenance cost, and can thus realize the optimal decision for a power grid equipment maintenance plan. The expected value of power shortage and the maintenance cost obtained by the proposed method are 71.75 MW·h and 496,000 yuan, respectively.

  • Research Article
  • Citations: 72
  • 10.1109/tiv.2019.2919467
Deep Distributional Reinforcement Learning Based High-Level Driving Policy Determination
  • Sep 1, 2019
  • IEEE Transactions on Intelligent Vehicles
  • Kyushik Min + 2 more

Even though some driver assistance systems have been commercialized to provide safety and convenience to the driver, they can be applied to autonomous driving only in limited situations such as highways. In this paper, we propose a supervisor agent that can enhance driver assistance systems by using deep distributional reinforcement learning. The supervisor agent is trained using an end-to-end approach that directly maps both a camera image and LIDAR data into an action plan. Because even a well-trained deep reinforcement learning network can produce unexpected actions, a collision avoidance function is added to prevent dangerous situations. In addition, highway driving is a stochastic environment with inherent randomness, and thus training is performed through a distributional reinforcement learning algorithm, which is specialized for stochastic environments. The optimal action for autonomous driving is selected through the return value distribution. Finally, the proposed algorithm is verified through a highway driving simulator implemented with the Unity ML-Agents toolkit.

  • Single Book
  • Citations: 38
  • 10.7551/mitpress/14207.001.0001
Distributional Reinforcement Learning
  • May 30, 2023
  • Marc G Bellemare + 2 more

The first comprehensive guide to distributional reinforcement learning, providing a new mathematical formalism for thinking about decisions from a probabilistic perspective. Distributional reinforcement learning is a new mathematical formalism for thinking about decisions. Going beyond the common approach to reinforcement learning and expected values, it focuses on the total reward or return obtained as a consequence of an agent's choices—specifically, how this return behaves from a probabilistic perspective. In this first comprehensive guide to distributional reinforcement learning, Marc G. Bellemare, Will Dabney, and Mark Rowland, who spearheaded development of the field, present its key concepts and review some of its many applications. They demonstrate its power to account for many complex, interesting phenomena that arise from interactions with one's environment. The authors present core ideas from classical reinforcement learning to contextualize distributional topics and include mathematical proofs pertaining to major results discussed in the text. They guide the reader through a series of algorithmic and mathematical developments that, in turn, characterize, compute, estimate, and make decisions on the basis of the random return. Practitioners in disciplines as diverse as finance (risk management), computational neuroscience, computational psychiatry, psychology, macroeconomics, and robotics are already using distributional reinforcement learning, paving the way for its expanding applications in mathematical finance, engineering, and the life sciences. More than a mathematical approach, distributional reinforcement learning represents a new perspective on how intelligent agents make predictions and decisions.
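The book's core object, the random return, is commonly approximated by a set of quantiles, as in the QR-DQN family of algorithms. Below is a minimal sketch (illustrative, not the book's code) of stochastic quantile regression: each estimate tracks one quantile of a fixed target return distribution, here a Gaussian standing in for the random return.

```python
import numpy as np

def quantile_update(theta, target_samples, lr=0.05):
    """One stochastic step of quantile regression: theta[i] descends the
    pinball loss and so tracks the tau[i]-quantile of the target samples."""
    n = len(theta)
    tau = (np.arange(n) + 0.5) / n          # midpoint quantile levels
    for z in target_samples:
        # negative gradient of the pinball loss for each theta[i]
        theta += lr * (tau - (z < theta))
    return theta

rng = np.random.default_rng(0)
theta = np.zeros(4)
for _ in range(2000):
    # samples standing in for random returns drawn from N(1, 0.5)
    theta = quantile_update(theta, rng.normal(1.0, 0.5, size=8))
print(theta.round(1))  # roughly the 12.5/37.5/62.5/87.5th percentiles of N(1, 0.5)
```

In full distributional RL the target samples would come from a bootstrapped Bellman target rather than a fixed distribution, but the quantile-tracking mechanism is the same.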

  • Conference Article
  • 10.46720/f2021-acm-108
Autonomous Driving Decision-making Based on the Combination of Deep Reinforcement Learning and Rule-based Controller
  • Sep 30, 2021
  • Jinzhu Wang Jinzhu Wang + 3 more

As autonomous vehicles begin to drive on the road, rational decision making is essential for driving safety and efficiency. The decision-making of autonomous vehicles is a difficult problem since it depends on the surrounding dynamic environment constraints and the vehicle's own motion constraints. As the combination of deep learning (DL) and reinforcement learning (RL), deep reinforcement learning (DRL) integrates DL's strong understanding of perception problems such as vision and semantic text with the decision-making ability of RL. Hence, DRL can be used to solve complex problems in real scenarios. However, as an end-to-end method, DRL is inefficient and its final results tend to lack robustness. Considering the usefulness of existing domain knowledge for autonomous vehicle decision-making, this paper uses domain knowledge to establish behavioral rules and combines rule-based behavior strategies with DRL methods, so that we can achieve efficient training of autonomous vehicle decision-making models and ensure that the vehicle chooses safe actions under unknown circumstances. First, the continuous decision-making problem of autonomous vehicles is modeled as a Markov decision process (MDP). Taking into account the influence of the unknown intentions of other road vehicles on self-driving decisions, a recognition model of the behavioral intentions of other vehicles is established. Then, a linear dynamic model of a conventional vehicle is used to establish the relationship between the vehicle's decision-making behavior and its motion trajectory. Finally, by designing the reward function of the MDP and combining RL with a rule-based behavior controller, the expected driving behavior of the autonomous vehicle is obtained. In this paper, a simulation environment covering urban intersections and highways is established, and each situation is formalized as an RL problem.
Meanwhile, a large number of numerical simulations were carried out, and our method was compared with the end-to-end form of DRL.

  • Conference Article
  • Citations: 26
  • 10.1109/icrae50850.2020.9310796
Towards Closing the Sim-to-Real Gap in Collaborative Multi-Robot Deep Reinforcement Learning
  • Nov 20, 2020
  • Wenshuai Zhao + 3 more

Current research directions in deep reinforcement learning include bridging the simulation-reality gap, improving the sample efficiency of experiences in distributed multi-agent reinforcement learning, and developing methods robust to adversarial agents in distributed learning, among many others. In this work, we are particularly interested in analyzing how multi-agent reinforcement learning can bridge the gap to reality in distributed multi-robot systems where the operation of the different robots is not necessarily homogeneous. These variations can happen due to sensing mismatches, inherent errors in the calibration of the mechanical joints, or simple differences in accuracy. While our results are simulation-based, we introduce the effect of sensing, calibration, and accuracy mismatches in distributed reinforcement learning with proximal policy optimization (PPO). We discuss how the different types of perturbations, and the number of agents experiencing them, affect the collaborative learning effort. The simulations are carried out using a Kuka arm model in the Bullet physics engine. This is, to the best of our knowledge, the first work exploring the limitations of PPO in multi-robot systems when considering that different robots might be exposed to different environments where their sensors or actuators have induced errors. With the conclusions of this work, we set the initial point for future work on designing and developing methods to achieve robust reinforcement learning in the presence of real-world perturbations that might differ within a multi-robot system.

  • Conference Article
  • Citations: 4
  • 10.1109/case49439.2021.9551544
Exploration via Distributional Reinforcement Learning with Epistemic and Aleatoric Uncertainty Estimation
  • Aug 23, 2021
  • Qi Liu + 5 more

The problem of exploration remains one of the major challenges in deep reinforcement learning (RL). This paper proposes an approach to improve the exploration efficiency of distributional RL. First, it proposes a novel method to estimate the epistemic and aleatoric uncertainty in distributional RL using deep ensembles, inspired by Bayesian deep learning. Second, it presents a method to improve the exploration efficiency of deep distributional RL by using the estimated epistemic uncertainty. Experimental results show that the proposed approach outperforms the baseline in Atari games.
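A common way to separate the two kinds of uncertainty with deep ensembles, which is plausibly the idea this line of work builds on, is to read disagreement between ensemble members as epistemic uncertainty and the spread within each member's predicted return distribution as aleatoric. A minimal sketch with made-up numbers; the function names and the specific decomposition are illustrative assumptions, not the paper's exact method:

```python
import numpy as np

def ensemble_uncertainty(quantile_preds):
    """quantile_preds: array (n_models, n_quantiles) of predicted return
    quantiles. Epistemic ~ variance of the per-model means (disagreement);
    aleatoric ~ average variance within each model's distribution."""
    means = quantile_preds.mean(axis=1)            # per-model expected return
    epistemic = means.var()                        # between-model variance
    aleatoric = quantile_preds.var(axis=1).mean()  # mean within-model variance
    return epistemic, aleatoric

# Hypothetical ensemble of 3 models predicting 4 return quantiles each.
# The models roughly agree, so the return is noisy but well understood.
preds = np.array([[0.0, 0.5, 1.5, 2.0],
                  [0.1, 0.6, 1.4, 1.9],
                  [0.2, 0.7, 1.6, 2.1]])
epi, ale = ensemble_uncertainty(preds)
print(epi < ale)  # True: little disagreement, so epistemic < aleatoric
```

An exploration bonus would then be driven by the epistemic term only, since aleatoric noise cannot be reduced by gathering more data.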

  • Research Article
  • Citations: 140
  • 10.1007/s10462-021-10061-9
Deep reinforcement learning in computer vision: a comprehensive survey
  • Sep 29, 2021
  • Artificial Intelligence Review
  • Ngan Le + 4 more

Deep reinforcement learning augments the reinforcement learning framework and utilizes the powerful representation of deep neural networks. Recent works have demonstrated the remarkable successes of deep reinforcement learning in various domains including finance, medicine, healthcare, video games, robotics, and computer vision. In this work, we provide a detailed review of recent and state-of-the-art research advances of deep reinforcement learning in computer vision. We start with comprehending the theories of deep learning, reinforcement learning, and deep reinforcement learning. We then propose a categorization of deep reinforcement learning methodologies and discuss their advantages and limitations. In particular, we divide deep reinforcement learning into seven main categories according to their applications in computer vision: (i) landmark localization; (ii) object detection; (iii) object tracking; (iv) registration of both 2D images and 3D volumetric data; (v) image segmentation; (vi) video analysis; and (vii) other applications. Each of these categories is further analyzed with respect to reinforcement learning techniques, network design, and performance. Moreover, we provide a comprehensive analysis of the existing publicly available datasets and examine source code availability. Finally, we present some open issues and discuss future research directions on deep reinforcement learning in computer vision.
