Abstract

In this article, we consider a subclass of partially observable Markov decision process (POMDP) problems which we term confounding POMDPs. In these POMDPs, temporal difference (TD)-based reinforcement learning (RL) algorithms struggle, as the TD error cannot be reliably derived from observations. We solve these problems using a new bio-inspired neural architecture that combines a modulated Hebbian network (MOHN) with a deep Q-network (DQN), which we call the modulated Hebbian plus Q-network architecture (MOHQA). The key idea is to use a Hebbian network with rarely correlated bio-inspired neural traces to bridge temporal delays between actions and rewards when confounding observations and sparse rewards result in inaccurate TD errors. In MOHQA, the DQN learns low-level features and control, while the MOHN contributes to high-level decisions by associating rewards with past states and actions. The proposed architecture thus combines two modules with significantly different learning algorithms, a Hebbian associative network and a classical DQN pipeline, and exploits the advantages of both. Simulations on a set of POMDPs and on the Malmo environment show that the proposed algorithm improved DQN's results and even outperformed control tests with advantage actor-critic (A2C), quantile regression DQN with long short-term memory (QRDQN + LSTM), Monte Carlo policy gradient (REINFORCE), and aggregated memory for reinforcement learning (AMRL) algorithms on the most difficult POMDPs with confounding stimuli and sparse rewards.
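
The core MOHN mechanism can be summarized in a few lines. The sketch below is a minimal illustration, not the paper's actual implementation: it shows a reward-modulated Hebbian update with decaying eligibility traces and a sparse random gate standing in for rare correlations. All variable names and hyperparameters (e.g., `trace_decay`, `rare_corr_prob`) are illustrative assumptions.

```python
# Minimal sketch of a reward-modulated Hebbian update with eligibility traces
# and "rare correlations" (a sparse random gate on which pre/post pairs may
# record a trace). Names and hyperparameters are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(0)

n_in, n_out = 32, 4          # feature size (e.g., from the DQN) and number of actions
W = np.zeros((n_out, n_in))  # associative (Hebbian) weights
E = np.zeros_like(W)         # eligibility traces, one per synapse

trace_decay = 0.95           # how slowly traces fade; bridges action-reward delays
rare_corr_prob = 0.01        # chance a synapse is allowed to record this co-activation
learning_rate = 0.1

def mohn_step(features, action_onehot, reward):
    """Accumulate sparse Hebbian traces, then let the reward (the modulatory
    signal) convert any outstanding traces into weight changes."""
    global W, E
    # Hebbian co-activation of post-synaptic (action) and pre-synaptic (feature) units
    hebb = np.outer(action_onehot, features)
    # Rare correlations: only a small random subset of synapses records this event
    gate = rng.random(W.shape) < rare_corr_prob
    E = trace_decay * E + gate * hebb
    if reward != 0.0:
        # Modulated update: a delayed, sparse reward consolidates the stored traces
        W += learning_rate * reward * E
        E = np.zeros_like(E)

# Toy usage: a reward arriving one step later still credits the earlier
# state-action pair through its eligibility trace.
mohn_step(rng.random(n_in), np.eye(n_out)[2], reward=0.0)
mohn_step(np.zeros(n_in), np.zeros(n_out), reward=1.0)
```

Because the traces decay slowly and only a small fraction of synapses is updated per step, the reward can assign credit to state-action pairs that occurred several steps earlier without relying on a bootstrapped TD target.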

Highlights

  • This section reports an analysis of how (i) the learning mechanisms in the modulated Hebbian network (MOHN) compare to those of REINFORCE and the deep Q-network (DQN) [30]; (ii) the MOHN and the new loss function enhance the features from the DQN to solve partially observable Markov decision process (POMDP) problems; and (iii) the modulated Hebbian plus Q-network architecture (MOHQA) compares against DQN, QRDQN + LSTM, REINFORCE, A2C, and aggregated memory for reinforcement learning (AMRL) on the CT-graph and Malmo benchmarks

  • The MOHN’s learning mechanisms, (i) Hebbian learning, (ii) eligibility traces, and (iii) rare correlations, are contrasted against two classical learning methods: (i) temporal difference (TD) learning in the form of DQN and (ii) policy gradient in the form of REINFORCE (see the TD-error sketch after this list)

  • This paper considers solving confounding POMDPs using a new neural architecture (MOHQA) for deep reinforcement learning
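
For contrast with the Hebbian mechanisms above, the following sketch spells out the standard one-step TD error that DQN-style learners bootstrap on. Under confounding (aliased) observations the bootstrapped target conflates distinct underlying states, and under sparse rewards the immediate reward term is almost always zero, which is the failure mode MOHQA is designed to address. The function and argument names are illustrative, not taken from the paper's code.

```python
# Illustrative one-step TD error as used by DQN-style value learning.
# q_net and target_net are placeholders for any callables mapping an
# observation to a vector of Q-values.
import numpy as np

def td_error(q_net, target_net, obs, action, reward, next_obs, done, gamma=0.99):
    """One-step TD error: r + gamma * max_a' Q_target(s', a') - Q(s, a)."""
    q_sa = q_net(obs)[action]
    bootstrap = 0.0 if done else gamma * float(np.max(target_net(next_obs)))
    return reward + bootstrap - q_sa

# Toy usage with a fixed tabular "network" over 3 observations and 2 actions
Q = np.array([[0.0, 0.2], [0.5, 0.1], [0.0, 0.0]])
print(td_error(lambda o: Q[o], lambda o: Q[o],
               obs=0, action=1, reward=0.0, next_obs=1, done=False))
```

When two different underlying states produce the same `obs`, both the `q_net(obs)` and `target_net(next_obs)` terms mix their values, so the resulting error signal no longer reflects the true state, unlike the trace-based credit assignment in the MOHN sketch above.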


Summary

Introduction

Findings and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the United States Air Force Research Laboratory (AFRL) or the Defense Advanced Research Projects Agency (DARPA). Nicholas Ketz and Praveen Pilly are with the Information and Systems Sciences Laboratory, HRL Laboratories, 3011 Malibu Canyon Road, Malibu, CA 90265, USA. Soheil Kolouri is with the Computer Science Department at Vanderbilt University, Nashville, TN 37235, USA. This research was performed when he was with the Information and Systems Sciences Laboratory, HRL Laboratories, Malibu, CA 90265.

