Abstract

Rapid online adaptation to changing tasks is an important problem in machine learning and, recently, a focus of meta-reinforcement learning. However, reinforcement learning (RL) algorithms struggle in POMDP environments because the state of the system, essential in an RL framework, is not always observable. Additionally, hand-designed meta-RL architectures may not include suitable computational structures for specific learning problems. The evolution of online learning mechanisms, in contrast, can incorporate learning strategies into an agent that (i) evolve memory when required and (ii) optimize adaptation speed for specific online learning problems. In this paper, we exploit the highly adaptive nature of neuromodulated neural networks to evolve a controller that uses the latent space of an autoencoder in a POMDP. Analysis of the evolved networks reveals the ability of the proposed algorithm to acquire inborn knowledge in a variety of aspects, such as the detection of cues that reveal implicit rewards and the ability to evolve location neurons that aid navigation. The integration of inborn knowledge and online plasticity enabled fast adaptation and better performance in comparison to some non-evolutionary meta-reinforcement learning algorithms. The algorithm also proved successful in the 3D gaming environment Malmo Minecraft.
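As a rough illustration of the kind of mechanism the abstract describes (not the authors' exact architecture), the sketch below shows one common way a neuromodulated plastic controller can act on an autoencoder's latent vector: a fixed "inborn" weight matrix is combined with a Hebbian trace whose online updates are gated by a reward-driven modulatory signal. All dimensions, names and the specific update rule here are assumptions made for illustration only.

```python
import numpy as np

rng = np.random.default_rng(0)

LATENT_DIM = 8   # size of the autoencoder latent vector (assumed)
NUM_ACTIONS = 4  # size of the controller output (assumed)

# Evolvable "inborn" parameters: baseline weights W, per-connection
# plasticity coefficients A, and a learning rate eta. In an evolutionary
# setting these would be optimized across generations; here they are
# randomly initialized for illustration.
W = rng.normal(scale=0.1, size=(NUM_ACTIONS, LATENT_DIM))
A = rng.normal(scale=0.1, size=(NUM_ACTIONS, LATENT_DIM))
eta = 0.05

# Hebbian trace, updated online during the agent's lifetime.
Hebb = np.zeros((NUM_ACTIONS, LATENT_DIM))

def controller_step(latent, modulation):
    """One forward/plasticity step of a neuromodulated plastic layer.

    latent     -- latent vector produced by a (pretrained) autoencoder
    modulation -- scalar neuromodulatory signal, e.g. derived from a
                  reward cue; it gates how strongly the trace is updated.
    """
    global Hebb
    # Effective weights combine fixed (inborn) and plastic components.
    effective_W = W + A * Hebb
    activation = np.tanh(effective_W @ latent)

    # Modulated Hebbian update: plasticity occurs only when the
    # neuromodulatory signal is non-zero (e.g. a reward cue is observed).
    Hebb = np.clip(Hebb + eta * modulation * np.outer(activation, latent),
                   -1.0, 1.0)
    return activation

# Toy usage: random "latent" observations with an occasional reward cue.
for t in range(10):
    latent = rng.normal(size=LATENT_DIM)
    reward_cue = 1.0 if t == 5 else 0.0  # pretend a cue appears at step 5
    action_logits = controller_step(latent, reward_cue)
print("final Hebbian trace norm:", np.linalg.norm(Hebb))
```

The key design point this sketch illustrates is the separation between slowly evolved structure (W, A, eta) and fast within-lifetime plasticity (the modulated Hebbian trace), which is what allows adaptation without gradient updates at deployment time.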

Highlights

  • The field of deep reinforcement learning (RL) has showcased impressive results in recent years, solving tasks in robotic control [4, 12], games [15] and other complex environments

  • In this paper, we investigate the use of neuroevolution to autonomously evolve inborn knowledge [26] in the form of neural structures and plasticity rules, with a specific focus on dynamic POMDPs that have posed challenges to current RL approaches

  • We evaluate our proposed method in a POMDP environment where we show better performance in comparison to some non-evolutionary deep meta-reinforcement learning methods


Introduction

The field of deep reinforcement learning (RL) has showcased impressive results in recent years, solving tasks in robotic control [4, 12], games [15] and other complex environments. Despite such successes, deep RL algorithms are sample-inefficient and sometimes unstable. In an attempt to address this problem, deep meta-reinforcement learning (meta-RL) methods [5, 6, 20, 31, 34] were devised. These methods are largely evaluated on dense-reward, fully observable MDP environments, and perform sub-optimally in sparse-reward, partially observable environments.
