Abstract

Time-varying artificial neural networks are commonly used as controllers for dynamic problems such as games and robotics, as they give the controller a memory of what occurred in previous states; this matters because actions taken in earlier states can influence the final success of the agent. This temporal dependence makes methods such as back-propagation difficult to apply when optimising network parameters, so genetic algorithms (GAs) are often used instead. While recurrent neural networks (RNNs) are commonly paired with GAs, long short-term memory (LSTM) networks have received less attention. Since LSTM networks exhibit a wide range of temporal dynamics, in this paper we evolve an LSTM network as a controller for a lunar lander task with two evolutionary algorithms: a steady-state GA (SSGA) and an evolution strategy (ES). Because the fitness landscape contains a large local optimum, we added an incremental fitness scheme to both evolutionary algorithms. We also compare the behaviour and evolutionary progress of the LSTM with that of an RNN evolved via NEAT and ES with the same fitness function. LSTMs proved evolvable on such tasks, although the SSGA solution was outperformed by the RNN. The ES, despite using the same incremental scheme, produced solutions far better than both, showing that ES can be used both with incremental fitness and with LSTMs and RNNs on dynamic tasks.

Highlights

  • While deep feed-forward neural networks have been used very successfully on static problems where there is no temporal dependence between inputs, non-Markovian problems such as controllers for robots or games could benefit from temporally extended networks

  • We ran the steady-state genetic algorithm with a long short-term memory (LSTM) network and the default fitness function for 5,200 trials (52,000 evaluations), where each trial consists of 10 resampled episodes; the fitness did not improve, and the behaviour of the resulting network was the same in all runs

  • The steady-state algorithm with the LSTM proved less useful when the landscape was complex and noisy, and it reached the local optimum faster than the NEAT algorithm with recurrent neural networks (RNNs), as the faster convergence of the population decreased the chance of finding other possible optima


Introduction

While deep feed-forward neural networks have been used very successfully on static problems where there is no temporal dependence between inputs, non-Markovian problems such as controllers for robots or games could benefit from temporally extended networks (networks with a temporal element). Long short-term memory (LSTM) networks, which have more complex forms of memory, are interesting because of their potential to capture long-term temporal dependencies, and they have been used successfully in a number of tasks. Evolutionary optimisation has been used as an alternative to reinforcement learning for developing solutions, since it requires less computational power and memory per episode. LSTM networks are advanced versions of RNNs that can selectively forget and update their hidden states: each network maintains both a hidden state and a memory cell. These properties allow LSTM networks to take past actions and experiences into account, enabling long-term temporal dependencies in the decision-making process.
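
To make these two ingredients concrete, the sketch below shows a minimal LSTM cell (hidden state h and memory cell c, updated through forget/input/output gates) and a simple Gaussian evolution strategy that optimises the cell's flattened weights. This is an illustrative sketch, not the authors' implementation: the layer sizes, the toy memory-recall fitness function (a stand-in for the lunar lander task), and the (1+λ)-style ES loop are all assumptions made for demonstration.

```python
# Minimal sketch (not the paper's code): an LSTM cell evolved by a
# simple Gaussian ES. Sizes, fitness task, and ES variant are illustrative.
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_step(params, x, h, c, n_in, n_hid):
    """One LSTM time step: gates decide what to forget, write, and expose."""
    W = params.reshape(4 * n_hid, n_in + n_hid + 1)   # weights for f, i, o, g
    z = np.concatenate([x, h, [1.0]])                 # input, hidden state, bias
    f, i, o, g = np.split(W @ z, 4)
    f, i, o = sigmoid(f), sigmoid(i), sigmoid(o)      # forget/input/output gates
    c = f * c + i * np.tanh(g)                        # selectively forget and update memory
    h = o * np.tanh(c)                                # expose part of memory as hidden state
    return h, c

def fitness(params, n_in=2, n_hid=4, steps=20, rng=None):
    """Toy episodic fitness: reward the network for recalling its first
    input at the last time step (a hypothetical stand-in for the lander)."""
    if rng is None:
        rng = np.random.default_rng(0)
    h, c = np.zeros(n_hid), np.zeros(n_hid)
    first = rng.standard_normal(n_in)
    x = first
    for _ in range(steps):
        h, c = lstm_step(params, x, h, c, n_in, n_hid)
        x = np.zeros(n_in)                            # silence after the initial cue
    return -np.sum((h[:n_in] - first) ** 2)           # higher is better

# Simple (1+lambda)-style Gaussian ES over the flattened LSTM parameters.
n_in, n_hid = 2, 4
dim = 4 * n_hid * (n_in + n_hid + 1)
rng = np.random.default_rng(42)
parent = rng.standard_normal(dim) * 0.1
best = fitness(parent)
for gen in range(200):
    children = parent + 0.05 * rng.standard_normal((16, dim))  # mutate parent
    scores = [fitness(ch) for ch in children]
    k = int(np.argmax(scores))
    if scores[k] > best:                               # keep the best offspring
        parent, best = children[k], scores[k]
print("best fitness:", best)
```

Because the ES treats the network purely as a black-box parameter vector, no gradients are propagated through time, which is why such methods sidestep the back-propagation difficulties mentioned in the abstract.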
