Abstract

Reinforcement learning (RL) problems often feature deceptive local optima, and methods that optimize purely for reward often fail to learn strategies for overcoming them [2]. Deep neuroevolution and novelty search have been proposed as effective alternatives to gradient-based methods for learning RL policies directly from pixels. We introduce and evaluate novelty search over agent action sequences, measured by Levenshtein distance, as a means of promoting innovation. We also introduce a novelty-search-derived method for stagnation detection and population regeneration, inspired by recent developments in the RL community [5], [1]. Our methods extend a state-of-the-art method for deep neuroevolution using a simple genetic algorithm (GA) designed to efficiently learn deep RL policy network weights [6]. Results provide further evidence that GAs are competitive with gradient-based algorithms for deep RL on the Atari 2600 benchmark. Results also demonstrate that novelty search over agent action sequences can be used effectively as a secondary source of evolutionary selection pressure.
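
The following is a minimal illustrative sketch (not the authors' implementation) of how a novelty score could be computed from agent action sequences using Levenshtein distance; the use of a behavior archive and a k-nearest-neighbor average follows common novelty search practice and is an assumption, not a detail taken from the abstract.

```python
from typing import List, Sequence


def levenshtein(a: Sequence[int], b: Sequence[int]) -> int:
    """Edit distance between two discrete action sequences (e.g. Atari action ids)."""
    if len(a) < len(b):
        a, b = b, a
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, 1):
        curr = [i]
        for j, y in enumerate(b, 1):
            cost = 0 if x == y else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution
        prev = curr
    return prev[-1]


def novelty_score(actions: Sequence[int], archive: List[Sequence[int]], k: int = 10) -> float:
    """Novelty as the mean Levenshtein distance to the k nearest archived sequences
    (assumed formulation; the paper may define the neighborhood differently)."""
    if not archive:
        return 0.0
    dists = sorted(levenshtein(actions, other) for other in archive)
    nearest = dists[:k]
    return sum(nearest) / len(nearest)


# Toy usage: a rollout that diverges from the archived behaviors scores as more novel.
archive = [[0, 2, 2, 3, 1], [0, 2, 2, 3, 0]]
print(novelty_score([0, 2, 2, 1, 1], archive))  # small edit distance -> low novelty
print(novelty_score([4, 4, 5, 5, 4], archive))  # large edit distance -> high novelty
```

Such a score could then serve as the secondary selection signal alongside reward in the GA, with the exact weighting left to the paper's method.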
