Abstract

In computer science, reinforcement learning is a powerful framework with which artificial agents can learn to maximize their performance for any given Markov decision process (MDP). Advances over the last decade, in combination with deep neural networks, have enabled artificial agents to surpass human performance in many difficult task settings. However, such frameworks perform far less favorably when evaluated on their ability to generalize or transfer representations across different tasks. Existing algorithms that facilitate transfer are typically limited to cases in which the transition function or the optimal policy is portable to new contexts, but achieving the "deep transfer" characteristic of human behavior has been elusive. Such transfer typically requires discovery of abstractions that permit analogical reuse of previously learned representations in superficially distinct tasks. Here, we demonstrate that abstractions that minimize error in predictions of reward outcomes generalize across tasks with different transition and reward functions. Such reward-predictive representations compress the state space of a task into a lower-dimensional representation by combining states that are equivalent in terms of both the transition and reward functions. Because only state equivalences are considered, the resulting state representation is not tied to the transition and reward functions themselves and thus generalizes across tasks with different reward and transition functions. These results contrast with those of abstractions that myopically maximize reward in any given MDP, and they motivate further experiments in humans and animals to investigate whether neural and cognitive systems involved in state representation perform abstractions that facilitate such equivalence relations.
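
To make the state equivalence described above concrete, the following sketch (a minimal illustration written for this summary, not the article's algorithm; the function name reward_predictive_partition and the inputs P and R are assumptions) groups the states of a small, fully known MDP into abstract states that agree on expected one-step rewards and on the probability mass they send into each abstract state, for every action.

# Minimal sketch (illustrative only, not the article's method): partition the
# states of a fully known MDP so that states in the same block agree on
# expected one-step rewards and on transition mass into every block.
import numpy as np

def reward_predictive_partition(P, R):
    """P: array of shape (A, S, S) with transition probabilities P[a, s, s'].
    R: array of shape (S, A) with expected one-step rewards.
    Returns an array mapping each state to an abstract-state index."""
    num_actions, num_states, _ = P.shape
    blocks = np.zeros(num_states, dtype=int)  # start with a single block
    while True:
        signatures = {}
        new_blocks = np.empty(num_states, dtype=int)
        num_blocks = blocks.max() + 1
        for s in range(num_states):
            # Probability mass each action moves from s into every current block.
            block_probs = tuple(
                round(P[a, s, blocks == b].sum(), 8)
                for a in range(num_actions) for b in range(num_blocks)
            )
            key = (blocks[s], tuple(np.round(R[s], 8)), block_probs)
            new_blocks[s] = signatures.setdefault(key, len(signatures))
        if np.array_equal(new_blocks, blocks):
            return blocks  # stable: states within a block are equivalent
        blocks = new_blocks

Because only block membership matters, two tasks with different transition and reward functions can still induce the same partition, which is the sense in which a reward-predictive representation transfers across tasks.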

Highlights

  • A central question in reinforcement learning (RL) [1] is which representations facilitate re-use of knowledge across different tasks

  • Because we found that Linear Successor Feature Models (LSFMs) are easier to use in practice than Linear Action Models (LAMs), this article focuses on LSFMs (see the sketch after this list for the successor-feature quantity such models build on)

  • The reward-predictive state abstraction (Fig 2B) can be re-used to plan a different policy in Task B, while the reward-maximizing state abstraction (Fig 2C) cannot. Such a benefit is only possible if the two tasks share an abstract relation: this columnar state abstraction would not be useful in a subsequent Markov decision process (MDP) that is arranged in rows
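
As a companion to the LSFM highlight above, the sketch below (a minimal illustration for this summary, not the LSFMs described in the article; the names successor_features, P_pi, Phi, and gamma are assumptions) computes the successor features that such models treat as linear in the state representation: the expected discounted sum of future state features under a fixed policy, from which returns can be read out linearly whenever one-step rewards are linear in the same features.

# Minimal sketch (illustrative only): the successor-feature quantity that a
# linear model over state features would be fit to reproduce.
import numpy as np

def successor_features(P_pi, Phi, gamma=0.9):
    """P_pi: (S, S) state-transition matrix under a fixed policy.
    Phi: (S, k) matrix of state features.
    Returns psi with psi[s] = E[sum_t gamma^t * Phi[s_t] | s_0 = s]."""
    S = P_pi.shape[0]
    # psi satisfies psi = Phi + gamma * P_pi @ psi, a linear system in psi.
    return np.linalg.solve(np.eye(S) - gamma * P_pi, Phi)

# Toy usage on a 3-state chain: if one-step rewards are linear in the features
# (r = Phi @ w), then state values factor as V = psi @ w.
P_pi = np.array([[0.0, 1.0, 0.0],
                 [0.0, 0.0, 1.0],
                 [0.0, 0.0, 1.0]])
Phi = np.eye(3)                      # one-hot features for illustration
psi = successor_features(P_pi, Phi)
w = np.array([0.0, 0.0, 1.0])        # reward only in the absorbing state
V = psi @ w                          # discounted return from each state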

Introduction

A central question in reinforcement learning (RL) [1] is which representations facilitate re-use of knowledge across different tasks. Existing deep reinforcement learning algorithms, such as the DQN algorithm [2], construct latent representations to find a reward-maximizing policy in tasks with complex visual inputs. While these representations may be useful for abstracting across states in the service of optimal performance in a specific task, this article considers representations that facilitate re-use across different tasks. Consider, for example, a driver who has learned to operate a left-hand-drive car and is then asked to drive a right-hand-drive car. A person who has learned in one scenario can quickly generalize to the other, despite the fact that the two tasks require different coordination of motor skills. Both tasks are the same in an abstract sense: in each case, there is a progression from 1st to 2nd gear and so on, which must be coordinated with the clutch pedal and steering. Because this structure can be generalized from a left-hand-drive car to a right-hand-drive car [3, 4], a driver does not have to learn how to drive from scratch.
