Abstract

Reinforcement learning under partial observability poses two conceptually orthogonal learning challenges: learning a useful representation of the observable past, and selecting actions based on that representation. While the latter is common to both fully and partially observable control problems, the former is specific to the partially observable case. Traditional reinforcement learning relies exclusively on the reward signal to train the representation model, but rewards constitute a weak learning signal in partially observable settings. To help train richer representations, we propose an auxiliary self-supervised learning task designed to support the primary reinforcement learning task. We further propose two schemes for training with auxiliary tasks, one inspired by the idea of promoting complementary representations. Empirical evaluations show that agents which exploit auxiliary tasks and complementary representations learn better policies, and converge faster, than agents which use common reactive and recurrent representations trained on the rewards alone.
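
The following is a minimal sketch of the general idea described above: a recurrent representation of the observable past is trained jointly on the reinforcement learning objective and on an auxiliary self-supervised loss. The architecture, the choice of next-observation prediction as the auxiliary task, the `aux_weight` coefficient, and the `rl_loss_fn` placeholder are all illustrative assumptions, not the paper's exact method.

```python
import torch
import torch.nn as nn


class RecurrentAgent(nn.Module):
    """Recurrent encoder shared by a policy head (RL task) and an
    auxiliary self-supervised head (hypothetical architecture)."""

    def __init__(self, obs_dim, action_dim, hidden_dim=128):
        super().__init__()
        self.encoder = nn.GRU(obs_dim, hidden_dim, batch_first=True)
        self.policy_head = nn.Linear(hidden_dim, action_dim)  # primary RL task
        self.aux_head = nn.Linear(hidden_dim, obs_dim)        # auxiliary task

    def forward(self, obs_seq):
        # obs_seq: (batch, time, obs_dim); h summarizes the observable past.
        h, _ = self.encoder(obs_seq)
        return self.policy_head(h), self.aux_head(h)


def joint_loss(agent, obs_seq, rl_loss_fn, aux_weight=0.5):
    """Joint training scheme: the RL loss and an auxiliary
    next-observation prediction loss share the recurrent representation."""
    logits, pred_next_obs = agent(obs_seq)
    rl_loss = rl_loss_fn(logits)  # e.g. a policy-gradient loss (assumed)
    # Self-supervised target: predict o_{t+1} from the history up to t.
    aux_loss = nn.functional.mse_loss(pred_next_obs[:, :-1], obs_seq[:, 1:])
    return rl_loss + aux_weight * aux_loss
```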
