Abstract

Rewards are the critical signals for Reinforcement Learning (RL) algorithms to learn the desired behavior in a sequential multi-step learning task. However, when these rewards are delayed and noisy, the learning process becomes more challenging. The temporal Credit Assignment Problem (CAP) is a well-known and challenging task in AI. RL, especially Deep RL, often works well with immediate rewards but may fail when rewards are delayed, noisy, or both. In this work, we propose delegating the CAP to a neural-network-based algorithm named InferNet that explicitly learns to infer the immediate rewards from delayed and noisy rewards. The effectiveness of InferNet was evaluated on three online RL tasks: a GridWorld, a CartPole, and 40 Atari games; and on two offline RL tasks: GridWorld and a real-life Sepsis treatment task. The effectiveness of InferNet rewards is compared to that of immediate and delayed rewards in two settings: with and without noise. For the offline RL tasks, it is also compared to a strong baseline, InferGP [7]. Overall, our results show that InferNet is robust to delayed or noisy reward functions, and that it can be used effectively to solve the temporal CAP in a wide range of RL tasks when immediate rewards are unavailable or noisy.
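To make the idea concrete, the sketch below illustrates one plausible reading of the abstract: a neural network that learns to decompose a delayed, episode-level return into per-step rewards by constraining its per-step predictions to sum to the observed return. This is a minimal sketch in PyTorch, not the authors' implementation; the class name `RewardInferenceNet`, the MLP architecture, and the squared-error objective are illustrative assumptions, not details taken from the paper.

```python
import torch
import torch.nn as nn

class RewardInferenceNet(nn.Module):
    """Predicts a per-step reward for each (state, action) pair in an episode."""
    def __init__(self, state_dim: int, action_dim: int, hidden: int = 64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim + action_dim, hidden),
            nn.ReLU(),
            nn.Linear(hidden, 1),
        )

    def forward(self, states: torch.Tensor, actions: torch.Tensor) -> torch.Tensor:
        # states: (T, state_dim), actions: (T, action_dim) one-hot -> rewards: (T,)
        return self.net(torch.cat([states, actions], dim=-1)).squeeze(-1)

def train_step(model, optimizer, states, actions, delayed_return):
    """One gradient step: the predicted per-step rewards should sum
    to the (possibly noisy) delayed return observed at episode end."""
    optimizer.zero_grad()
    predicted_rewards = model(states, actions)                 # (T,)
    loss = (predicted_rewards.sum() - delayed_return) ** 2     # match episode return
    loss.backward()
    optimizer.step()
    return loss.item()

# Example: one synthetic episode of length T = 10 (dimensions are arbitrary)
model = RewardInferenceNet(state_dim=4, action_dim=2)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
states = torch.randn(10, 4)
actions = torch.eye(2)[torch.randint(0, 2, (10,))]             # one-hot actions
train_step(model, optimizer, states, actions, torch.tensor(1.0))
```

Under this reading, once such a network is trained, its inferred per-step rewards can stand in for the missing immediate rewards when training a standard RL agent, which is how the abstract describes InferNet rewards being compared against immediate and delayed rewards.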
