Abstract

Rewards are the critical signals for Reinforcement Learning (RL) algorithms to learn the desired behavior in a sequential multi-step learning task. However, when these rewards are delayed and noisy, the learning process becomes more challenging. The temporal Credit Assignment Problem (CAP) is a well-known and challenging task in AI. RL, especially Deep RL, often works well with immediate rewards but may fail when rewards are delayed, noisy, or both. In this work, we propose delegating the CAP to a neural-network-based algorithm named InferNet that explicitly learns to infer the immediate rewards from delayed and noisy rewards. The effectiveness of InferNet was evaluated on three online RL tasks: a GridWorld, a CartPole, and 40 Atari games; and on two offline RL tasks: GridWorld and a real-life Sepsis treatment task. The effectiveness of InferNet rewards is compared to that of immediate and delayed rewards in two settings: with and without noise. For the offline RL tasks, it is also compared to a strong baseline, InferGP [7]. Overall, our results show that InferNet is robust to delayed or noisy reward functions, and that it can be used effectively to solve the temporal CAP in a wide range of RL tasks when immediate rewards are unavailable or noisy.
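To make the idea concrete, the sketch below illustrates one plausible reading of the abstract: a neural network that learns to decompose a delayed, episode-level return into per-step rewards by constraining its per-step predictions to sum to the observed return. This is a minimal sketch in PyTorch, not the authors' implementation; the class name `RewardInferenceNet`, the MLP architecture, and the squared-error objective are illustrative assumptions, not details taken from the paper.

```python
import torch
import torch.nn as nn

class RewardInferenceNet(nn.Module):
    """Predicts a per-step reward for each (state, action) pair in an episode."""
    def __init__(self, state_dim: int, action_dim: int, hidden: int = 64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim + action_dim, hidden),
            nn.ReLU(),
            nn.Linear(hidden, 1),
        )

    def forward(self, states: torch.Tensor, actions: torch.Tensor) -> torch.Tensor:
        # states: (T, state_dim), actions: (T, action_dim) one-hot -> rewards: (T,)
        return self.net(torch.cat([states, actions], dim=-1)).squeeze(-1)

def train_step(model, optimizer, states, actions, delayed_return):
    """One gradient step: the predicted per-step rewards should sum
    to the (possibly noisy) delayed return observed at episode end."""
    optimizer.zero_grad()
    predicted_rewards = model(states, actions)                 # (T,)
    loss = (predicted_rewards.sum() - delayed_return) ** 2     # match episode return
    loss.backward()
    optimizer.step()
    return loss.item()

# Example: one synthetic episode of length T = 10 (dimensions are arbitrary)
model = RewardInferenceNet(state_dim=4, action_dim=2)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
states = torch.randn(10, 4)
actions = torch.eye(2)[torch.randint(0, 2, (10,))]             # one-hot actions
train_step(model, optimizer, states, actions, torch.tensor(1.0))
```

Under this reading, once such a network is trained, its inferred per-step rewards can stand in for the missing immediate rewards when training a standard RL agent, which is how the abstract describes InferNet rewards being compared against immediate and delayed rewards.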
