Potential-based Reward Shaping Research Articles

Multi-agent systems (MASs) are a form of distributed intelligence, where multiple autonomous agents act in a common environment. Numerous complex, real-world systems have been successfully optimized using multi-agent reinforcement learning (MARL) in conjunction with the MAS framework. In MARL, agents learn by maximizing a scalar reward signal from the environment, and thus the design of the reward function directly affects the policies learned. In this work, we address the issue of appropriate multi-agent credit assignment in stochastic resource management games. We propose two new stochastic games to serve as testbeds for MARL research into resource management problems: the tragic commons domain and the shepherd problem domain. Our empirical work evaluates the performance of two commonly used reward shaping techniques: potential-based reward shaping and difference rewards. Experimental results demonstrate that systems using appropriate reward shaping techniques for multi-agent credit assignment can achieve near-optimal performance in stochastic resource management games, outperforming systems that learn from unshaped local or global evaluations. We also present the first empirical investigations into the effect of expressing the same heuristic knowledge in state- or action-based formats, thereby developing insights into the design of multi-agent potential functions that will inform future work.
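The two shaping techniques named in this abstract can be illustrated with a minimal sketch. It is not taken from the paper: it assumes the standard textbook forms, namely a state-based potential-based shaping term F(s, s') = gamma * Phi(s') - Phi(s) added to the environment reward, and a difference reward D_i = G(z) - G(z_-i) in which agent i's action is replaced by a default. All function and parameter names below (`phi`, `global_utility`, `default_action`) are hypothetical and chosen only for illustration.

```python
from typing import Callable, Hashable, Sequence

State = Hashable


def shaping_term(phi: Callable[[State], float],
                 s: State, s_next: State, gamma: float) -> float:
    """Potential-based shaping term F(s, s') = gamma * phi(s') - phi(s)."""
    return gamma * phi(s_next) - phi(s)


def shaped_reward(r: float, phi: Callable[[State], float],
                  s: State, s_next: State, gamma: float) -> float:
    """Environment reward plus the potential-based shaping term."""
    return r + shaping_term(phi, s, s_next, gamma)


def difference_reward(global_utility: Callable[[Sequence], float],
                      joint_action: Sequence,
                      agent_index: int,
                      default_action) -> float:
    """D_i = G(z) - G(z_-i): the global utility minus the utility of a
    counterfactual in which agent i's action is replaced by a default."""
    counterfactual = list(joint_action)
    counterfactual[agent_index] = default_action
    return global_utility(list(joint_action)) - global_utility(counterfactual)


if __name__ == "__main__":
    # Hypothetical potential: states are integers, higher is "closer to goal".
    phi = lambda s: 2.0 * s
    print(shaped_reward(0.0, phi, s=1, s_next=3, gamma=1.0))

    # Hypothetical capacity-limited resource: utility saturates at 3 units.
    capped = lambda actions: float(min(sum(actions), 3))
    print(difference_reward(capped, (1, 1, 0), agent_index=0, default_action=0))
```

With gamma = 1, the shaping terms along a trajectory telescope to Phi(s_T) - Phi(s_0), which is one informal way to see why this form of shaping leaves the ranking of policies unchanged in the single-agent setting.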

This paper investigates conditions under which polarized user appraisals, gathered throughout a vocal interaction between a machine and a human, can be integrated into a reinforcement learning-based dialogue manager. More specifically, we discuss how this information can be cast into socially-inspired rewards to speed up policy optimisation for both efficient task completion and user adaptation in an online learning setting. For this purpose, a potential-based reward shaping method is combined with a sample-efficient reinforcement learning algorithm to offer a principled framework for coping with these potentially noisy interim rewards. The proposed scheme will greatly facilitate the system's development by allowing the designer to teach the system through explicit positive/negative feedback given as hints about task progress in the early stages of training. At a later stage, the approach will be used to ease the adaptation of the dialogue policy to specific user profiles. Experiments carried out using a state-of-the-art goal-oriented dialogue management framework, the Hidden Information State (HIS), support our claims in two configurations: first, with a user simulator in the tourist information domain (and thus simulated appraisals), and second, in the context of man–robot dialogue with real user trials.

Articles published on Potential-based Reward Shaping

Multi-agent credit assignment in stochastic resource management games

Plan-based reward shaping for multi-agent reinforcement learning

Reinforcement-learning based dialogue system for human–robot interactions with socially-inspired rewards

Potential-based reward shaping for finite horizon online POMDP planning

Expressing Arbitrary Reward Functions as Potential-Based Advice

An Empirical Study of Potential-Based Reward Shaping and Advice in Complex, Multi-Agent Systems

Online learning of shaping rewards in reinforcement learning
