Abstract

Reward shaping has been shown to significantly improve an agent's performance in reinforcement learning. As attention shifts from tabula-rasa approaches to methods in which heuristic domain knowledge is given to agents, an important problem arises: how can agents deal with erroneous knowledge, and what is the impact on their behavior, both in a single-agent setting and in a multi-agent setting where agents face conflicting goals? Previous research demonstrated plan-based reward shaping with knowledge revision in a single-agent scenario, where agents quickly identified and revised erroneous knowledge and thus benefited from more accurate plans. In a multi-agent setting, however, the use of individual plans as a source of reward shaping has been less successful because of the agents' conflicting goals. In this paper we present the use of MDPs as a method of providing heuristic knowledge, coupled with a revision algorithm that handles cases where the provided domain knowledge is wrong. We show how agents can deal with erroneous knowledge in the single-agent case and how this method can be used for conflict resolution in a multi-agent environment.
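To make the shaping idea referred to above concrete, the sketch below illustrates standard potential-based reward shaping driven by progress through a (possibly erroneous) heuristic plan. It is a minimal illustration only: the plan steps, scaling constant, and function names are assumptions for exposition and are not the authors' formulation or API.

```python
# Minimal sketch of potential-based reward shaping from heuristic knowledge.
# Assumption: the heuristic knowledge is a sequence of abstract plan steps,
# and a state's potential grows with the agent's progress through the plan.
# All names and constants below are illustrative, not taken from the paper.

GAMMA = 0.99   # discount factor (assumed)
OMEGA = 10.0   # scaling factor for the shaping signal (assumed)

plan = ["get_key", "open_door", "reach_goal"]   # hypothetical plan steps


def potential(plan_step_index: int) -> float:
    """Potential increases with progress through the (possibly wrong) plan."""
    return OMEGA * plan_step_index


def shaped_reward(env_reward: float, prev_step: int, curr_step: int) -> float:
    """r' = r + gamma * Phi(s') - Phi(s).

    Because the shaping term is potential-based, erroneous heuristic
    knowledge can slow learning but does not change the optimal policy.
    """
    return env_reward + GAMMA * potential(curr_step) - potential(prev_step)


# Example: moving from plan step 0 to step 1 with zero environment reward
# yields a positive shaping bonus that encourages following the plan.
print(shaped_reward(0.0, prev_step=0, curr_step=1))
```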
