Abstract
The majority of multi-agent reinforcement learning (MARL) implementations aim to optimize systems with respect to a single objective, despite the fact that many real-world problems are inherently multi-objective in nature. Research into multi-objective MARL is still in its infancy, and few studies to date have dealt with the issue of credit assignment. Reward shaping has been proposed as a means to address the credit assignment problem in single-objective MARL; however, it has been shown to alter the intended goals of a domain if misused, leading to unintended behaviour. Two popular shaping methods are potential-based reward shaping and difference rewards, and both have been repeatedly shown to improve learning speed and the quality of joint policies learned by agents in single-objective MARL domains. This work discusses the theoretical implications of applying these shaping approaches to cooperative multi-objective MARL problems, and evaluates their efficacy using two benchmark domains. Our results constitute the first empirical evidence that agents using these shaping methodologies can sample true Pareto optimal solutions in cooperative multi-objective stochastic games.
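For readers unfamiliar with the two shaping methods named above, the following is a minimal sketch of their standard single-objective forms applied component-wise to a vector of objectives. It is an illustration under stated assumptions, not the implementation evaluated in this work: the function names `pbrs_shaped_reward` and `difference_reward`, the toy global reward `toy_global_reward`, and the choice to shape each objective independently are all hypothetical. Potential-based reward shaping adds F = gamma * Phi(s') - Phi(s) to the reward, and a difference reward for agent i is D_i = G(z) - G(z_-i), where z_-i is the joint action with agent i's contribution removed.

```python
from typing import Callable, List, Sequence, Tuple

Vector = Tuple[float, ...]


def pbrs_shaped_reward(reward: Vector, phi_s: Vector, phi_next: Vector,
                       gamma: float) -> Vector:
    """Return R + gamma * Phi(s') - Phi(s), computed per objective.

    Assumption: one potential value per objective, shaped independently.
    """
    return tuple(r + gamma * pn - p
                 for r, p, pn in zip(reward, phi_s, phi_next))


def difference_reward(global_fn: Callable[[Sequence[int]], Vector],
                      joint_action: Sequence[int], agent: int) -> Vector:
    """Return D_i = G(z) - G(z_-i), computed per objective.

    The counterfactual z_-i is formed here by removing agent i entirely.
    """
    g_with = global_fn(joint_action)
    counterfactual: List[int] = list(joint_action)
    del counterfactual[agent]
    g_without = global_fn(counterfactual)
    return tuple(a - b for a, b in zip(g_with, g_without))


def toy_global_reward(joint_action: Sequence[int]) -> Vector:
    """Hypothetical two-objective system reward for illustration only.

    Objective 1: number of distinct actions covered (to maximize).
    Objective 2: negative number of active agents (a cost).
    """
    return (float(len(set(joint_action))), -float(len(joint_action)))


shaped = pbrs_shaped_reward((1.0, -2.0), (0.5, 0.0), (1.0, 1.0), gamma=0.5)
d_agent0 = difference_reward(toy_global_reward, [0, 0, 1], agent=0)
d_agent2 = difference_reward(toy_global_reward, [0, 0, 1], agent=2)
```

In this toy setting, agent 0 duplicates agent 1's action, so its difference reward on the coverage objective is zero, while agent 2 covers an action nobody else does and receives a positive signal on that objective. This is the kind of per-agent credit signal that difference rewards are meant to provide.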