Abstract

Humans learn from both successful and unsuccessful experiences, because useful information about how to solve complex problems can be gleaned not only from success but also from failure. In this paper, we propose a method for investigating this difference by applying Preference-based Inverse Reinforcement Learning to Double Transition Models built from replays of StarCraft II. Our method provides two advantages: (1) the ability to identify integrated reward distributions from computational models composed of multiple experiences, and (2) the ability to discern differences between learning from successes and learning from failures. Our experimental results demonstrate that reward distributions are shaped by the trajectories used to build the models. Reward distributions based on successful episodes were skewed to the left, while those based on unsuccessful episodes were skewed to the right. Furthermore, we found that players with symmetric triple reward distributions had a high probability of winning the game.
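
As a rough illustration of the preference-based IRL component described above, the sketch below fits a linear reward function from pairs of winning and losing trajectories using a Bradley-Terry preference model. The feature dimension, the simulated replay features, and the linear reward form are illustrative assumptions, not the paper's Double Transition Model pipeline; it is a minimal sketch of the general technique only.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical setup: each trajectory is a sequence of per-step feature vectors
# (in the paper's setting these would be extracted from StarCraft II replays;
# here they are simulated from a hidden reward direction so the sketch runs).
FEATURE_DIM = 8
true_theta = rng.normal(size=FEATURE_DIM)

def simulate_trajectory(success: bool, length: int = 25) -> np.ndarray:
    """Return per-step features; successful episodes drift toward true_theta."""
    drift = 0.3 * true_theta if success else -0.3 * true_theta
    return rng.normal(size=(length, FEATURE_DIM)) + drift

# Preference pairs: (winning trajectory, losing trajectory).
pairs = [(simulate_trajectory(True), simulate_trajectory(False)) for _ in range(200)]

def log_likelihood_grad(theta: np.ndarray,
                        tau_pref: np.ndarray,
                        tau_other: np.ndarray) -> np.ndarray:
    """Gradient of log P(tau_pref > tau_other) under a Bradley-Terry model
    with a linear reward r(s) = theta . phi(s)."""
    f_pref = tau_pref.sum(axis=0)    # feature counts of the preferred trajectory
    f_other = tau_other.sum(axis=0)
    p_pref = 1.0 / (1.0 + np.exp(-(theta @ (f_pref - f_other))))
    return (1.0 - p_pref) * (f_pref - f_other)

# Stochastic gradient ascent on the preference log-likelihood.
theta = np.zeros(FEATURE_DIM)
lr = 0.01
for epoch in range(50):
    for tau_win, tau_loss in pairs:
        theta += lr * log_likelihood_grad(theta, tau_win, tau_loss)

# Per-step reward estimates; their empirical distribution is what one would
# inspect for skewness when models are built from wins versus losses.
rewards_win = np.concatenate([t @ theta for t, _ in pairs])
rewards_loss = np.concatenate([t @ theta for _, t in pairs])
print("mean reward (winning trajectories):", rewards_win.mean())
print("mean reward (losing trajectories): ", rewards_loss.mean())
```

Under these assumptions, the learned per-step rewards can be histogrammed separately for win-derived and loss-derived trajectories, which is the kind of distributional comparison the abstract refers to.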
