Abstract

In this paper, we propose a method for acquiring a series of cooperative actions that reach an appropriate goal without the designer having to control the reward. To accomplish this, we introduce a new concept, “reward interpretation”: the idea that an agent can, on its own, increase or decrease the reward given by the environment by interpreting it. We applied this idea to the Q‐learning method. The simulation results show that the proposed method outperforms both a standard Q‐learning method and a Q‐learning method with cooperation in terms of the number of successful instances of cooperation.
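To make the idea concrete, the following is a minimal sketch of a tabular Q‐learning update in which the agent rescales the environment reward before using it. The abstract does not specify how the interpretation is computed, so the multiplicative `scale` factor, the function names `interpret_reward` and `q_update`, and the values of `alpha` and `gamma` are all illustrative assumptions, not the method evaluated in the paper.

```python
from collections import defaultdict


def interpret_reward(reward, scale):
    """Hypothetical interpretation rule (an assumption, not from the paper):
    the agent multiplies the environment reward by its own scale factor."""
    return scale * reward


def q_update(q, state, action, reward, next_state, actions,
             alpha=0.1, gamma=0.9, scale=1.0):
    """One standard Q-learning step that uses the interpreted reward."""
    interpreted = interpret_reward(reward, scale)
    # Greedy value of the next state under the current Q-table.
    best_next = max(q[(next_state, a)] for a in actions)
    td_error = interpreted + gamma * best_next - q[(state, action)]
    q[(state, action)] += alpha * td_error
    return q[(state, action)]


actions = ["left", "right"]

# Same transition, two different interpretations of the same reward.
q_plain = defaultdict(float)
plain = q_update(q_plain, "s0", "right", 1.0, "s1", actions, scale=1.0)

q_boosted = defaultdict(float)
boosted = q_update(q_boosted, "s0", "right", 1.0, "s1", actions, scale=2.0)
```

With `scale=1.0` the update reduces to ordinary Q‐learning; a larger or smaller `scale` changes how strongly the same environment reward moves the Q‐value, which is the only effect this sketch is meant to show.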
