Abstract

The curse of dimensionality is a ubiquitous problem in multiagent reinforcement learning: the learning and storage space grows exponentially with the number of agents, which hinders the application of multiagent reinforcement learning. To relieve this problem, we propose a new framework named the optimal tracking agent (OTA). The OTA views the other agents as part of the environment and uses a reduced form to learn the optimal decision. Although merging the other agents into the environment reduces the dimension of the action space, the environment characterized in this form is dynamic and does not satisfy the convergence conditions of reinforcement learning (RL). We therefore develop an estimator to track the dynamics of the environment. The estimator yields a dynamic model, and model-based RL can then be used to react optimally to the dynamic environment. Because the Q-function in the OTA is itself a dynamic process owing to the other agents' dynamics, unlike in traditional RL, where learning is a stationary process and the usual action selection mechanisms suit only such stationary processes, we improve the greedy action selection mechanism to adapt to these dynamics. The OTA thus converges. An experiment illustrates the validity and efficiency of the OTA.
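To make the idea concrete, below is a minimal, hypothetical sketch of an OTA-style loop for a single other agent. It is not the paper's exact algorithm: the table sizes, the recency-weighted count estimator, and the exploration schedule are all illustrative assumptions. The learner folds the other agent into the environment, tracks its possibly drifting policy with decayed counts, plans against the estimated model by marginalizing the Q-function over that estimate, and keeps a non-vanishing exploration rate so the greedy selection can re-adapt when the other agent's policy changes.

```python
import numpy as np

# Illustrative sketch only; all names and parameters are assumptions,
# not the paper's exact algorithm.
n_states, n_actions, n_other = 5, 3, 3   # own actions vs. other agent's actions
gamma, alpha, decay, eps_floor = 0.95, 0.1, 0.9, 0.05

# Q is kept over joint actions; the other agent's action is marginalized
# out using the tracked estimate of its (drifting) policy.
Q = np.zeros((n_states, n_actions, n_other))
counts = np.ones((n_states, n_other))    # decayed counts = estimator of the other agent

def other_policy(s):
    """Recency-weighted empirical estimate of the other agent's action probabilities."""
    return counts[s] / counts[s].sum()

def value(s):
    """Value of the greedy own action, in expectation over the estimated other-agent policy."""
    return (Q[s] @ other_policy(s)).max()

def select_action(s, step):
    # Greedy selection with a non-vanishing exploration floor, so the agent
    # keeps re-adapting when the environment (the other agent) drifts.
    eps = max(eps_floor, 1.0 / (1.0 + 0.01 * step))
    if np.random.rand() < eps:
        return np.random.randint(n_actions)
    return int(np.argmax(Q[s] @ other_policy(s)))

def update(s, a, b, r, s_next):
    # Track the environment's dynamics: decay old evidence, record the new observation.
    counts[s] *= decay
    counts[s, b] += 1.0
    # One-step backup against the reduced, single-agent view of the interaction.
    Q[s, a, b] += alpha * (r + gamma * value(s_next) - Q[s, a, b])
```

In this sketch, the decayed counts play the role of the abstract's tracking estimator, and the exploration floor is one simple way to adapt greedy action selection to a nonstationary Q-function; the paper's actual estimator and selection rule may differ.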
