Abstract

Reinforcement learning (RL) of optimal policies against an opponent that is itself learning remains challenging in Markov games. A variety of algorithms have been proposed for this problem, including traditional Q-learning-based RL (QbRL) algorithms as well as state-of-the-art neural-network-based RL (NNbRL) algorithms. However, QbRL approaches generalize poorly to complex problems with non-stationary opponents, while the policies learned by NNbRL algorithms lack explainability and transparency. In this paper, we propose X-OMQ(λ), an algorithm that integrates the eXtended Classifier System (XCS) with opponent modelling for concurrent reinforcement learners in zero-sum Markov games. The algorithm learns general, accurate, and interpretable action selection rules and allows policy optimization using a genetic algorithm (GA). In addition, the X-OMQ(λ) agent optimizes its model of the opponent while simultaneously learning to select actions in a goal-directed manner, and it uses an eligibility trace mechanism to further speed up learning. In the reinforcement component, not only the classifiers in the action set are updated; other relevant classifiers are also updated in a certain proportion. We evaluate the proposed algorithm on the hunter-prey problem and two adversarial soccer scenarios in which the opponent is also allowed to learn, comparing against several benchmark QbRL and NNbRL algorithms. The results show that our method achieves learning performance similar to the NNbRL algorithms while requiring no prior knowledge of the opponent or the environment. Moreover, the learned action selection rules are interpretable while retaining generalization capability.
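As a rough illustration of the opponent-modelling and eligibility-trace ideas summarized above, the sketch below shows a tabular Q(λ)-style backup weighted by an empirical estimate of the opponent's action distribution. It is a minimal sketch under simplifying assumptions (a flat tabular value function rather than XCS classifiers, no GA component); the names OpponentModel and omq_lambda_step are hypothetical and do not reproduce the paper's actual X-OMQ(λ) update rules.

    # Illustrative sketch only: tabular Q(lambda) with an empirical opponent model.
    # Not the paper's XCS rule-based representation; all names are hypothetical.
    from collections import defaultdict

    class OpponentModel:
        """Empirical estimate of the opponent's per-state action distribution."""
        def __init__(self, n_opp_actions):
            self.counts = defaultdict(lambda: [1] * n_opp_actions)  # Laplace prior

        def update(self, state, opp_action):
            self.counts[state][opp_action] += 1

        def probs(self, state):
            c = self.counts[state]
            total = sum(c)
            return [x / total for x in c]

    def omq_lambda_step(Q, E, model, s, a, o, r, s_next, actions, opp_actions,
                        alpha=0.1, gamma=0.95, lam=0.8):
        """One opponent-model-weighted Q(lambda) backup (sketch, not the paper's rule update)."""
        model.update(s, o)
        # Value of the next state: best own action, averaging over the modelled opponent.
        p_next = model.probs(s_next)
        v_next = max(
            sum(p_next[o2] * Q[(s_next, a2, o2)] for o2 in opp_actions)
            for a2 in actions
        )
        delta = r + gamma * v_next - Q[(s, a, o)]
        E[(s, a, o)] += 1.0  # accumulating eligibility trace
        # Propagate the TD error to all recently visited (state, action, opp-action) triples,
        # so entries beyond the current action set also receive a proportional update.
        for key in list(E.keys()):
            Q[key] += alpha * delta * E[key]
            E[key] *= gamma * lam
        return Q, E

    # Usage: Q = defaultdict(float); E = defaultdict(float); model = OpponentModel(n_opp_actions)
    # then call omq_lambda_step(...) once per joint transition observed during play.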
