Abstract

Recognizing the destination of a maneuvering agent is important for creating intelligent AI players in real-time strategy (RTS) games. Among the different problem formulations, goal recognition can be solved as a model-based planning problem using off-the-shelf planners. A common shortcoming of these frameworks, however, is that they do not model action duration, even though in realistic scenarios an agent may take several time steps to move between grid cells. To address this, a semi-Markov decision model (SMDM), which explicitly models the duration of an action, is proposed in this paper. Moreover, most existing work does not build a behavioral model of the observed agent, and almost none models individual behavioral preferences, which limits recognition accuracy. In this paper, the Inverse Reinforcement Learning (IRL) method is adopted to learn the opponent's behavior for the destination-recognition problem. To adapt to dynamic environments, the Maximum Entropy Inverse Reinforcement Learning (MaxEnt IRL) method is modified: a Fitness index measures the quality of a candidate reward weight vector, and the Nelder-Mead simplex search finds the optimal weights. In our experiments, we build the game scenario in Unreal Engine 4 and collect movement trajectories from human players on several different tasks to evaluate our methods. The results show that the IRL-based recognizer identifies the destination effectively even when the agent's intention changes midway, and it outperforms other models on several widely used metrics.
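To make the weight-search step concrete, below is a minimal sketch (not the authors' code) of tuning MaxEnt IRL reward weights by maximizing a Fitness index with Nelder-Mead search. The feature tensor phi, the demonstrations demos, and the log-likelihood form of the Fitness index are illustrative assumptions; the paper's exact Fitness definition is not reproduced here.

    import numpy as np
    from scipy.optimize import minimize

    def fitness(w, phi, demos):
        """Hypothetical Fitness index: mean log-likelihood of demonstrated
        actions under a softmax (max-entropy) policy whose reward is phi @ w.
        phi: (n_states, n_actions, d) feature tensor; demos: list of (state, action)."""
        r = phi @ w                                               # (n_states, n_actions) rewards
        logp = r - np.logaddexp.reduce(r, axis=1, keepdims=True)  # log-softmax over actions
        s, a = zip(*demos)
        return logp[list(s), list(a)].mean()

    def search_weights(phi, demos, w0):
        # Nelder-Mead is derivative-free, so it suits a Fitness index that
        # may not be differentiable in the weights.
        res = minimize(lambda w: -fitness(w, phi, demos), w0, method="Nelder-Mead")
        return res.x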

Highlights

  • In recent decades, many commercial real-time strategy (RTS) games, such as StarCraft and WarCraft, have become increasingly popular

  • To validate the performance of our adapted MaxEnt Inverse Reinforcement Learning (IRL) in goal recognition, experiments are conducted on three aspects: a) to examine the details of the adapted MaxEnt IRL, we present and analyze the value maps of two specific trace fragments and the Fitness of each training dataset; b) to examine the goal recognizer built on IRL methods and Reinforcement Learning (RL) methods, we present and analyze the recognition results for two specific traces; c) to compare the overall performance of the adapted MaxEnt IRL, Apprenticeship Inverse Reinforcement Learning (AIRL), and RL methods in goal recognition, we report precision, recall, and F-measure

  • In this paper, a semi-Markov decision model (SMDM) is proposed to solve the goal-recognition problem; it explicitly models the duration of an action (see the sketch after this list)
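As a rough illustration of the highlighted idea (a hypothetical sketch under our own assumptions, not the paper's model), a semi-Markov step differs from an MDP step in that it also samples how long the chosen action lasts:

    from dataclasses import dataclass
    import random

    @dataclass
    class SemiMarkovStep:
        next_state: int
        duration: int   # number of primitive time steps the action occupies

    def smdm_step(state, action, transition, duration_pmf):
        """transition[state][action] -> list of (next_state, prob);
        duration_pmf[state][action] -> list of (duration, prob).
        Both tables are hypothetical names for illustration."""
        states, probs = zip(*transition[state][action])
        s_next = random.choices(states, weights=probs)[0]
        durs, dprobs = zip(*duration_pmf[state][action])
        d = random.choices(durs, weights=dprobs)[0]
        return SemiMarkovStep(next_state=s_next, duration=d)

Because the duration is sampled per step, an agent that needs several ticks to cross between grid cells is represented directly rather than being forced into one-step transitions.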


Summary

INTRODUCTION

Many commercial real-time strategy (RTS) games, such as StarCraft and WarCraft, have become increasingly popular. We propose an Inverse Reinforcement Learning (IRL) based opponent-behavior learning method, which is used for the agent's goal recognition within a Semi-Markov Decision Model (SMDM) framework. Unlike the goal-inference procedure in the full SMDM, we ignore action duration during IRL learning, since the action-selection probabilities defined by the agent's policy depend only on states. In this way, an MDP formulation can replace the more complex SMDM during the IRL process.
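The paragraph above suggests a simple reading of the recognizer (a hedged sketch of our own, not the published algorithm): once a per-goal policy has been learned with IRL under the MDP simplification, destination recognition reduces to a Bayesian update over candidate goals. The names policies and recognize_goal are hypothetical.

    import numpy as np

    def recognize_goal(trace, policies, prior):
        """trace: list of (state, action) observations; policies[g]: a
        (n_states, n_actions) action-probability table for goal g;
        prior: initial probability of each goal. Returns the posterior
        over goals after observing the whole trace."""
        log_post = np.log(np.asarray(prior, dtype=float))
        for s, a in trace:
            for g in range(len(policies)):
                # Small constant guards against log(0) for unseen actions.
                log_post[g] += np.log(policies[g][s, a] + 1e-12)
        post = np.exp(log_post - log_post.max())   # stabilized normalization
        return post / post.sum()

Because each factor depends only on the observed state-action pair, the posterior can be updated online as the trace grows, which is what allows the recognizer to track an intention that changes midway.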

GOAL INFERENCE USING THE RBPF
HUMAN BEHAVIOR LEARNING
POLICY ESTIMATION USING THE MODIFIED MAXENT IRL
RESULTS AND DISCUSSION
CONCLUSION