Implementing Parametric Reinforcement Learning in Robocup Rescue Simulation

Omid Aghazadeh, Maziar Ahmad Sharbafi, Abolfazl Toroghi Haghighat

doi:10.1007/978-3-540-68847-1_42

Abstract

Decision making in complex, multi-agent, dynamic environments such as Rescue Simulation is a challenging problem in Artificial Intelligence. Uncertainty, noisy input data, and stochastic behavior, which are common difficulties of real-time environments, make decision making in such environments even more complicated. Our approach to the bottleneck created by the dynamics and variety of conditions in such situations is reinforcement learning. Classic reinforcement learning methods usually work with state and action value functions and temporal difference (TD) updates. Function approximation is an alternative to storing state and action value functions directly. Many reinforcement learning methods for continuous action and state spaces combine function approximation with TD updates, such as TD, LSTD, and iLSTD. This paper presents a new approach to online reinforcement learning in continuous action or state spaces that does not use TD updates. We have named it Parametric Reinforcement Learning. The method is applied to the decision-making process of the Police Force agent in Robocup Rescue Simulation, and the results of this application are reported in this paper. Our simulation results show that the method increases learning speed and is simple to use. It also has very low memory usage and very low computation time.
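For readers unfamiliar with the TD-based methods that the abstract contrasts against, the following is a minimal sketch of one TD(0) update with a linear function approximator. It illustrates the classic baseline only, not Parametric Reinforcement Learning, whose update rule is not given in the abstract. The function name `td0_linear_update`, the feature vectors, and the values of `alpha` and `gamma` are assumptions chosen for illustration.

```python
def td0_linear_update(w, phi_s, reward, phi_next, alpha=0.1, gamma=0.9):
    """One TD(0) step for a linear value estimate V(s) = w . phi(s).

    Hypothetical helper for illustration; alpha (step size) and
    gamma (discount factor) are assumed values, not from the paper.
    """
    # Current and next-state value estimates under the linear model.
    v_s = sum(wi * xi for wi, xi in zip(w, phi_s))
    v_next = sum(wi * xi for wi, xi in zip(w, phi_next))
    # Temporal difference error: bootstrapped target minus current estimate.
    td_error = reward + gamma * v_next - v_s
    # Move the weights along the features of the visited state.
    return [wi + alpha * td_error * xi for wi, xi in zip(w, phi_s)]


# Toy transition with two one-hot features (assumed data).
w = td0_linear_update([0.0, 0.0], [1.0, 0.0], reward=1.0, phi_next=[0.0, 1.0])
```

Only the weight vector is stored, rather than a table with one entry per state, which is what makes function approximation usable in continuous state spaces.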
