Abstract

$Q$-learning is a generic approach that assumes a finite, discrete state and action domain and estimates action values using tabular or function approximation methods. An intelligent agent, however, typically learns policies from continuous sensory inputs and must encode these environmental inputs onto a discrete state space. The application of $Q$-learning in a continuous state/action domain is the subject of many studies. This paper uses a tree structure to approximate a $Q$-function in a continuous state domain. The agent selects the discretized action with the maximum $Q$-value, and this discretized action is then extended to a continuous action using an action bias function. Reinforcement learning is difficult for a single agent when the state space is huge. The proposed architecture is also applied to a multiagent system, wherein an individual agent transfers its useful $Q$-values to other agents to accelerate the learning process. Policies are shared between agents by grafting branches of the trees in which $Q$-values are stored onto other agents' trees. Simulation results show that the proposed architecture performs better than tabular $Q$-learning and significantly accelerates learning because all agents use the sharing mechanism to cooperate with one another.
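To make the idea concrete, the following is a minimal sketch of the general technique the abstract describes: a tree that partitions a continuous state space into leaves, each leaf storing one $Q$-value per discrete action and updated with the standard $Q$-learning rule. The splitting structure, action set, and learning constants below are illustrative assumptions, not the authors' exact method; the action bias function and branch-grafting mechanism are omitted.

```python
# Illustrative sketch only (assumed structure, not the paper's implementation):
# a binary tree discretizes a continuous state; each leaf holds a Q-table
# over discrete actions, updated with one-step Q-learning.
import random

class Leaf:
    def __init__(self, n_actions):
        self.q = [0.0] * n_actions          # one Q-value per discrete action

class Node:
    def __init__(self, dim, threshold, left, right):
        self.dim, self.threshold = dim, threshold
        self.left, self.right = left, right

def find_leaf(tree, state):
    """Descend the tree until a leaf (discretized state) is reached."""
    while isinstance(tree, Node):
        tree = tree.left if state[tree.dim] < tree.threshold else tree.right
    return tree

def select_action(tree, state, epsilon=0.1):
    """Epsilon-greedy choice of the discretized action with maximum Q-value."""
    leaf = find_leaf(tree, state)
    if random.random() < epsilon:
        return random.randrange(len(leaf.q))
    return max(range(len(leaf.q)), key=lambda a: leaf.q[a])

def q_update(tree, s, a, r, s_next, alpha=0.1, gamma=0.95):
    """Standard one-step Q-learning update applied to the leaf Q-tables."""
    leaf, leaf_next = find_leaf(tree, s), find_leaf(tree, s_next)
    target = r + gamma * max(leaf_next.q)
    leaf.q[a] += alpha * (target - leaf.q[a])

# Example: a 1-D state space split at 0.5 into two leaves, two discrete actions.
tree = Node(dim=0, threshold=0.5, left=Leaf(2), right=Leaf(2))
s = [0.3]
a = select_action(tree, s)
q_update(tree, s, a, r=1.0, s_next=[0.7])
```

In this sketch, sharing between agents would amount to copying (grafting) the $Q$-tables of selected subtrees from one agent's tree into another's, which is the intuition behind the policy-sharing mechanism described in the abstract.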
