Abstract

The authors consider the learning control problem in reinforcement learning (RL) with continuous action spaces. Policy gradient methods, and in particular the deterministic policy gradient (DPG) algorithm, provide a way to solve learning control problems with continuous action spaces. However, when the RL task is complex enough that the function approximator must be tuned, hand-tuning the features is infeasible. To address this problem, the authors extend the DPG algorithm with an approximate-linear-dependency (ALD)-based sparsification procedure, which enables the DPG algorithm to automatically select useful, sparse features. To the best of the authors' knowledge, this is the first work to consider the feature selection problem in DPG. Simulation results illustrate that (i) the proposed algorithm can find the optimal solution of the continuous-action version of the mountain car problem, and (ii) the proposed algorithm achieves good performance over a wide range of settings of the approximate linear dependency threshold parameter.
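To make the sparsification idea concrete, the following is a minimal Python sketch of a standard ALD test (in the style of Engel et al.'s kernel sparsification), not the authors' exact implementation: a sample is admitted to the feature dictionary only if its kernel feature cannot be approximated, within a threshold nu, as a linear combination of the current dictionary's features. The Gaussian kernel, the bandwidth sigma, and the threshold value are illustrative assumptions.

```python
import numpy as np

def gaussian_kernel(x, y, sigma=1.0):
    # RBF kernel; features are kernel evaluations at dictionary centers.
    # Kernel choice and bandwidth are assumptions for illustration.
    return np.exp(-np.sum((x - y) ** 2) / (2.0 * sigma ** 2))

def ald_sparsify(samples, nu=0.1, sigma=1.0):
    """Greedy ALD-based dictionary construction (illustrative sketch).

    A sample x is added only if its ALD residual
        delta = k(x, x) - k_vec^T K^{-1} k_vec
    exceeds nu, where K is the Gram matrix of the current dictionary
    and k_vec holds the kernel values between x and the dictionary.
    """
    dictionary = [samples[0]]
    K_inv = np.array([[1.0 / gaussian_kernel(samples[0], samples[0], sigma)]])
    for x in samples[1:]:
        k_vec = np.array([gaussian_kernel(d, x, sigma) for d in dictionary])
        c = K_inv @ k_vec                                  # best reconstruction coefficients
        delta = gaussian_kernel(x, x, sigma) - k_vec @ c   # ALD residual
        if delta > nu:
            # x is (almost) linearly independent: append it and
            # update the inverse Gram matrix with a rank-1 block formula.
            m = len(dictionary)
            K_inv_new = np.zeros((m + 1, m + 1))
            K_inv_new[:m, :m] = K_inv + np.outer(c, c) / delta
            K_inv_new[:m, m] = -c / delta
            K_inv_new[m, :m] = -c / delta
            K_inv_new[m, m] = 1.0 / delta
            K_inv = K_inv_new
            dictionary.append(x)
    return dictionary

# Example: build a sparse set of feature centers from random 2-D states.
rng = np.random.default_rng(0)
states = [rng.uniform(-1.0, 1.0, size=2) for _ in range(200)]
centers = ald_sparsify(states, nu=0.1, sigma=0.5)
print(f"kept {len(centers)} of {len(states)} samples as feature centers")
```

The threshold nu directly controls the trade-off the abstract describes: a larger nu yields a sparser dictionary (fewer features), while a smaller nu admits more centers and a richer approximation.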
