Abstract

In this paper we present a learning-based algorithm for solving a state-constrained stochastic control problem and apply it to controlling the motion of a 2D cartpole system (2DCPS), i.e., a planar inverted pendulum robot. The goal of the proposed algorithm is to learn to generate optimal Markov control policies, in the presence of environmental uncertainties, for the tasks of path following, pole balancing, and avoiding obstacles along the prescribed path. To model the environmental uncertainties, we apply the standard framework of stochastic differential equations (SDEs). The resulting state-constrained stochastic control problem (SCP) is solved statistically via a maximum likelihood estimator (MLE). Relying on the universal approximation power of neural networks (NNs), we build our MLE using gated recurrent units (GRUs). Apart from being able to include state constraints, the novel feature of the estimator is the incorporation of ergodic policy generation and step-length generation networks. Consequently, the estimator can practically (i) handle general SCPs with generic (non-quadratic) cost functions, and (ii) deal with heterogeneous and asynchronous sequential data. We apply our algorithm to the 2DCPS as a model dynamical system and evaluate several training configurations. Based on the results, we conclude that the direct policy sampling method for controlling the forward process performs better than the forward-backward SDE (FBSDE) method. Finally, by comparison, we show that the optimally configured estimator outperforms a classical optimization method.
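
To make the direct policy sampling idea concrete, the following is a minimal, hypothetical sketch: a GRU-based policy drives an Euler-Maruyama simulation of a controlled SDE, and the expected cost is minimized by backpropagating through the sampled trajectories. The dynamics, cost function, dimensions, and all names here are illustrative assumptions; this is not the paper's exact architecture (in particular, the ergodic policy generation and step-length generation networks are omitted).

```python
# Hypothetical sketch of direct policy sampling for a controlled SDE.
# A GRU cell summarizes the simulated path; a linear head emits the control.
# All model details below are assumed for illustration only.
import torch
import torch.nn as nn

state_dim, ctrl_dim, hidden_dim = 4, 1, 32  # cartpole: x, x_dot, theta, theta_dot
dt, horizon, batch = 0.02, 100, 64
sigma = 0.05                                # assumed constant diffusion magnitude

gru = nn.GRUCell(state_dim, hidden_dim)
head = nn.Linear(hidden_dim, ctrl_dim)
opt = torch.optim.Adam(list(gru.parameters()) + list(head.parameters()), lr=1e-3)

def drift(x, u):
    # Placeholder linearized cartpole drift; a real model would use the
    # full nonlinear pendulum-on-cart dynamics.
    A = torch.tensor([[0., 1., 0., 0.],
                      [0., 0., -1., 0.],
                      [0., 0., 0., 1.],
                      [0., 0., 20., 0.]])
    B = torch.tensor([[0.], [1.], [0.], [-1.]])
    return x @ A.T + u @ B.T

def running_cost(x, u):
    # Illustrative quadratic running cost; the paper's estimator allows
    # non-quadratic costs, and state constraints could enter as penalties.
    return (x ** 2).sum(dim=1) + 0.1 * (u ** 2).sum(dim=1)

for step in range(200):
    x = 0.1 * torch.randn(batch, state_dim)  # random initial states
    h = torch.zeros(batch, hidden_dim)
    cost = torch.zeros(batch)
    for _ in range(horizon):
        h = gru(x, h)                         # recurrent summary of the path
        u = head(h)                           # control from the recurrent state
        noise = sigma * torch.randn(batch, state_dim) * dt ** 0.5
        x = x + drift(x, u) * dt + noise      # Euler-Maruyama forward step
        cost = cost + running_cost(x, u) * dt
    loss = cost.mean()                        # Monte Carlo estimate of expected cost
    opt.zero_grad()
    loss.backward()                           # backprop through sampled trajectories
    opt.step()
```

The design point this sketch illustrates is that, unlike FBSDE-based schemes, direct policy sampling needs no backward equation: the gradient of the expected cost flows straight through the forward simulation.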
