Abstract

In this contribution, we start from a policy-based Reinforcement Learning ansatz using neural networks. The underlying Markov Decision Process consists of a transition probability representing the dynamical system and a policy, realized by a neural network, that maps the current state to the parameters of a distribution from which the next control is sampled. In this setting, the neural network is replaced by an ODE, based on a recently discussed interpretation of neural networks. The resulting infinite-dimensional optimization problem is transformed into an optimization problem similar to well-known optimal control problems. Afterwards, the necessary optimality conditions are established, and from these a new numerical algorithm is derived. The operating principle is demonstrated with two examples. In the first, a moving point is steered through an obstacle course to a desired end position in a 2D plane. The second example shows the applicability to more complex problems: there, the aim is to steer the fingertip of a human arm model with five degrees of freedom and 29 Hill's muscle models to a desired end position.
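
As an illustration of this setting, the following minimal sketch shows how a small neural network can map the current state to the parameters of a Gaussian distribution from which the next control is sampled. The network sizes, the Gaussian form of the policy, and the random weights are assumptions made only for this example and are not taken from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative sizes: n_s-dimensional state, n_a-dimensional control (assumed values).
n_s, n_h, n_a = 2, 16, 2
W1, b1 = 0.1 * rng.normal(size=(n_h, n_s)), np.zeros(n_h)
W2, b2 = 0.1 * rng.normal(size=(2 * n_a, n_h)), np.zeros(2 * n_a)

def policy_parameters(state):
    """Map the current state to the parameters (mean, std) of a Gaussian policy."""
    hidden = np.tanh(W1 @ state + b1)
    out = W2 @ hidden + b2
    mean, log_std = out[:n_a], out[n_a:]
    return mean, np.exp(log_std)

def sample_control(state):
    """Sample the next control a ~ N(mean(state), diag(std(state)^2))."""
    mean, std = policy_parameters(state)
    return mean + std * rng.normal(size=n_a)

control = sample_control(np.array([0.5, -0.3]))
```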

Highlights

  • Techniques for solving optimal control problems are of interest to many practitioners in different research areas because of the prevalence of such problems

  • Besides the interpretation of neural networks as Ordinary Differential Equations (ODEs), we focus on transferring the necessary optimality conditions of optimal control problems to the Reinforcement Learning setting (see the sketch after this list)

  • After introducing the basic setting of a policy-based Reinforcement Learning algorithm based on neural networks and recapitulating the connection between neural networks and ODEs, we derive an infinite-dimensional optimization problem consisting of an objective function and constraints given by ordinary differential equations
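
The ODE interpretation of neural networks referenced above can be sketched as follows: a residual network is read as a forward Euler discretization of an ordinary differential equation for the hidden state. The concrete right-hand side (a tanh layer) and the step-size choice below are illustrative assumptions, not the parameterization used in the paper.

```python
import numpy as np

def rhs(h, theta):
    """Right-hand side f(h, theta) of the network ODE; a tanh layer is an
    illustrative assumption, not the parameterization used in the paper."""
    W, b = theta
    return np.tanh(W @ h + b)

def ode_forward(h0, thetas, T=1.0):
    """Forward Euler discretization of dh/dt = f(h(t), theta(t)) on [0, T].
    With L parameter sets and step size dt = T / L this recovers a residual
    network: h_{k+1} = h_k + dt * f(h_k, theta_k)."""
    dt = T / len(thetas)
    h = np.asarray(h0, dtype=float)
    for theta in thetas:
        h = h + dt * rhs(h, theta)
    return h

# Example: three "layers" with random 4 x 4 weight matrices (assumed sizes).
rng = np.random.default_rng(0)
layers = [(0.1 * rng.normal(size=(4, 4)), np.zeros(4)) for _ in range(3)]
output = ode_forward(np.ones(4), layers)
```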

Introduction

Techniques for solving optimal control problems are of interest to many practitioners in different research areas because of the prevalence of such problems. The transition kernel K of the Markov Decision Process is required to satisfy that K(B, ·, ·) is measurable on the product space (S × A, 𝒮 ⊗ 𝒜) for any B ∈ 𝒮. At this point, we stress that, although we introduce the MDP for general measurable spaces, in this contribution we restrict ourselves to the state space R^{n_s} and the control space R^{n_a}, or at least subsets of these, which are suitable for most application cases. For these spaces, we can use the Borel σ-algebra.
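
To make the roles of the transition kernel K and the parameterized policy concrete, the following sketch generates a trajectory of the MDP by alternately sampling controls from a policy and next states from a transition kernel. The linear dynamics with Gaussian noise used as a stand-in for K, as well as the fixed Gaussian policy in the usage example, are assumptions for illustration only.

```python
import numpy as np

rng = np.random.default_rng(1)

def transition_sample(state, control):
    """Sample the next state s' ~ K(. | s, a). Linear dynamics with Gaussian
    noise serve only as a stand-in; the true kernel is problem-specific."""
    return state + 0.1 * control + 0.01 * rng.normal(size=state.shape)

def rollout(sample_control, s0, horizon=50):
    """Generate a trajectory of the MDP on R^{n_s} x R^{n_a} by alternately
    sampling a control a ~ pi(. | s) and a next state s' ~ K(. | s, a)."""
    states, controls = [np.asarray(s0, dtype=float)], []
    for _ in range(horizon):
        a = sample_control(states[-1])
        controls.append(a)
        states.append(transition_sample(states[-1], a))
    return np.array(states), np.array(controls)

# Usage with a zero-mean Gaussian policy (assumption for illustration).
states, controls = rollout(lambda s: 0.1 * rng.normal(size=s.shape), np.zeros(2))
```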
