Policy optimization has reemerged as an important approach to reinforcement learning and optimal control problems. A notable drawback is that, even for linear quadratic problems, model-free policy optimization methods generally rely on multiple trajectories to estimate the cost gradient and the state covariance matrix at each iteration. This paper proposes a novel single-trajectory policy optimization algorithm for stochastic systems subject to multiplicative noise. Specifically, three variants of the policy optimization method are proposed to learn the optimal control policy in a model-free manner, each supported by provable convergence guarantees. In contrast with existing policy optimization algorithms, ours reuses a single system trajectory instead of regenerating trajectories at each iteration. The algorithm is evaluated on several numerical examples, which indicate that it outperforms the other policy optimization algorithms evaluated.
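The abstract does not specify the authors' update rule, but the following minimal Python sketch illustrates the generic model-free setup it alludes to: a zeroth-order (one-point) gradient estimate of the LQR cost under multiplicative noise, obtained by rolling out a perturbed feedback gain. All problem data (A, B, A1, Q, R, the horizon, the smoothing radius, and the step size) are illustrative assumptions, and the estimator shown is the standard technique, not the paper's single-trajectory scheme.

```python
import numpy as np

rng = np.random.default_rng(0)

# --- illustrative problem data (assumptions, not from the paper) ---
n, m, T = 3, 2, 50                    # state dim, input dim, horizon
A = 0.9 * np.eye(n)                   # nominal dynamics
B = np.vstack([np.eye(m), np.zeros((n - m, m))])
A1 = 0.1 * np.eye(n)                  # multiplicative-noise direction
Q, R = np.eye(n), np.eye(m)
sigma = 0.1                           # noise scale

def rollout_cost(K):
    """Finite-horizon cost of u_t = -K x_t along one noisy trajectory."""
    x = rng.standard_normal(n)
    cost = 0.0
    for _ in range(T):
        u = -K @ x
        cost += x @ Q @ x + u @ R @ u
        delta = sigma * rng.standard_normal()   # scalar multiplicative noise
        x = (A + delta * A1) @ x + B @ u
    return cost

def zeroth_order_grad(K, r=0.05):
    """One-point gradient estimate (d / r) * J(K + r U) * U,
    with U drawn uniformly from the unit sphere in R^{m x n}."""
    d = m * n
    U = rng.standard_normal((m, n))
    U /= np.linalg.norm(U)
    return (d / r) * rollout_cost(K + r * U) * U

# --- gradient descent on the policy gain ---
K = np.zeros((m, n))
eta = 1e-5                            # step size (assumed)
for it in range(200):
    K -= eta * zeroth_order_grad(K)

print("final gain K:\n", K)
```

Note that the one-point estimator has high variance, which is why conventional methods average it over many freshly generated rollouts per iteration; the abstract's contribution is precisely to avoid that regeneration by reusing a single trajectory.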