Riemannian Proximal Policy Optimization

Shijun Wang, Yuan Qi, James Zhang, Mingzhe Wu, Wei Chu, Baocheng Zhu, Chen Li

doi:10.5539/cis.v13n3p93

Abstract

In this paper, we propose a general Riemannian proximal optimization algorithm with guaranteed convergence to solve Markov decision process (MDP) problems. To model policy functions in an MDP, we employ a Gaussian mixture model (GMM) and formulate its optimization as a non-convex problem in the Riemannian space of positive semidefinite matrices. For two given policy functions, we also provide a lower bound on policy improvement by using bounds derived from the Wasserstein distance of GMMs. Preliminary experiments show the efficacy of our proposed Riemannian proximal policy optimization algorithm.

Highlights

Reinforcement learning studies how agents explore and exploit an environment and take actions to maximize long-term reward.
For two given policy functions, we provide a lower bound on policy improvement by using bounds derived from the Wasserstein distance of Gaussian mixture models (GMMs).
To optimize the GMM and learn the optimal policy functions efficiently, we formulate the problem as a non-convex optimization problem in the Riemannian space.

Summary

Introduction

Reinforcement learning studies how agents explore and exploit an environment and take actions to maximize long-term reward. Trust Region Policy Optimization (TRPO), Proximal Policy Optimization (PPO) and Constrained Policy Optimization (CPO) have shown promising performance on complex decision-making problems, such as continuous-control tasks and playing Atari. Like other neural-network-based models, however, they face two typical challenges: a lack of interpretability, and difficulty converging due to the non-convex nature of optimization in a high-dimensional parameter space. In this study we choose the GMM for its good analytical characteristics, universal representation power and low computational cost compared with neural networks. It is well known that the covariance matrices of a GMM lie in a Riemannian manifold of positive semidefinite matrices. To optimize the GMM and learn the optimal policy functions efficiently, we formulate the problem as a non-convex optimization problem in the Riemannian space. In this way, our method gains advantages in both interpretability and speed of convergence. It suffers from the headache of Q-learning that it can hardly handle problems with large continuous state-action spaces.
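To make the idea of optimizing a covariance matrix inside its manifold concrete, the following is a minimal sketch of one geodesic descent step on a single covariance matrix. It is an illustration under stated assumptions, not the algorithm proposed in the paper: it assumes the covariance is strictly positive definite, it assumes the affine-invariant metric (the summary does not say which metric is used), and the names `spd_exp_map` and `riemannian_step` are hypothetical. The point it shows is that moving along the manifold's exponential map keeps every iterate a valid covariance matrix, which a plain Euclidean gradient step does not guarantee.

```python
import numpy as np


def _sym_fun(S, fun):
    """Apply a scalar function to a symmetric matrix via its eigenvalues."""
    w, V = np.linalg.eigh(S)
    return (V * fun(w)) @ V.T  # V diag(fun(w)) V^T


def spd_exp_map(S, xi):
    """Exponential map at the SPD matrix S along the symmetric direction xi.

    Assumes the affine-invariant metric:
    Exp_S(xi) = S^{1/2} expm(S^{-1/2} xi S^{-1/2}) S^{1/2}.
    """
    S_half = _sym_fun(S, np.sqrt)
    S_inv_half = _sym_fun(S, lambda w: 1.0 / np.sqrt(w))
    inner = S_inv_half @ xi @ S_inv_half
    inner = 0.5 * (inner + inner.T)  # symmetrize against round-off
    out = S_half @ _sym_fun(inner, np.exp) @ S_half
    return 0.5 * (out + out.T)


def riemannian_step(S, euclid_grad, lr):
    """One geodesic descent step for a covariance matrix (hypothetical helper).

    The Euclidean gradient is converted to the Riemannian gradient
    S sym(G) S of the affine-invariant metric, then followed along a geodesic.
    """
    G = 0.5 * (euclid_grad + euclid_grad.T)
    riem_grad = S @ G @ S
    return spd_exp_map(S, -lr * riem_grad)
```

For instance, with the toy objective 0.5 * ||S - T||_F^2 for a fixed positive definite target T, the Euclidean gradient is S - T, and repeated calls to `riemannian_step(S, S - T, lr)` produce iterates whose eigenvalues stay positive for any step size, because the exponential map never leaves the manifold.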

Reinforcement Learning

Riemannian Space

Modeling Policy Function Using Gaussian Mixture Model

Riemannian Proximal Method for Policy Optimization

Lower Bound of Policy Improvement

Implementation of the Riemannian Proximal Policy Optimization Method

Simulation Environments and Baseline Methods

Preliminary Results

Conclusion

Proof of Lemma 1

Proof of Theorem 1

Proof of Lemma 3

Journal: Computer and Information Science
Publication Date: Jul 30, 2020
License type: CC BY 4.0

