Abstract

Proximal policy optimization (PPO) has yielded state-of-the-art results in policy search, a subfield of reinforcement learning; one of its key ideas is the use of a surrogate objective function to restrict the step size at each policy update. Although this restriction is helpful, the algorithm still suffers from performance instability and optimization inefficiency caused by the sudden flattening of the clipping curve. To address this issue we present a novel functional clipping policy optimization algorithm, named the Proximal Policy Optimization Smoothed Algorithm (PPOS), whose critical improvement is the use of a functional clipping method in place of the flat clipping method. We compare our approach with PPO and PPORB, which adopts a rollback clipping method, and prove that our approach can conduct more accurate updates than other PPO methods. We show that it outperforms the latest PPO variants in both performance and stability on challenging continuous control tasks. Moreover, we provide an instructive guideline for tuning the main hyperparameter of our algorithm.
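
To make the distinction concrete, below is a minimal sketch (not the paper's reference implementation) contrasting PPO's flat clipping with a smoothed, functional clipping of the probability ratio. The tanh-shaped ppos_smoothed_clip and its slope parameter alpha are illustrative assumptions, not the exact function proposed in the paper.

    import numpy as np

    def ppo_flat_clip(ratio, advantage, eps=0.2):
        # Standard PPO surrogate: the ratio is clipped to [1 - eps, 1 + eps],
        # so the objective becomes flat (zero gradient) once the ratio
        # leaves that range.
        clipped = np.clip(ratio, 1.0 - eps, 1.0 + eps)
        return np.minimum(ratio * advantage, clipped * advantage)

    def ppos_smoothed_clip(ratio, advantage, eps=0.2, alpha=0.3):
        # Illustrative "functional" clipping (assumption): outside the
        # clipping range the ratio is replaced by a bounded tanh curve that
        # bends back toward the boundary, so the surrogate neither flattens
        # abruptly nor diverges.
        lower, upper = 1.0 - eps, 1.0 + eps
        above = upper - alpha * np.tanh(ratio - upper)  # bounded push-back from above
        below = lower - alpha * np.tanh(ratio - lower)  # bounded push-back from below
        shaped = np.where(ratio > upper, above,
                 np.where(ratio < lower, below, ratio))
        return np.minimum(ratio * advantage, shaped * advantage)

For a positive advantage and a ratio far above 1 + eps, ppo_flat_clip returns the constant (1 + eps) * advantage, whereas ppos_smoothed_clip decreases gently toward (1 + eps - alpha) * advantage, retaining a small restoring gradient instead of a flat plateau.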

Highlights

  • Reinforcement learning, especially deep model-free reinforcement learning, has achieved great progress in recent years

  • Inspired by the insights above, we propose a novel PPO clipping method, named the Proximal Policy Optimization Smoothed algorithm (PPOS), which combines the strengths of both PPO and Policy Optimization with Rollback (PPORB)

  • Trust-Region Optimization (TRO) methods are used to keep the new policy from straying far from the old policy, an idea first introduced in the relative entropy policy search (REPS) algorithm [17]


Summary

INTRODUCTION

Reinforcement learning, especially deep model-free reinforcement learning, has achieved great progress in recent years. Trust-region methods, however, are computationally inefficient and difficult to scale up to high-dimensional problems when extended to complex network architectures. To address this problem, Proximal Policy Optimization (PPO), which adopts a clipping mechanism on the likelihood ratio, was introduced [20]. PPORB adopts a straight downward-slope function instead of the original flat function when the ratio is outside the clipping range, which suppresses the residual incentive, left by PPO's flat clipping, to seek an overly large policy update. This solution introduces new problems: when the ratio becomes extremely large, the clipped ratio shoots toward positive or negative infinity, contradicting its original aim. We analyze the hyperparameter introduced here in relation to the dimensionality of five benchmark problems and provide a useful guideline for readers to choose it according to the dimension of their own problems.
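
As a minimal numeric illustration of the divergence mentioned above, the sketch below implements a flat PPO clip and a PPORB-style straight rollback slope with coefficient alpha; the parameter names and values are assumptions chosen only to show the trend, not the paper's settings.

    import numpy as np

    def flat_clip(ratio, eps=0.2):
        # PPO: constant outside [1 - eps, 1 + eps], so the value stays bounded.
        return np.clip(ratio, 1.0 - eps, 1.0 + eps)

    def rollback_clip(ratio, eps=0.2, alpha=0.3):
        # PPORB-style rollback (sketch): a straight line with negative slope
        # outside the clipping range. Because the slope never levels off, the
        # value keeps growing in magnitude as the ratio grows -- the
        # divergence discussed above.
        lower, upper = 1.0 - eps, 1.0 + eps
        return np.where(ratio > upper, upper - alpha * (ratio - upper),
               np.where(ratio < lower, lower + alpha * (lower - ratio), ratio))

    for r in (1.0, 1.5, 5.0, 50.0):
        print(f"ratio={r:6.1f}  flat={float(flat_clip(r)):.2f}  "
              f"rollback={float(rollback_clip(r)):.2f}")

With eps = 0.2 and alpha = 0.3, the flat clip stays at 1.20 for any ratio above 1.2, while the rollback value falls to about 0.06 at ratio 5 and about -13.4 at ratio 50, illustrating the unbounded behaviour that a bounded, smoothed clipping function is designed to avoid.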

PRELIMINARIES
PROXIMAL POLICY OPTIMIZATION
EXPERIMENTS
CHOICE OF THE HYPERPARAMETER
Findings
CONCLUSION