Proximal policy optimization with model-based methods

Shuailong Li,Yuquan Leng,Huiwen Zhang,Wei Zhang,Xin Zhang

doi:10.3233/jifs-211935

Abstract

Model-free reinforcement learning methods have been successfully applied to practical problems such as decision-making in Atari games. However, these methods have inherent shortcomings, such as high variance and low sample efficiency. To improve the policy performance and sample efficiency of model-free reinforcement learning, we propose proximal policy optimization with model-based methods (PPOMM), which fuses model-based and model-free reinforcement learning. PPOMM considers not only information from past experience but also predicted information about the future state. It adds information about the next state to the objective function of the proximal policy optimization (PPO) algorithm through a model-based method. The method uses two components to optimize the policy: the error of PPO and the error of model-based reinforcement learning. We use the latter to optimize a latent transition model and to predict information about the next state. Evaluated on 49 Atari games in the Arcade Learning Environment (ALE), this method outperforms the state-of-the-art PPO algorithm on most games. The experimental results show that PPOMM performs as well as or better than the original algorithm in 33 games.
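The two-component objective described in the abstract can be illustrated with a minimal sketch. The abstract does not give the exact form of either error term, how they are weighted, or the architecture of the latent transition model, so everything below is an assumption made for illustration: the standard clipped PPO surrogate stands in for "the error of PPO", a mean squared error between predicted and observed next-state latents stands in for "the error of model-based reinforcement learning", and the function names and the `model_coef` weight are hypothetical.

```python
import math


def ppo_clip_loss(new_logp, old_logp, advantages, clip_eps=0.2):
    """Negative clipped PPO surrogate, averaged over samples.

    Assumption: the standard PPO clipped objective; the paper's exact
    PPO error term is not stated in the abstract.
    """
    total = 0.0
    for lp_new, lp_old, adv in zip(new_logp, old_logp, advantages):
        ratio = math.exp(lp_new - lp_old)  # pi_new(a|s) / pi_old(a|s)
        clipped = max(1.0 - clip_eps, min(1.0 + clip_eps, ratio))
        total += min(ratio * adv, clipped * adv)
    return -total / len(advantages)


def latent_model_loss(predicted_next, actual_next):
    """Mean squared error between predicted and observed next-state latents.

    Assumption: MSE is a stand-in for the model-based error; the abstract
    does not specify the loss used for the latent transition model.
    """
    squared_error = 0.0
    count = 0
    for pred_vec, actual_vec in zip(predicted_next, actual_next):
        for p, a in zip(pred_vec, actual_vec):
            squared_error += (p - a) ** 2
            count += 1
    return squared_error / count


def combined_loss(new_logp, old_logp, advantages,
                  predicted_next, actual_next, model_coef=0.5):
    """Weighted sum of the two terms; model_coef is a hypothetical weight."""
    policy_term = ppo_clip_loss(new_logp, old_logp, advantages)
    model_term = latent_model_loss(predicted_next, actual_next)
    return policy_term + model_coef * model_term
```

In a full implementation both terms would presumably be computed from shared network outputs and minimized by gradient descent; the sketch only shows how a next-state prediction error could enter the same scalar objective as the PPO term.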

Journal: Journal of Intelligent & Fuzzy Systems
Publication Date: Apr 28, 2022
Citations: 1

Similar Papers

Traffic Navigation for Urban Air Mobility with Reinforcement Learning
Jaeho Lee ... Hyochoong Bang
30 Sep 2022

Policy Optimization with Model-Based Explorations
Feiyang Pan ... Hualin He
Proceedings of the AAAI Conference on Artificial Intelligence | VOL. 33
17 Jul 2019

Implementing action mask in proximal policy optimization (PPO) algorithm
Cheng-Yen Tang ... Chien-Hung Liu
ICT Express | VOL. 6
20 May 2020

Multiple-UAV Reinforcement Learning Algorithm Based on Improved PPO in Ray Framework
Guang Zhan ... Xinmiao Zhang
Drones | VOL. 6
04 Jul 2022
