Abstract

The inverted pendulum is a classical control problem: a pendulum starting from an arbitrary position must be swung up and balanced in the upright position. The problem has been solved with methods based on deep reinforcement learning (DRL), such as Deep Deterministic Policy Gradient (DDPG). However, DDPG has drawbacks. Its deterministic policy is not conducive to action exploration, and the policy can only be as accurate as the estimated Q value. Early in training the Q-value estimate carries substantial error, so the parameters learned at that stage tend to deviate. This paper therefore proposes an optimization method for the inverted pendulum problem that combines the AdaBound optimizer with the DDPG algorithm, and compares its performance against four published baselines. The experimental results show that, on the inverted pendulum problem, the proposed method outperforms all four baselines to a certain extent.
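The key idea hinted at in the abstract is replacing DDPG's usual Adam optimizer with AdaBound, whose adaptive per-parameter step size is clipped between bounds that converge to a fixed final learning rate, smoothly transitioning from Adam-like to SGD-like updates. The abstract gives no implementation details, so the following NumPy sketch is only an illustration of the AdaBound update rule itself (the function name, default hyperparameters, and state layout are this sketch's assumptions, not the paper's):

```python
import numpy as np

def adabound_step(theta, grad, state, lr=1e-3, final_lr=0.1,
                  betas=(0.9, 0.999), gamma=1e-3, eps=1e-8):
    """One AdaBound update on parameters `theta`.

    Adam-style first/second moments are kept, but the effective step
    size lr / sqrt(v_hat) is clipped into [lower, upper] bounds that
    both converge to final_lr as t grows, so early noisy adaptive
    steps are bounded and late updates behave like SGD.
    """
    m, v, t = state
    t += 1
    # Exponential moving averages of the gradient and its square.
    m = betas[0] * m + (1 - betas[0]) * grad
    v = betas[1] * v + (1 - betas[1]) * grad ** 2
    # Bias correction, as in Adam.
    m_hat = m / (1 - betas[0] ** t)
    v_hat = v / (1 - betas[1] ** t)
    # Dynamic bounds that shrink toward final_lr over time.
    lower = final_lr * (1 - 1 / (gamma * t + 1))
    upper = final_lr * (1 + 1 / (gamma * t))
    step = np.clip(lr / (np.sqrt(v_hat) + eps), lower, upper)
    theta = theta - step * m_hat
    return theta, (m, v, t)

# Usage sketch: minimize f(theta) = theta^2, gradient 2*theta.
theta = np.array([1.0])
state = (np.zeros_like(theta), np.zeros_like(theta), 0)
for _ in range(200):
    theta, state = adabound_step(theta, 2.0 * theta, state)
```

In the paper's setting this update would presumably be applied to the DDPG actor and critic parameters in place of Adam; bounding the early adaptive steps is one plausible way to reduce the damage done while the Q-value estimate is still inaccurate.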
