Abstract

The inverted pendulum is a classical control problem: a pendulum starting from an arbitrary position must be swung up and balanced in the upright position. The problem has been solved with methods based on deep reinforcement learning (DRL), such as Deep Deterministic Policy Gradient (DDPG). However, DDPG has drawbacks. Its deterministic policy is not conducive to action exploration, and the policy can only be as accurate as the estimated Q value. Early in training the Q-value estimate carries substantial error, so the parameters learned at that stage tend to deviate. This paper therefore proposes an optimization method for the inverted pendulum problem that combines the AdaBound optimizer with the DDPG algorithm, and compares its performance against four published baselines. The experimental results show that, on the inverted pendulum problem, the proposed method outperforms all four baselines to a certain extent.
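The key idea hinted at in the abstract is replacing DDPG's usual Adam optimizer with AdaBound, whose adaptive per-parameter step size is clipped between bounds that converge to a fixed final learning rate, smoothly transitioning from Adam-like to SGD-like updates. The abstract gives no implementation details, so the following NumPy sketch is only an illustration of the AdaBound update rule itself (the function name, default hyperparameters, and state layout are this sketch's assumptions, not the paper's):

```python
import numpy as np

def adabound_step(theta, grad, state, lr=1e-3, final_lr=0.1,
                  betas=(0.9, 0.999), gamma=1e-3, eps=1e-8):
    """One AdaBound update on parameters `theta`.

    Adam-style first/second moments are kept, but the effective step
    size lr / sqrt(v_hat) is clipped into [lower, upper] bounds that
    both converge to final_lr as t grows, so early noisy adaptive
    steps are bounded and late updates behave like SGD.
    """
    m, v, t = state
    t += 1
    # Exponential moving averages of the gradient and its square.
    m = betas[0] * m + (1 - betas[0]) * grad
    v = betas[1] * v + (1 - betas[1]) * grad ** 2
    # Bias correction, as in Adam.
    m_hat = m / (1 - betas[0] ** t)
    v_hat = v / (1 - betas[1] ** t)
    # Dynamic bounds that shrink toward final_lr over time.
    lower = final_lr * (1 - 1 / (gamma * t + 1))
    upper = final_lr * (1 + 1 / (gamma * t))
    step = np.clip(lr / (np.sqrt(v_hat) + eps), lower, upper)
    theta = theta - step * m_hat
    return theta, (m, v, t)

# Usage sketch: minimize f(theta) = theta^2, gradient 2*theta.
theta = np.array([1.0])
state = (np.zeros_like(theta), np.zeros_like(theta), 0)
for _ in range(200):
    theta, state = adabound_step(theta, 2.0 * theta, state)
```

In the paper's setting this update would presumably be applied to the DDPG actor and critic parameters in place of Adam; bounding the early adaptive steps is one plausible way to reduce the damage done while the Q-value estimate is still inaccurate.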
