Abstract

In this study, we verified the performance of a swing-up control method for a reaction wheel pendulum based on the actor-critic algorithm, in both simulation and experiment. The results suggest that reinforcement learning with shallow neural networks can be applied to the study of intelligent robots that act in real-world environments, such as a robot that teaches itself to walk through trial and error. The actor of the proposed actor-critic algorithm used a policy network to determine the rotational direction of the reaction wheel from the angular position and angular velocity of the pendulum and the angular velocity of the reaction wheel. The critic used a value network to estimate the expected reward from the same inputs as the actor. In both the simulation and the real-world environment, the proposed algorithm learned through trial and error to swing up and stabilize the pendulum by choosing the rotational direction of the reaction wheel, either clockwise or counter-clockwise.
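The actor and critic described above can be illustrated with a minimal NumPy sketch. This is not the authors' implementation. The abstract specifies only the three state inputs, the two discrete actions, and the use of shallow policy and value networks. Everything else here is an assumption made for illustration: the hidden-layer size, the tanh activation, the learning rates, the discount factor, the one-step TD update, and all function names.

```python
import numpy as np

rng = np.random.default_rng(0)

# Assumed sizes. State: (pendulum angle, pendulum angular velocity,
# wheel angular velocity). Actions: 0 = clockwise, 1 = counter-clockwise.
STATE_DIM, HIDDEN, N_ACTIONS = 3, 16, 2


def init_net(n_in, n_hidden, n_out):
    """Create a shallow network with one hidden layer (hypothetical layout)."""
    return {
        "W1": rng.normal(0.0, 0.1, (n_hidden, n_in)),
        "b1": np.zeros(n_hidden),
        "W2": rng.normal(0.0, 0.1, (n_out, n_hidden)),
        "b2": np.zeros(n_out),
    }


def forward(net, s):
    """Return the network output and the hidden activations for state s."""
    h = np.tanh(net["W1"] @ s + net["b1"])
    return net["W2"] @ h + net["b2"], h


def policy(actor, s):
    """Softmax probabilities over the two rotational directions."""
    logits, h = forward(actor, s)
    z = np.exp(logits - logits.max())  # subtract max for numerical stability
    return z / z.sum(), h


def ascent_step(net, s, h, grad_out, lr):
    """Gradient-ascent step, where grad_out = d(objective)/d(output)."""
    # Backpropagate through tanh before W2 is modified.
    grad_h = (net["W2"].T @ grad_out) * (1.0 - h**2)
    net["W2"] += lr * np.outer(grad_out, h)
    net["b2"] += lr * grad_out
    net["W1"] += lr * np.outer(grad_h, s)
    net["b1"] += lr * grad_h


def actor_critic_step(actor, critic, s, a, r, s_next, done,
                      gamma=0.99, lr_actor=1e-3, lr_critic=1e-2):
    """One-step actor-critic update for a single transition (assumed variant)."""
    v, h_v = forward(critic, s)
    v_next, _ = forward(critic, s_next)
    target = r + (0.0 if done else gamma * v_next[0])
    td_error = target - v[0]

    # Critic: semi-gradient TD(0), moving V(s) toward the bootstrapped target.
    ascent_step(critic, s, h_v, np.array([td_error]), lr_critic)

    # Actor: d log pi(a|s) / d logits = onehot(a) - probs, scaled by the TD error.
    probs, h_pi = policy(actor, s)
    grad_logits = -probs
    grad_logits[a] += 1.0
    ascent_step(actor, s, h_pi, td_error * grad_logits, lr_actor)
    return td_error


actor = init_net(STATE_DIM, HIDDEN, N_ACTIONS)
critic = init_net(STATE_DIM, HIDDEN, 1)
```

In a training loop, an action would be sampled from `policy(actor, s)`, applied to the simulated or physical pendulum, and the resulting transition passed to `actor_critic_step`. The reward function and the pendulum dynamics are not given in the abstract, so they are left out of the sketch.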
