Hierarchical Reinforcement Learning Based Self-balancing Algorithm for Two-wheeled Robots

Juan Yan, Huibin Yang

doi:10.2174/1874129001610010069

Abstract

Self-balancing control is the basis for applications of two-wheeled robots. To improve the self-balancing of two-wheeled robots, we propose a hierarchical reinforcement learning algorithm for balance control. After describing the subgoals of hierarchical reinforcement learning, we extract features for the subgoals, define a feature value vector and its corresponding weight vector, and propose a reward function with an additional subgoal reward term. Finally, we give a hierarchical reinforcement learning algorithm for finding the optimal strategy. Simulation experiments show that the proposed algorithm converges faster than a traditional reinforcement learning algorithm, so the robot in our system reaches a self-balanced state quickly.
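The reward with an additional subgoal term can be illustrated with a minimal sketch. It is not the authors' formulation: the feature definitions, the function names subgoal_features and shaped_reward, and the weight values are all assumptions made for illustration. The sketch treats "staying near upright" as the subgoal, builds a two-element feature value vector from the tilt angle and angular velocity, and adds the weighted sum of those features to a base reward.

```python
import math

def subgoal_features(theta, theta_dot):
    """Hypothetical feature value vector for an 'upright' subgoal.

    Each feature is 1.0 when the quantity is zero and decays toward 0
    as the tilt angle (rad) or angular velocity (rad/s) grows.
    """
    return [math.exp(-abs(theta)), math.exp(-abs(theta_dot))]

def shaped_reward(base_reward, theta, theta_dot, weights):
    """Base reward plus an assumed subgoal bonus: dot(weights, features)."""
    features = subgoal_features(theta, theta_dot)
    bonus = sum(w * f for w, f in zip(weights, features))
    return base_reward + bonus

# Illustrative weights (assumed, not taken from the paper).
weights = [0.5, 0.5]
upright = shaped_reward(0.0, 0.0, 0.0, weights)
tilted = shaped_reward(0.0, 0.3, 0.0, weights)
```

With these assumed features, a state closer to upright earns a larger subgoal bonus, which is the kind of intermediate signal an additional subgoal reward is meant to provide.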

Highlights

The two-wheeled self-balancing robot [1] is an important research topic in intelligent developmental robotics
The self-balancing of a two-wheeled robot is controlled by its internal developmental mechanism and is reinforced by intelligence acquired through interaction with the external environment via sensors and actuators [3]
Researchers have proposed many control approaches for the self-balancing of two-wheeled robots

Summary

Introduction

The two-wheeled self-balancing robot [1] is an important research topic in intelligent developmental robotics. The self-balancing of a two-wheeled robot is controlled by its internal developmental mechanism and is reinforced by intelligence acquired through interaction with the external environment via sensors and actuators [3]. Researchers have proposed many control approaches for the self-balancing of two-wheeled robots. These existing algorithms are all based on neural networks and have the advantage of high fault tolerance. Their disadvantages are weak learning ability and sensitivity to external noise, which make it hard for the controller to reach a stable state.

Objectives

Methods

Conclusion