Abstract

A novel backstepping control scheme based on reinforcement fuzzy Q-learning is proposed for the control of container cranes. In this scheme, a modified backstepping controller handles the underactuated dynamics of the container crane, and its gain is tuned by a reinforcement fuzzy Q-learning mechanism that automatically searches for the optimal fuzzy rules so as to decrease the value of the Lyapunov function. The effectiveness of the proposed control scheme was verified by simulation in Matlab, and its performance was compared with that of a conventional sliding mode controller designed for container cranes. The simulation results indicate that the proposed scheme achieves satisfactory step-signal tracking performance under an uncertain rope length.

Highlights

  • A robotic container crane is a robot that lifts cargo off the ground with ropes and carries it to designated locations

  • The target is usually to maximize the cumulative rewards or minimize the cumulative costs over the entire learning process. The entire process of typical reinforcement learning can be described as follows: the learning process starts with the agent adopting an action in the initial state based on the current policy, and the adopted action transfers the system from the current state to the next state with a certain probability

  • An action transferring the system from the current state to the next state is evaluated with a reward or cost, called the instant reward or cost. The instant rewards/costs of each action from all the visited states can then be used to dynamically explore the optimal policy of adopting actions, i.e., the one that maximizes the rewards or minimizes the costs over the entire process, which can be accomplished by many temporal difference (TD) methods such as Q-learning [16] and SARSA [17] (see the sketch after this list)
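The TD update behind Q-learning can be made concrete with a minimal tabular sketch. The state/action counts, learning rate, discount factor, and epsilon below are illustrative assumptions, not values from the paper:

```python
import numpy as np

# Minimal tabular Q-learning sketch for the TD update described above.
# n_states, n_actions, alpha, gamma, and epsilon are assumed values.
n_states, n_actions = 10, 4
alpha, gamma, epsilon = 0.1, 0.95, 0.1
Q = np.zeros((n_states, n_actions))
rng = np.random.default_rng(0)

def choose_action(state):
    # Epsilon-greedy policy: explore with probability epsilon,
    # otherwise exploit the current Q-value estimates.
    if rng.random() < epsilon:
        return int(rng.integers(n_actions))
    return int(np.argmax(Q[state]))

def td_update(state, action, reward, next_state):
    # Q-learning temporal-difference update:
    # Q(s,a) <- Q(s,a) + alpha * (r + gamma * max_a' Q(s',a') - Q(s,a))
    target = reward + gamma * np.max(Q[next_state])
    Q[state, action] += alpha * (target - Q[state, action])
```

In the crane application described here, the state would encode the tracking behaviour and each action would select a controller gain, with the reward tied to the decrease of the Lyapunov function.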


Summary

Introduction

A robotic container crane is a robot that lifts cargo off the ground with ropes and carries it to designated locations. In reinforcement learning, the instant rewards/costs collected for each action from all the visited states can be used to dynamically explore the optimal policy of adopting actions, i.e., the one that maximizes the rewards or minimizes the costs over the entire process, which can be accomplished by many temporal difference (TD) methods such as Q-learning [16] and SARSA [17]. In this paper, the backstepping controller is combined with a fuzzy Q-learning mechanism that, after an appropriate learning process, obtains the optimal fuzzy rules outputting the appropriate control gains, which reduces the value of the Lyapunov function and achieves the convergence of the tracking errors. The rest of this paper is organised as follows: in Section 2, a nonlinear dynamics model of robotic container cranes is established by the Lagrangian method.
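The gain-tuning idea can be sketched as a small Q-learning loop in which the reward is the decrease of a Lyapunov function between control steps. Everything below is a hypothetical placeholder: the candidate gains, the quadratic Lyapunov function, and the plant_step dynamics stand in for the paper's crane model and backstepping law:

```python
import numpy as np

# Hypothetical sketch: Q-learning selects a controller gain from a
# discrete candidate set; the reward is the decrease of a Lyapunov
# function V between control steps.
gains = np.array([0.5, 1.0, 2.0, 4.0])   # assumed candidate gains
Q = np.zeros((1, len(gains)))            # single-state table for brevity
alpha, gamma, epsilon = 0.1, 0.9, 0.2
rng = np.random.default_rng(1)

def lyapunov(x):
    # Placeholder quadratic Lyapunov function V(x) = x^T x.
    return float(x @ x)

def plant_step(x, k):
    # Placeholder closed-loop dynamics under gain k (stand-in for the
    # crane model driven by the backstepping control law).
    return x - 0.05 * k * x

x = np.array([1.0, -0.5])
for _ in range(200):
    a = int(rng.integers(len(gains))) if rng.random() < epsilon \
        else int(np.argmax(Q[0]))
    v_old = lyapunov(x)
    x = plant_step(x, gains[a])
    reward = v_old - lyapunov(x)         # reward = decrease in V
    Q[0, a] += alpha * (reward + gamma * np.max(Q[0]) - Q[0, a])
```

Over the episodes, actions (gains) that shrink V accumulate larger Q-values, mirroring how the fuzzy Q-learning mechanism converges on rules that output gains decreasing the Lyapunov function.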
