Abstract

Reinforcement learning (RL) is promising for online controller optimization, but its practical application has been hindered by safety concerns. This paper proposes an algorithm named Incremental Q-learning (IQ) and applies it to the online optimization of motor speed synchronization control. IQ ensures safe learning by adopting incremental action variables, which represent incremental changes rather than absolute magnitudes, and by dividing the one-round learning process of classic Q-learning (referred to here as Absolute Q-learning, AQ) into multiple consecutive rounds, with the Q table reset at the beginning of each round. Because the permitted change per action is restricted to a small interval, the agent learns its way safely, steadily, and robustly toward the optimal policy. Simulation results show that IQ outperforms AQ in optimality, safety, and adaptability: it converges to better final performance, exhibits significantly smaller performance variance throughout the learning process, produces smaller torque-trajectory deviations between consecutive episodes, and adapts to unknown disturbances faster. The algorithm holds great potential for online controller optimization and tuning in practical engineering projects. Source code and demos are provided.
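To make the two ideas concrete (incremental actions and per-round Q-table resets), below is a minimal, self-contained sketch of tabular Q-learning with incremental actions on a toy one-dimensional gain-tuning task. Everything here is an illustrative assumption rather than the paper's actual implementation or released source code: the toy environment, TARGET, N_STATES, the increment set DELTAS, and all hyperparameters are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy setting (assumed, not from the paper): the agent tunes a scalar
# control gain toward an unknown target; reward is negative tracking error.
TARGET = 0.37                     # hypothetical optimal gain
N_STATES = 50                     # discretized gain values in [0, 1]
DELTAS = np.array([-1, 0, +1])    # incremental actions: at most one grid step,
                                  # never an absolute gain command
ALPHA, GAMMA, EPS = 0.1, 0.95, 0.1

def env_step(s, a):
    """Apply a small increment to the discretized gain and score the result."""
    s2 = int(np.clip(s + DELTAS[a], 0, N_STATES - 1))
    gain = s2 / (N_STATES - 1)
    return s2, -abs(gain - TARGET)   # closer to target -> higher reward

def run_round(s, episodes=300, steps=30):
    """One learning round; the Q table is reset when the round begins."""
    Q = np.zeros((N_STATES, len(DELTAS)))   # fresh Q table per round
    for _ in range(episodes):
        for _ in range(steps):
            # epsilon-greedy over *incremental* actions only
            if rng.random() < EPS:
                a = int(rng.integers(len(DELTAS)))
            else:
                a = int(np.argmax(Q[s]))
            s2, r = env_step(s, a)
            # standard one-step Q-learning update
            Q[s, a] += ALPHA * (r + GAMMA * Q[s2].max() - Q[s, a])
            s = s2
    return Q, s

# Chain several rounds: each round starts from the gain the previous round
# ended at, so the controller never jumps, only drifts in small steps.
s = N_STATES // 2
for k in range(3):
    Q, s = run_round(s)
    print(f"round {k}: gain ~ {s / (N_STATES - 1):.2f}")
```

In this sketch, restricting the action set to small increments bounds how far the control variable, and hence the torque trajectory, can move between consecutive episodes, which is the safety mechanism the abstract describes; the per-round reset lets each round re-learn around the current operating point.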
