This paper presents a practical application of the Q-learning algorithm as a general-purpose self-improving controller for a class of industrial closed-loop control systems. The proposed approach addresses a practical control engineering problem: designing a controller that starts operating with a predefined closed-loop control performance and then, without unnecessarily disturbing the normally operating closed loop, learns on-line from interactions with the controlled process to gradually improve the closed-loop performance until it reaches a user-defined target level. The initial performance is ensured by an appropriate initialization of the Q-matrix that requires no knowledge of the process model. The target closed-loop performance is defined by a first-order reference trajectory that the closed-loop system should follow, so the current state of the closed loop is described by the control error and its time derivative. The novelty of the proposed approach lies in: (i) preserving the first-order reference trajectory while reducing the Q-matrix state space to two dimensions, which significantly lowers memory and computational requirements without loss of generality, (ii) a state-reduction method designed specifically for the proposed approach, and (iii) a convenient initialization of the Q-matrix based only on the tunings of the existing PI controller. Together, these features bring the approach closer to practical application. Simulation and experimental results show that the proposed Q-learning controller can replace the existing PI controller bumplessly and, thanks to its on-line learning abilities, gradually improves the closed-loop performance.
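To make the mechanism concrete, the following is a minimal sketch (not the authors' implementation) of how a tabular Q-matrix over the two-dimensional state (control error, error derivative) could be initialized from the tunings of an existing PI controller in its incremental form and then updated on-line against a first-order reference trajectory. All names and numeric values (Kp, Ki, Ts, tau_ref, the bin grids, the action set) are illustrative assumptions, not values from the paper.

```python
import numpy as np

# Illustrative constants (assumed, not from the paper)
Ts = 0.1          # sampling period [s]
Kp, Ki = 2.0, 0.5 # tunings of the existing PI controller
tau_ref = 5.0     # time constant of the first-order reference trajectory

E_BINS = np.linspace(-1.0, 1.0, 11)   # discretized control error e
DE_BINS = np.linspace(-0.5, 0.5, 11)  # discretized error derivative de/dt
ACTIONS = np.linspace(-0.2, 0.2, 9)   # candidate control increments du

def discretize(x, bins):
    """Map a continuous value to the index of the nearest bin centre."""
    return int(np.argmin(np.abs(bins - x)))

# Q-matrix over the two-dimensional state (e, de/dt), initialized so the
# greedy action reproduces the incremental (velocity-form) PI law
# du = Kp*de*Ts + Ki*Ts*e, which gives the predefined initial performance.
Q = np.zeros((len(E_BINS), len(DE_BINS), len(ACTIONS)))
for i, e in enumerate(E_BINS):
    for j, de in enumerate(DE_BINS):
        du_pi = Kp * de * Ts + Ki * Ts * e
        Q[i, j] = -np.abs(ACTIONS - du_pi)  # argmax picks du closest to du_pi

alpha, gamma, eps = 0.1, 0.95, 0.05  # learning rate, discount, exploration

def reward(e, de):
    """Penalize deviation from the first-order reference trajectory
    tau_ref * de/dt + e = 0 that the closed loop should follow."""
    return -abs(tau_ref * de + e)

def select_action(e, de, rng=np.random.default_rng()):
    """Epsilon-greedy choice of a control increment for the current state."""
    i, j = discretize(e, E_BINS), discretize(de, DE_BINS)
    if rng.random() < eps:
        return int(rng.integers(len(ACTIONS)))
    return int(np.argmax(Q[i, j]))

def update(e, de, a, e_next, de_next):
    """One on-line Q-learning update after the plant response is measured."""
    i, j = discretize(e, E_BINS), discretize(de, DE_BINS)
    i2, j2 = discretize(e_next, E_BINS), discretize(de_next, DE_BINS)
    td = reward(e_next, de_next) + gamma * np.max(Q[i2, j2]) - Q[i, j, a]
    Q[i, j, a] += alpha * td
```

In a real loop, select_action would run once per sampling period, the chosen increment would be applied to the plant, and update would be called with the newly measured error and its derivative. Because the greedy policy initially reproduces the PI increments, the switchover from the existing PI controller is bumpless under these assumptions, and subsequent updates let the performance improve gradually on-line.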