Abstract

The industrial adoption of Reinforcement Learning (RL) is currently hindered by an RL agent's potentially erratic learning behaviour, black-box decision-making, and the need for problem-specific tuning. Considering these shortcomings, an Actor-Critic RL agent is applied within the context of behavioural cloning, where an initial policy estimate (the warm-starting policy) is provided before RL adaptation. A PI controller is equivalent to a deterministic policy acting on the state space defined by the error and integral error; thus, a PI controller may be used as the warm-starting policy. Two policy types are proposed: the additive and the emulating policy types. The additive policy is generated by adding the output of a neural network to the plane defined by a PI controller in the state-action space. The emulating policy is constructed by fitting a neural network to emulate the same PI controller before policy adaptation through RL. The results illustrate that inverting gradients can serve as a safeguard against policy divergence caused by valve saturation combined with inaccurate state-value estimates. It is further shown that the Actor-Critic algorithm can deliver performance improvements while maintaining desired properties of the original policy (i.e., the PI controller). Finally, RL is contextualised as an inherent connection between classical and optimal control. By constructing a PI control policy that describes the selection of the control input as a function of the feedback error and its integral, a graphical representation of the subsequent RL adaptations is obtained, enabling interpretability of the resulting RL controller.
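
To make the constructions described in the abstract concrete, the sketch below illustrates, in Python, the PI controller viewed as a deterministic policy on (error, integral error), the two warm-starting policy types, and the inverting-gradients rule. This is a minimal illustration only: the gains KP and KI, the sampling range, and the use of a linear least-squares fit in place of neural-network training are assumptions for demonstration, not details taken from the paper.

    import numpy as np

    # Hypothetical PI gains; in practice these come from a tuned PI controller.
    KP, KI = 1.2, 0.4

    def pi_policy(state):
        # PI controller as a deterministic policy on s = (error, integral error);
        # its graph is a plane in the (e, integral-e, u) state-action space.
        error, integral_error = state
        return KP * error + KI * integral_error

    # Additive policy type: the network output is added to the PI plane, so a
    # near-zero-initialised network leaves the initial policy equal to the PI law.
    def additive_policy(state, correction_net):
        return pi_policy(state) + correction_net(state)

    # Emulating policy type: fit a function approximator to reproduce the PI
    # plane before RL adaptation (a least-squares fit stands in here for
    # supervised training of the actor network).
    rng = np.random.default_rng(0)
    states = rng.uniform(-1.0, 1.0, size=(1000, 2))             # sampled (e, integral-e)
    targets = states @ np.array([KP, KI])                       # PI actions at those states
    weights, *_ = np.linalg.lstsq(states, targets, rcond=None)  # warm-started actor

    def emulating_policy(state):
        return float(np.dot(weights, state))  # ~= pi_policy before RL adaptation

    # Inverting gradients (Hausknecht & Stone, 2016): damp policy-gradient
    # components that push an action towards or through an actuator bound,
    # e.g. a saturating valve, guarding against policy divergence.
    def invert_gradient(grad, action, low, high):
        if grad > 0:  # update would increase the action
            return grad * (high - action) / (high - low)
        return grad * (action - low) / (high - low)

As a usage check, additive_policy((0.3, 0.1), lambda s: 0.0) returns the pure PI action, and emulating_policy((0.3, 0.1)) recovers it to numerical precision; RL adaptation then deforms the policy away from the PI plane, and plotting the learned surface over (e, integral-e) gives the graphical, interpretable view of the adapted controller referred to in the abstract.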
