The industrial adoption of Reinforcement Learning (RL) is currently hindered by an RL agent's potentially erratic learning behaviour, black-box decision-making, and the need for problem-specific tuning. To address these shortcomings, an Actor-Critic RL agent is applied in a behavioural-cloning setting where an initial policy estimate (the warm-starting policy) is provided before RL adaptation. A PI controller is equivalent to a deterministic policy acting on the state space defined by the error and its integral; a PI controller may therefore be used as the warm-starting policy. Two policy types are proposed: the additive and the emulating policy type. The additive policy is generated by adding the output of a neural network to the plane defined by the PI controller in the state-action space. The emulating policy is constructed by fitting a neural network to emulate the same PI controller before the policy is adapted through RL. The results show that inverting gradients can serve as a safeguard against policy divergence arising from valve saturation combined with inaccurate state-value estimates. It is further shown that the Actor-Critic algorithm can deliver performance improvements while preserving desired properties of the original policy (i.e., the PI controller). Finally, RL is contextualised as an inherent connection between classical and optimal control: by constructing a PI control policy that describes the selection of the control input as a function of the feedback error and its integral, a graphical representation of the subsequent RL adaptations is obtained, enabling interpretability of the resulting RL controller.
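To make the policy constructions concrete, the sketch below illustrates the ideas under stated assumptions: `pi_policy` plays the role of the warm-starting PI controller (gains `kp`, `ki` chosen arbitrarily for illustration), `TinyNet` is a hypothetical correction network with a zero-initialised output layer so the additive policy starts exactly on the PI plane, a linear least-squares fit stands in for the neural network of the emulating policy, and `invert_gradient` applies the inverting-gradients rule of scaling the actor's gradient by the remaining headroom to the actuator (e.g., valve) bound.

```python
import numpy as np

def pi_policy(e, ie, kp=1.0, ki=0.1):
    # A PI controller viewed as a deterministic policy on the
    # (error, integral-error) state space: u = kp*e + ki*ie.
    return kp * e + ki * ie

class TinyNet:
    # Hypothetical one-hidden-layer network used as the learnable correction.
    # The output layer is zero-initialised so the additive policy coincides
    # with the PI plane before any RL adaptation.
    def __init__(self, hidden=8, seed=0):
        rng = np.random.default_rng(seed)
        self.W1 = rng.normal(scale=0.1, size=(hidden, 2))
        self.b1 = np.zeros(hidden)
        self.W2 = np.zeros(hidden)
        self.b2 = 0.0

    def __call__(self, s):
        h = np.tanh(self.W1 @ s + self.b1)
        return float(self.W2 @ h + self.b2)

def additive_policy(e, ie, net):
    # Additive policy type: PI output plus a learnable network offset.
    return pi_policy(e, ie) + net(np.array([e, ie]))

def fit_emulating_policy(n=100, seed=0):
    # Emulating policy type: fit a function approximator to reproduce the
    # PI controller from sampled states (a linear model here for brevity;
    # the paper's setting uses a neural network).
    rng = np.random.default_rng(seed)
    S = rng.uniform(-1.0, 1.0, size=(n, 2))        # sampled (e, ie) states
    u = np.array([pi_policy(e, ie) for e, ie in S])  # PI targets
    w, *_ = np.linalg.lstsq(S, u, rcond=None)
    return w  # recovers approximately [kp, ki]

def invert_gradient(g, a, a_min, a_max):
    # Inverting-gradients safeguard: a gradient pushing the action towards
    # a bound is scaled by the remaining headroom, vanishing at saturation.
    if g > 0:
        return g * (a_max - a) / (a_max - a_min)
    return g * (a - a_min) / (a_max - a_min)
```

With the zero-initialised output layer, the additive policy reproduces the PI controller exactly at the start of training, which is the warm-starting property the abstract relies on; the inverting-gradients rule then prevents the actor update from pushing a saturated valve further out of range.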