Reinforcement learning (RL) for systems with continuous state/action spaces remains challenging for nonlinear multivariate dynamical systems, even in simulation. Implementing such schemes for real-time control is harder still and remains largely unaddressed. In this study, several critical strategies for the practical implementation of RL are developed, and a multivariable, multi-modal, hybrid three-tank (HTT) physical process is used to illustrate them. A successful real-time implementation of RL is reported. The first step is a meta-heuristic optimization of the first-principles model parameters, for which a custom pseudo-random binary signal (PRBS) is used to obtain open-loop experimental data. This is followed by in silico policy learning based on asynchronous advantage actor–critic (A3C/A-A2C). In the second step, three different approaches (namely proximal learning, single trajectory learning, and multiple trajectory learning) are used to explore the state/action space. In the final step, online learning (A2C) is established on the real process over a socket connection, starting from the best in silico policy. The extent of exploration (EoE) is proposed as a measure for quantifying exploration of the state/action space. The online sample efficiency of the RL application is enhanced, and a soft-constraint-based constrained learning approach is proposed and validated. With the proposed strategies, this work demonstrates the feasibility of applying RL to solve practical control problems.
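To make the open-loop excitation step more concrete, the following is a minimal sketch of a generic two-level random binary signal with randomized hold times, a common PRBS-like input for identification experiments. It is not the custom PRBS used in this study: the function name `prbs`, the hold-time range, and the two input levels are hypothetical choices made only for illustration.

```python
import numpy as np


def prbs(n_steps, min_hold, max_hold, low, high, seed=0):
    """Two-level signal that switches between `low` and `high`.

    Each level is held for a random number of samples drawn uniformly
    from [min_hold, max_hold]. All parameters are illustrative
    assumptions, not values taken from the study.
    """
    rng = np.random.default_rng(seed)
    signal = np.empty(n_steps)
    level = low
    t = 0
    while t < n_steps:
        # Upper bound of rng.integers is exclusive, hence the +1.
        hold = int(rng.integers(min_hold, max_hold + 1))
        # Slicing past the end of the array is truncated automatically.
        signal[t:t + hold] = level
        # Toggle between the two levels for the next segment.
        level = high if level == low else low
        t += hold
    return signal


# Hypothetical usage: 200 samples of a normalized actuator command.
u = prbs(n_steps=200, min_hold=5, max_hold=20, low=0.2, high=0.8, seed=1)
```

In such a setup, the hold-time range would typically be chosen relative to the dominant time constants of the process, so that the recorded data contain both fast and slow responses for the parameter optimization.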