_This article, written by JPT Technology Editor Chris Carpenter, contains highlights of paper SPE 200271, “Dual Heuristic Dynamic Programming in the Oil and Gas Industry for Trajectory Tracking Control,” by Seaar Al-Dabooni, Basra Oil Company; Alaa Azeez Tawiq, Technical Institute of Basra; and Hussen Alshehab, Basra Oil Company. The paper has not been peer reviewed._

The complete paper presents an artificial-intelligence (AI) algorithm, dual heuristic dynamic programming (DHDP), that is used to solve optimization-control problems. Fast, self-learning control based on DHDP is illustrated for trajectory tracking of liquid levels on a quadruple-tank system (QTS) consisting of four tanks and two electrical pumps with two pressure-control valves. Two artificial neural networks are constructed for the DHDP approach: the critic network (which provides the evaluation, or critique, signal) and the actor network, or controller (which provides the control signal). The DHDP controller learns without human intervention.

Approximate Dynamic Programming (ADP)

Recently, many different types of AI algorithms have been applied in petroleum fields to solve optimization problems. The complete paper introduces ADP, a field of AI newly applied to oil and gas. ADP is a useful tool for handling the behavior of nonlinear systems and is a special class of reinforcement-learning (RL) algorithms. The authors write that ADP can be viewed as consisting of three categories: heuristic dynamic programming (HDP), DHDP, and globalized HDP. ADP features two neural networks, an actor and a critic, that provide the optimal control signal and the long-term cost value, respectively. ADP has numerous applications. The complete paper references work that discusses control of turbo-generator and swarm-robot problems by use of DHDP and that illustrates that action-dependent HDP can obtain an optimal path in multirobot navigation.

The QTS is used frequently in the oil and gas industry. DHDP is used to control the voltages of the two pumps so that the tank levels follow the desired set-point values, an approach that can learn by itself (a self-learning controller). The complete paper devotes several pages to equations and parameters that describe HDP.

In ADP, optimal control problems are solved, allowing agents to select an optimal action that minimizes a long-term cost value through solution of Bellman’s equation. RL and ADP are used to train the actor neural network to provide optimal actions based on minimizing the cost-to-go value produced by the critic network; a simplified sketch of this actor/critic update appears after the System Functionality section below. The actor neural network serves as a function approximator for the control policy. After full training of these networks, the optimal action values are obtained from the actor network.

System Functionality

The equipment receives the system states of the process through sensors, and the algorithm maximizes the reward by selecting the correct optimal action (control signal) to feed back to the equipment. The simulation results for applying DHDP with the QTS as a benchmark test problem were obtained using MATLAB. The QTS is used as an example in the paper because it is widely used, in whole or in part, in most petroleum exploration and production fields. Another reason for the authors’ choice of the QTS as a test problem is that it is difficult to control, with only a limited zone of operating parameters in which it remains stable. The multi-input/multioutput (MIMO) model of the QTS is similar to those of most MIMO devices in the oil and gas field; a generic form of this model is sketched below.
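The complete paper gives the QTS equations and parameter values; they are not reproduced in this synopsis. As a point of reference only, the sketch below shows a commonly used quadruple-tank formulation with illustrative numbers (not the authors’ values): two pump voltages drive the four tank levels, and the valve splits determine how much of each pump’s flow bypasses to the upper tanks.

```python
import numpy as np

# Illustrative parameters for a generic quadruple-tank model (not the paper's values):
# A - tank cross-sections (cm^2), a - outlet areas (cm^2),
# k - pump gains (cm^3/s per V), gamma - valve flow splits, g - gravity (cm/s^2)
A = np.array([28.0, 32.0, 28.0, 32.0])
a = np.array([0.071, 0.057, 0.071, 0.057])
k = np.array([3.33, 3.35])
gamma = np.array([0.70, 0.60])
g = 981.0

def qts_dynamics(h, v):
    """Rate of change of the four tank levels h (cm) given the two pump voltages v (V)."""
    h = np.maximum(h, 0.0)                      # levels cannot go negative
    q = a * np.sqrt(2.0 * g * h)                # gravity-driven outflow from each tank
    dh = np.empty(4)
    dh[0] = (-q[0] + q[2] + gamma[0] * k[0] * v[0]) / A[0]   # tank 1: fed by tank 3 and pump 1
    dh[1] = (-q[1] + q[3] + gamma[1] * k[1] * v[1]) / A[1]   # tank 2: fed by tank 4 and pump 2
    dh[2] = (-q[2] + (1.0 - gamma[1]) * k[1] * v[1]) / A[2]  # tank 3: fed by pump 2 bypass
    dh[3] = (-q[3] + (1.0 - gamma[0]) * k[0] * v[0]) / A[3]  # tank 4: fed by pump 1 bypass
    return dh

def step(h, v, dt=0.1):
    """One forward-Euler integration step of the tank levels."""
    return np.maximum(h + dt * qts_dynamics(h, v), 0.0)

h = np.array([12.0, 13.0, 5.0, 8.0])        # illustrative initial levels (cm)
h = step(h, v=np.array([3.0, 3.0]))          # one step with both pumps at 3 V
```

Everything here, including the parameter values and the simple integration step, is an assumption for illustration; the paper’s actual model and constants should be taken from the complete paper.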
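To make the actor/critic interaction described in the ADP section concrete, the following minimal sketch applies a DHDP-style update to a toy single-state linear plant rather than the four-state QTS, with linear gains standing in for the paper’s neural networks. In DHDP, the critic approximates the derivative of the cost-to-go with respect to the state, and the actor is adjusted to reduce the one-step utility plus the discounted cost-to-go. The plant model, utility weights, and learning rates below are assumptions for illustration only.

```python
import numpy as np

# Toy single-state plant x[k+1] = A*x[k] + B*u[k]; the paper uses the four-state QTS instead.
A_sys, B_sys = 0.95, 0.10
gamma, r = 0.95, 0.10          # discount factor and control-effort weight in the utility
lr_c, lr_a = 0.01, 0.01        # critic and actor learning rates

w_c = 0.0                      # critic weight: lambda(x) ~= w_c * x approximates dJ/dx
w_a = -0.1                     # actor weight:  u(x)      =  w_a * x

rng = np.random.default_rng(0)
for episode in range(200):
    x = rng.uniform(-1.0, 1.0)              # start each episode from a random state
    for k in range(50):
        u = w_a * x                         # actor picks the control signal
        x_next = A_sys * x + B_sys * u      # plant response
        lam_next = w_c * x_next             # critic's estimate of dJ/dx at the next state

        # DHDP critic target: derivative of [U(x,u) + gamma*J(x_next)] with respect to x,
        # propagated through both the plant and the actor (here U = 0.5*(x^2 + r*u^2)).
        lam_target = x + w_a * (r * u) + gamma * (A_sys + B_sys * w_a) * lam_next
        w_c -= lr_c * (w_c * x - lam_target) * x      # gradient step on the critic error

        # Actor update: descend dQ/du = dU/du + gamma * (dx_next/du) * lambda(x_next).
        dq_du = r * u + gamma * B_sys * lam_next
        w_a -= lr_a * dq_du * x                       # chain rule through u = w_a * x

        x = x_next

print(f"learned actor gain w_a = {w_a:.3f}, critic gain w_c = {w_c:.3f}")
```

In the paper’s setting, the same structure would be carried by two neural networks, with the QTS tracking errors forming the state and the two pump voltages forming the action.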
The overall learning-control-system performance was tested and compared with HDP and a well-known industrial controller, a proportional-integral-derivative (PID) controller, using MATLAB programming. The simulation results show that DHDP provides enhanced performance compared with the PID approach, with a 98.9002% improvement.
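The synopsis does not state how the 98.9002% figure is computed or which PID gains were used. As a hedged illustration only, the sketch below shows a textbook discrete PID control law and one plausible way a percentage improvement could be expressed, as a relative reduction in accumulated squared tracking error; the function names, gains, and metric are assumptions, not the paper’s definitions.

```python
import numpy as np

def pid_controller(kp, ki, kd, dt):
    """Return a stateful discrete PID control law u = f(error); gains are illustrative."""
    state = {"integral": 0.0, "prev_err": 0.0}
    def control(err):
        state["integral"] += err * dt
        deriv = (err - state["prev_err"]) / dt
        state["prev_err"] = err
        return kp * err + ki * state["integral"] + kd * deriv
    return control

def percent_improvement(err_dhdp, err_pid):
    """Relative reduction in accumulated squared tracking error (one plausible metric)."""
    ise_dhdp = np.sum(np.square(err_dhdp))
    ise_pid = np.sum(np.square(err_pid))
    return 100.0 * (1.0 - ise_dhdp / ise_pid)

pid = pid_controller(kp=3.0, ki=0.1, kd=0.5, dt=0.1)   # illustrative gains only
u = pid(1.2)                                            # control move for a 1.2-cm level error

# Example with synthetic tracking-error records (stand-ins for simulation output):
err_pid = np.array([2.0, 1.5, 1.0, 0.6, 0.3])
err_dhdp = np.array([2.0, 0.8, 0.2, 0.05, 0.01])
print(f"improvement: {percent_improvement(err_dhdp, err_pid):.1f}%")
```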