Recently, artificial intelligence has been widely applied to optimization and control problems in petroleum exploration and production. This paper presents a state-of-the-art reinforcement-learning algorithm for petroleum optimization-control problems, called direct heuristic dynamic programming (DHDP). DHDP uses two interacting artificial neural networks: the critic network, which provides a critique (evaluation) signal, and the actor network, which provides a control signal. The paper focuses on a generic on-line learning control system based on Markov decision process principles. Furthermore, DHDP is a model-free learning design that does not require prior knowledge of a dynamic model; therefore, DHDP can be applied directly to any petroleum equipment or device without the need to derive a mathematical model. Moreover, DHDP learns by itself (self-learning), without human intervention, through repeated interaction between the equipment and the environment/process. The equipment receives the states of the environment/process via sensors, and the algorithm maximizes the reward by selecting the optimal action (control signal). A quadruple tank system (QTS), whose nonlinear model responds closely to the real system, is taken as a benchmark test problem for three reasons. First, the QTS, which consists of four tanks and two electrical pumps with two pressure control valves, is widely used (as an entire system or in parts) in most petroleum exploration/production fields. Second, the QTS is difficult to control because it is stable only within a limited zone of operating parameters; therefore, if DHDP can control the QTS by itself, it can control other equipment in a fast and optimal manner. Third, the QTS is a multi-input multi-output (MIMO) model for analyzing real-time nonlinear dynamic systems; therefore, the QTS model is similar to most MIMO devices in oil and gas fields.
The overall performance of the learning control system is tested and compared with that of a proportional-integral-derivative (PID) controller using MATLAB. DHDP provides enhanced performance compared with the PID approach, with a 99.2466% improvement.
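
To make the benchmark concrete, the following is a minimal sketch of a four-tank, two-pump nonlinear model of the kind commonly used for quadruple-tank benchmarks. It assumes Torricelli outflow through each tank outlet and a fixed split ratio at each valve. The function names, the parameter values, and the use of Python rather than MATLAB are illustrative assumptions and are not taken from this paper, whose QTS model may differ. The tank levels play the role of the sensed states, and the two pump voltages play the role of the control signal.

```python
import math

# Illustrative parameters (assumed, not taken from the paper); units: cm, s, V.
G = 981.0                               # gravitational acceleration, cm/s^2
AREA = (28.0, 32.0, 28.0, 32.0)         # tank cross-sections, cm^2
OUTLET = (0.071, 0.057, 0.071, 0.057)   # outlet-hole cross-sections, cm^2
PUMP_GAIN = (3.33, 3.35)                # pump gains, cm^3/(V s)
SPLIT = (0.70, 0.60)                    # valve split ratios (fraction sent to lower tanks)


def qts_derivatives(h, v):
    """Return the level derivatives [dh1, dh2, dh3, dh4] in cm/s.

    h: four tank levels (tanks 1-2 are lower, tanks 3-4 are upper), in cm.
    v: two pump voltages (the MIMO control input), in V.
    """
    # Torricelli outflow of each tank; negative levels are clipped to zero.
    out = [a * math.sqrt(2.0 * G * max(level, 0.0)) for a, level in zip(OUTLET, h)]
    g1, g2 = SPLIT
    k1, k2 = PUMP_GAIN
    v1, v2 = v
    dh1 = (-out[0] + out[2] + g1 * k1 * v1) / AREA[0]   # fed by tank 3 and pump 1
    dh2 = (-out[1] + out[3] + g2 * k2 * v2) / AREA[1]   # fed by tank 4 and pump 2
    dh3 = (-out[2] + (1.0 - g2) * k2 * v2) / AREA[2]    # fed by pump 2 only
    dh4 = (-out[3] + (1.0 - g1) * k1 * v1) / AREA[3]    # fed by pump 1 only
    return [dh1, dh2, dh3, dh4]


def euler_step(h, v, dt=0.1):
    """Advance the tank levels by one explicit-Euler step of length dt seconds."""
    dh = qts_derivatives(h, v)
    return [max(level + dt * rate, 0.0) for level, rate in zip(h, dh)]
```

In a learning loop of the kind described above, a controller would read the levels `h`, choose the pump voltages `v`, call `euler_step`, and compute a reward from the resulting level error. The coupling between the pumps and all four tanks is what makes the problem multi-input multi-output.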