Learn Control Policies Research Articles

This paper investigates the choice of function approximator for an approximate dynamic programming (ADP) based control strategy. The ADP strategy allows the user to derive an improved control policy given a simulation model and some starting control policy (or alternatively, closed-loop identification data), while circumventing the ‘curse-of-dimensionality’ of the traditional dynamic programming approach. In ADP, one fits a function approximator to state vs. ‘cost-to-go’ data and solves the Bellman equation with the approximator in an iterative manner. A proper choice and design of function approximator is critical for convergence of the iteration and the quality of final learned control policy, because an approximation error can grow quickly in the loop of optimization and function approximation. Typical classes of approximators used in related approaches are parameterized global approximators (e.g. artificial neural networks) and nonparametric local averagers (e.g. k-nearest neighbor). In this paper, we assert on the basis of some case studies and a theoretical result that a certain type of local averagers should be preferred over global approximators as the former ensures monotonic convergence of the iteration. However, a converged cost-to-go function does not necessarily lead to a stable control policy on-line due to the problem of over-extrapolation. To cope with this difficulty, we propose that a penalty term be included in the objective function in each minimization to discourage the optimizer from finding a solution in the regions of state space where the local data density is inadequately low. A nonparametric density estimator, which can be naturally combined with a local averager, is employed for this purpose.

Read full abstract

The current shift in chemical industry interest away from high throughput production towards small amounts of high value added products has increased the industry's awareness of the issues associated with product and process performance. This fact poses a unique set of challenging control problems characterized by their ‘ill-definedness’, high process nonlinearities and imperfect modelling. In this work, a novel methodology is proposed for incremental learning of a control policy that can continuously improve product quality and operating performance.The new concept introduced here is the notion of a performance function that implicitly includes end-product quality constraints as a process goal and operational preferences which describe different modes of operation. Since plant information is often scarce and expensive to obtain, it is proposed that the performance function be learned by integrating together batch-to-batch data, intra-run measurements and a predictive process model. A new framework for this integration, called coarse code generalization, is proposed which revolves around the generation of an artificial set of batch runs using a hybrid process model. In this model, learning biases are incorporated through background knowledge that expresses run outcome sensitivities with respect to states and actions. Artificial batch runs provide an augmented data set which is used for inductive learning of the performance function. With a minimum amount of information on process performance available, a first approximation to the performance function is constructed and an optimization program is used to define a near-optimal control policy. As more plant data become available, the performance function refinement procedure permits also an increasing refinement of the learned control policy. Recipe changes that increase process performance can then be implemented on-line. A semi-batch reactor where an autocatalytic reaction takes place is used to present and test the methodology. Results obtained demonstrate that the methodology can cope successfully with the problems of both imperfect modelling and scarce information which are typical of the industrial environment. Also, coarse code generalization for performance control proves to be an ideal means of integrating inductive learning with first-principles models and other types of domain knowledge.

Read full abstract

Learn Control Policies Research Articles

Related Topics

Articles published on Learn Control Policies

Predictive feature selection for genetic policy search

Local Models for data-driven learning of control policies for complex systems

Learning parametric dynamic movement primitives from multiple demonstrations

Convergence Analysis of Ant Colony Learning

A novel method for learning policies from variable constraint data

Closed-loop learning of visual control policies

Choice of approximator and design of penalty function for an approximate dynamic programming based control approach

Learning to Control the Performance of Batch Processes

Lead the way for us

Editage

Paperpal

R Discovery

Mind the Graph

Learn Control Policies Research Articles

Related Topics

Articles published on Learn Control Policies

Predictive feature selection for genetic policy search

Local Models for data-driven learning of control policies for complex systems

Learning parametric dynamic movement primitives from multiple demonstrations

Convergence Analysis of Ant Colony Learning

A novel method for learning policies from variable constraint data

Closed-loop learning of visual control policies

Choice of approximator and design of penalty function for an approximate dynamic programming based control approach

Learning to Control the Performance of Batch Processes