Introduction

Numerous fields of science and engineering present the problem of uncertainty propagation through nonlinear dynamic systems [1]. One may be interested in the determination of the response of engineering structures (beams, plates, or entire buildings) under random excitation (in structural mechanics [2]), the motion of particles under the influence of stochastic force fields (in particle physics [3]), or the computation of the prediction step in the design of a Bayesian filter (in filtering theory [4,5]), among others. All these applications require the study of the time evolution of the probability density function (PDF) p(t, x) corresponding to the state x of the relevant dynamic system. The PDF is given by the solution to the Fokker–Planck–Kolmogorov equation (FPE), which is a partial differential equation (PDE) in the PDF of the system, defined by the underlying dynamic system's parameters. In this paper, approximate solutions of the FPE are considered and subsequently leveraged for the design of controllers for nonlinear stochastic dynamic systems.

In the past few years, the authors have developed a generalized multiresolution meshless finite element method (FEM) methodology, the partition of unity FEM (PUFEM), which uses the recently developed global–local orthogonal mappings methodology to provide the partition of unity functions [6] and the orthogonal local basis functions for the solution of the FPE. The PUFEM is a Galerkin projection method, and the solution is characterized in terms of a finite dimensional representation of the Fokker–Planck operator underlying the problem. The methodology is also highly amenable to parallelization [7–10].

Though the FPE is invaluable in quantifying the uncertainty evolution through nonlinear systems, perhaps its greatest benefit lies in the stochastic analysis, design, and control of nonlinear systems. In the context of nonlinear stochastic control, Markov decision processes (MDPs) have long been one of the most widely used frameworks for discrete time stochastic control. However, the dynamic programming (DP) equations underlying MDPs suffer from the curse of dimensionality [11–13]. Various approximate dynamic programming methods have been proposed in the past several years for overcoming the curse of dimensionality [13–17]; these can be broadly classified as functional reinforcement learning and are essentially model-free methods of approximating the optimal control policy in stochastic optimal control problems. They generally fall into the categories of value function approximation methods [13], policy gradient/approximation methods [15,16], and actor–critic methods [14,17]. These methods attempt to reduce the dimensionality of the DP problem through a compact parametrization of the value functions (with respect to a policy or the optimal value function) and the policy function. They differ mainly in the parametrization employed to achieve this goal, ranging from nonlinear function approximators such as neural networks [14] to linear approximation architectures [13,18]. These methods learn the optimal policies through repeated simulations of the dynamic system and thus can take a long time to converge to a good policy, especially when the problem has continuous state and control spaces.
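For concreteness, a standard form of the FPE for an Itô diffusion is sketched below; the drift f, diffusion g, and noise intensity Q are generic symbols used here for illustration and are not necessarily the notation adopted later in the paper. For the stochastic differential equation

\[
d\mathbf{x} = \mathbf{f}(t,\mathbf{x})\,dt + g(t,\mathbf{x})\,d\mathbf{W}(t)
\]

driven by a Wiener process W(t) with intensity Q, the state PDF evolves according to

\[
\frac{\partial p(t,\mathbf{x})}{\partial t}
  = -\sum_{i=1}^{n} \frac{\partial}{\partial x_i}\!\left[f_i(t,\mathbf{x})\,p(t,\mathbf{x})\right]
  + \frac{1}{2}\sum_{i=1}^{n}\sum_{j=1}^{n} \frac{\partial^2}{\partial x_i\,\partial x_j}\!\left[\left(g\,Q\,g^{\mathsf T}\right)_{ij}\,p(t,\mathbf{x})\right]
  \equiv \mathcal{L}_{\mathrm{FP}}\,p(t,\mathbf{x}).
\]

In a Galerkin method such as the PUFEM, approximating p(t, x) by a finite expansion \(p(t,\mathbf{x}) \approx \sum_k c_k(t)\,\psi_k(\mathbf{x})\) and projecting the residual of the FPE onto the basis functions yields a finite set of linear ordinary differential equations in the coefficients c_k(t); this is the sense in which a finite dimensional representation of the Fokker–Planck operator is obtained.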
In contrast, the methodology proposed here is model-based and uses the finite dimensional representation of the underlying parametrized diffusion operator to parametrize both the value function and the control policy in the stochastic optimal control problem. Considering the low-order finite dimensional controlled diffusion operator allows us to significantly reduce the dimensionality of the planning problem, while providing a computationally efficient recursive method for obtaining progressively better control policies.

The literature on computational methods for solving continuous time stochastic control problems is relatively sparse when compared to that for the discrete time problem. One approach is the use of locally consistent Markov decision processes [19], in which the continuous controlled diffusion operator is approximated by a finite dimensional Markov chain that satisfies certain local consistency conditions, namely, that its drift and diffusion coefficients locally match those of the original process. The resulting finite state MDP is solved by standard DP techniques, such as value iteration and policy iteration. The method relies on a finite difference discretization and thus can be computationally very intensive in higher-dimensional spaces. In another approach [20,21], the diffusion process is approximated by a finite dimensional Markov chain through the application of generalized cell-to-cell mapping [22]. However, this method also suffers from the curse of dimensionality because it involves discretizing the state space into a grid, which becomes increasingly infeasible as the dimension of the system grows. Finite difference and finite element methods have also been applied directly to the nonlinear Hamilton–Jacobi–Bellman (HJB) partial differential equation [23,24]. The method proposed here differs in that it uses policy iteration in the original infinite dimensional function space, along with a finite dimensional representation of the controlled diffusion operator, to solve the problem. Considering a lower-order approximation of the underlying operator results in a significant reduction in the dimensionality of the computational problem. Using the policy iteration algorithm typically requires solving a sequence of a few linear equations (typically fewer than five) before practical convergence is obtained, as opposed to solving a single high-dimensional nonlinear equation if the original nonlinear HJB equation is attacked directly.

The literature on solving deterministic optimal control problems in continuous time is relatively mature when compared to its stochastic counterpart. The method of successive approximations/
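To make the policy iteration scheme discussed above concrete, a generic sketch is given below; the discounted cost structure, running cost \(\ell\), discount rate \(\alpha\), and controlled generator \(\mathcal{L}^{u}\) are illustrative choices and not necessarily the formulation adopted in the paper. Starting from an admissible policy \(u_k\), each iteration alternates a policy evaluation step, which is linear in the unknown value function, and a policy improvement step:

\[
\mathcal{L}^{u_k} V_k(\mathbf{x}) - \alpha\,V_k(\mathbf{x}) + \ell\bigl(\mathbf{x}, u_k(\mathbf{x})\bigr) = 0
\qquad \text{(policy evaluation, linear in } V_k\text{)}
\]

\[
u_{k+1}(\mathbf{x}) = \arg\min_{u}\,\bigl[\,\ell(\mathbf{x},u) + \mathcal{L}^{u} V_k(\mathbf{x})\,\bigr]
\qquad \text{(policy improvement)}
\]

With a Galerkin approximation \(V_k(\mathbf{x}) \approx \sum_i v_{k,i}\,\psi_i(\mathbf{x})\) built on the finite dimensional representation of the controlled diffusion operator, each evaluation step reduces to a linear algebraic system in the coefficient vector \(\mathbf{v}_k\). This is why practical convergence is typically reached after only a few linear solves, in contrast to solving the full nonlinear HJB equation directly.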
