In reinforcement learning, off-policy temporal difference (TD) learning methods have attracted significant attention because of their flexibility in reusing existing data. However, traditional off-policy TD methods often suffer from poor convergence and instability on complex problems. To address these issues, this paper proposes an off-policy temporal difference algorithm with Bellman residuals (TDBR). By incorporating Bellman residuals, the proposed algorithm effectively improves the convergence and stability of off-policy learning. The paper first introduces the basic concepts of reinforcement learning and value function approximation, highlighting the importance of Bellman residuals in off-policy learning. It then describes the theoretical foundation and implementation of the TDBR algorithm in detail. Experimental results in multiple benchmark environments show that TDBR significantly outperforms traditional methods in both convergence speed and solution quality. Overall, the TDBR algorithm provides an effective and stable solution for off-policy reinforcement learning with broad application prospects. Future work can further optimize the algorithm's parameters and extend it to continuous state and action spaces to improve its applicability and performance on real-world problems.
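To make the core idea concrete, the sketch below illustrates the general notion of minimizing a squared Bellman residual in an off-policy setting with linear value-function approximation; it is not the paper's TDBR algorithm, and the function name, the importance-sampling ratio rho, and the step sizes are hypothetical assumptions introduced only for illustration.

```python
import numpy as np

# Illustrative sketch only: a linear value function V(s) = theta @ phi(s) is
# updated by descending the squared Bellman residual, with an importance-
# sampling ratio to correct for the off-policy behavior distribution.
# This is an assumption-laden example, not the TDBR algorithm from the paper.

def bellman_residual_update(theta, phi_s, phi_s_next, reward, rho,
                            gamma=0.99, alpha=0.01):
    """One off-policy update step that descends the squared Bellman residual.

    theta      : weight vector of the linear value function
    phi_s      : feature vector of the current state s
    phi_s_next : feature vector of the successor state s'
    reward     : observed reward r
    rho        : importance-sampling ratio pi(a|s) / b(a|s) (assumed given)
    """
    # Bellman residual (TD error): delta = r + gamma * V(s') - V(s)
    delta = reward + gamma * theta @ phi_s_next - theta @ phi_s
    # Gradient of 0.5 * delta**2 with respect to theta (residual-gradient style)
    grad = delta * (gamma * phi_s_next - phi_s)
    # Gradient-descent step with off-policy correction
    return theta - alpha * rho * grad
```

Minimizing the squared Bellman residual, rather than following the plain semi-gradient TD update, is one standard way to obtain a well-defined objective whose descent directions remain stable under off-policy sampling, which is the general property the abstract attributes to incorporating Bellman residuals.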