Summary

In the field of optimal control for continuous nonlinear systems, function approximation methods are often employed to overcome the curse of dimensionality. Compared with global function approximators such as neural networks, multivariate splines are linear in their parameters and can be evaluated and adapted efficiently on a local basis. In this work, a multivariate-spline-based reinforcement learning (RL) strategy is proposed for the continuous-time nonlinear control problem. Building on classic value iteration, multivariate splines are integrated into the RL algorithm to approximate continuous value functions and policy functions from discrete action and value samples, so the fitted splines with updated coefficients can be used directly for continuous control of nonlinear systems. In a simulation experiment, the spline-based RL controller is evaluated on an under-actuated inverted pendulum and compared with a value-iteration-based discrete control strategy and a neural-network-based continuous control strategy. The simulation results indicate that the proposed method achieves better control performance, with fewer state oscillations, lower energy consumption, and shorter convergence time than discrete value iteration and neural-network-based RL, and that the adoption of simplex splines improves function-approximation efficiency, requiring less computation time than neural-network optimization.
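To make the idea concrete, the following is a minimal sketch of fitted value iteration with a spline value-function approximator, in the spirit of the approach summarized above. It is not the paper's algorithm: the pendulum dynamics, cost weights, grid sizes, and the use of SciPy's tensor-product `RectBivariateSpline` (rather than the paper's multivariate simplex splines) are all illustrative assumptions. The spline lets the Bellman backup evaluate the value function at off-grid successor states, and the resulting greedy policy is defined on the continuous state space.

```python
# Sketch: value iteration for an under-actuated inverted pendulum with a
# bivariate-spline value-function approximation. All constants are assumed.
import numpy as np
from scipy.interpolate import RectBivariateSpline

g, m, l, dt = 9.81, 1.0, 1.0, 0.05   # gravity, mass, length, time step
u_max = 2.0                           # torque limit (under-actuation)
gamma = 0.98                          # discount factor

thetas = np.linspace(-np.pi, np.pi, 21)   # angle grid (0 = upright)
omegas = np.linspace(-8.0, 8.0, 21)       # angular-velocity grid
actions = np.linspace(-u_max, u_max, 7)   # sampled torques

def step(th, om, u):
    """One Euler step of the pendulum dynamics (theta measured from upright)."""
    om_new = om + dt * (g / l * np.sin(th) + u / (m * l * l))
    th_new = ((th + dt * om_new + np.pi) % (2 * np.pi)) - np.pi
    return th_new, np.clip(om_new, omegas[0], omegas[-1])

def cost(th, om, u):
    """Quadratic stage cost penalizing deviation from the upright rest state."""
    return th ** 2 + 0.1 * om ** 2 + 0.01 * u ** 2

# Value iteration: refit the spline each sweep, back up over sampled actions.
TH, OM = np.meshgrid(thetas, omegas, indexing="ij")
V = np.zeros_like(TH)
for _ in range(80):
    spline = RectBivariateSpline(thetas, omegas, V)  # continuous V-hat
    Q = np.empty((len(actions),) + TH.shape)
    for k, u in enumerate(actions):
        th_n, om_n = step(TH, OM, u)                 # off-grid successors
        Q[k] = cost(TH, OM, u) + gamma * spline.ev(th_n, om_n)
    V = Q.min(axis=0)                                # Bellman (min-cost) backup

def greedy_torque(th, om):
    """Continuous-state greedy policy induced by the fitted spline."""
    spline = RectBivariateSpline(thetas, omegas, V)
    q = [cost(th, om, u) + gamma * float(spline.ev(*step(th, om, u)))
         for u in actions]
    return float(actions[int(np.argmin(q))])
```

With this formulation the learned value should be lowest near the upright rest state `(0, 0)` and highest near the hanging position, and `greedy_torque` can be queried at any continuous state, which is the practical advantage the abstract attributes to spline approximation over a purely discrete value-iteration controller.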