Abstract
Existing approaches to constrained-input optimal control problems mainly focus on systems with input saturation, whereas other constraints, such as combined inequality constraints and state-dependent constraints, are seldom discussed. In this article, a reinforcement learning (RL)-based algorithm is developed for constrained-input optimal control of discrete-time (DT) systems. The deterministic policy gradient (DPG) is introduced to iteratively search for the optimal solution to the Hamilton-Jacobi-Bellman (HJB) equation. To deal with input constraints, an action mapping (AM) mechanism is proposed. This mechanism transforms the exploration space from the subspace generated by the given inequality constraints to the standard Cartesian product space, which can be searched effectively by existing algorithms. With the proposed architecture, the learned policy outputs control signals that satisfy the given constraints, and the original reward function is kept unchanged. A convergence analysis shows that the iterative algorithm converges to the optimal solution of the HJB equation. In addition, the continuity of the iteratively estimated Q-function is investigated. Two numerical examples are provided to demonstrate the effectiveness of our approach.
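To make the action mapping idea concrete, the following is a minimal sketch and not the authors' actual mapping, which the abstract does not specify. It assumes a two-dimensional input, a hypothetical state-dependent bound |u1| <= 1/(1 + ||x||^2), and a hypothetical combined constraint u1 + u2 <= 1 with -1 <= u2 <= 1. The function name `action_map` and all constants are illustrative assumptions. The raw action `a` lives in the Cartesian product space [-1, 1]^2, and each coordinate is rescaled into the interval that remains feasible given the state and the earlier coordinates.

```python
import numpy as np

def action_map(a, x):
    """Map a raw action a in [-1, 1]^2 to a control u satisfying the
    (hypothetical) constraints described above. Illustrative sketch only."""
    a = np.clip(np.asarray(a, dtype=float), -1.0, 1.0)
    x = np.asarray(x, dtype=float)

    # Assumed state-dependent bound: |u1| <= 1 / (1 + ||x||^2).
    b = 1.0 / (1.0 + float(np.dot(x, x)))
    u1 = b * a[0]

    # Assumed combined constraint: u1 + u2 <= 1 with -1 <= u2 <= 1.
    # The feasible interval for u2 depends on the u1 already chosen.
    lo2 = -1.0
    hi2 = min(1.0, 1.0 - u1)

    # Affine rescaling of a[1] from [-1, 1] onto [lo2, hi2].
    u2 = lo2 + 0.5 * (a[1] + 1.0) * (hi2 - lo2)
    return np.array([u1, u2])
```

Under these assumptions, a policy network with a tanh output layer can explore the whole box [-1, 1]^2 freely, since every raw action is mapped to a feasible control and the reward function needs no penalty term for constraint violation.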
More From: IEEE Transactions on Neural Networks and Learning Systems