Abstract
Model-based policy gradient (MBPG) has been employed to seek an approximate solution to the optimal control problem. However, there is coupling between adjacent states due to temporal dependencies, making the training time grow linearly with the time horizon. This paper reshapes the training process of MBPG with the time-splitting technique to establish a time-independent algorithm called Training Through Time-Splitting (T3S). First, copy the coupled variables to obtain two independent variables. Meanwhile, an extra variable together with an equivalence constraint is introduced for problem consistency. Then, the transformed problem divides into subproblems with carefully derived loss functions. Subproblems own decoupled variables and shared policy networks, which means they can be optimized concurrently. Guided by the algorithm design, this paper further proposes an asynchronous parallel training scheme to accelerate training efficiency. Numerical simulation shows that the T3S algorithm outperforms the MBPG algorithm by 83.6% in wall-clock time with a trajectory tracking task.
Talk to us
Join us for a 30 min session where you can share your feedback and ask us any queries you have
Disclaimer: All third-party content on this website/platform is and will remain the property of their respective owners and is provided on "as is" basis without any warranties, express or implied. Use of third-party content does not indicate any affiliation, sponsorship with or endorsement by them. Any references to third-party content is to identify the corresponding services and shall be considered fair use under The CopyrightLaw.