Beyond backpropagate through time: Efficient model‐based training through time‐splitting

Jiaxin Gao, Yang Guan, Fei Ma, Jianfeng Zheng, Wenyu Li, Shengbo Eben Li, Bo Zhang, Keqiang Li, Junqing Wei

doi:10.1002/int.22928

Abstract

Model-based policy gradient (MBPG) has been employed to seek approximate solutions to optimal control problems. However, temporal dependencies couple adjacent states, so training time grows linearly with the time horizon. This paper reshapes the MBPG training process with a time-splitting technique to establish a time-independent algorithm called Training Through Time-Splitting (T3S). First, the coupled variables are copied to obtain two independent variables, and an extra variable with an equivalence constraint is introduced to keep the problem consistent. The transformed problem is then divided into subproblems with carefully derived loss functions. Because the subproblems have decoupled variables and shared policy networks, they can be optimized concurrently. Guided by this algorithm design, the paper further proposes an asynchronous parallel training scheme to improve training efficiency. Numerical simulation shows that T3S outperforms MBPG by 83.6% in wall-clock time on a trajectory tracking task.
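To make the decoupling idea concrete, the following minimal sketch contrasts a sequential rollout loss with a per-step loss over decoupled state variables. It is not the paper's derivation: the scalar dynamics, the linear policy, the quadratic penalty on the consistency constraint, and every name (`A`, `B`, `RHO`, `policy`, `step_cost`, `rollout_loss`, `split_loss`) are assumptions chosen for illustration, and the carefully derived T3S loss functions are not given in the abstract.

```python
import numpy as np

# Hypothetical scalar system and penalty weight, chosen only for illustration.
A, B = 0.9, 0.5   # assumed dynamics: x_next = A * x + B * u
RHO = 10.0        # assumed weight of the quadratic consistency penalty


def policy(k, x):
    """Assumed linear state-feedback policy with a single gain k."""
    return -k * x


def step_cost(x, u):
    """Assumed quadratic stage cost."""
    return x**2 + 0.1 * u**2


def rollout_loss(k, x0, horizon):
    """Sequential rollout: each state is computed from the previous one."""
    x, total = x0, 0.0
    states = [x0]
    for _ in range(horizon):
        u = policy(k, x)
        total += step_cost(x, u)
        x = A * x + B * u
        states.append(x)
    return total, np.array(states)


def split_loss(k, states):
    """Per-step losses when the states are treated as independent variables.

    Each term touches only (x_t, x_{t+1}) and the shared policy gain, so
    all terms can be evaluated at once instead of step by step.
    """
    x, x_next = states[:-1], states[1:]
    u = policy(k, x)
    residual = x_next - (A * x + B * u)  # violation of the dynamics constraint
    return step_cost(x, u) + 0.5 * RHO * residual**2


total, states = rollout_loss(k=0.8, x0=1.0, horizon=20)
per_step = split_loss(0.8, states)
print(np.isclose(per_step.sum(), total))
```

When the decoupled states satisfy the dynamics, the penalty vanishes and the summed per-step losses match the rollout loss. In a training loop the states would be optimized alongside the policy gain, which is where concurrent updates become possible.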

Similar Papers

The control parameterization enhancing transform for constrained optimal control problems
K L Teo ... H W J Lee
The Journal of the Australian Mathematical Society. Series B. Applied Mathematics | VOL. 40
01 Jan 1998

Cooperative Formation Controller Design for Time-Delay and Optimality Problems
01 Jan 2014

Human Wealth Evolution is An Accelerating Expansion Underpinned by a Decelerating Optimization Process
Paolo Sibani ... Per Lyngs Hansen
SSRN Electronic Journal
01 Jan 2021

On the optimal control problems with characteristic time control constraints
Changjun Yu ... Shuxuan Su
Journal of Industrial & Management Optimization | VOL. 18
27 Jan 2021
