The roots of neural-network backpropagation (BP) can be traced back to the classical optimal-control gradient procedures developed in the early 1960s. Hence, BP applies directly to a general discrete N-stage optimal control problem whose total cost consists of N stage costs plus a terminal state cost. In this journal, Liao (Neural Process Lett 10:195–200, 1999) transformed such a multi-stage optimal control problem into one involving a terminal state cost only (via a classical transformation) and then claimed that applying BP to the transformed problem leads to a new recurrent neural-network learning method. The purpose of this paper is three-fold. First, we show that the classical terminal-cost transformation yields no particular benefit for BP. Second, we show that the two simulation examples (with and without time lag) demonstrated by Liao are naturally regarded, from the perspective of classical optimal-control gradient methods, as deep feed-forward rather than recurrent neural-network learning. Third, we show that BP readily handles a general history-dependent optimal control problem (e.g., one involving time-lagged state and control variables), owing to Dreyfus's 1973 extension of BP. Throughout the paper, we highlight systematic BP derivations that employ the recurrence relation of nominal cost-to-go (action-value) functions, based on the stage-wise concept of dynamic programming.
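For concreteness, the following is a minimal sketch of the classical terminal-cost transformation and the cost-to-go recurrence referred to above, written in generic discrete-time notation (the symbols $x_k$, $u_k$, $f_k$, $L_k$, $\phi$ below are illustrative and not necessarily the paper's or Liao's notation). The N-stage problem is to minimize
\[
J \;=\; \phi(x_N) \;+\; \sum_{k=0}^{N-1} L_k(x_k, u_k),
\qquad
x_{k+1} = f_k(x_k, u_k), \quad k = 0, \dots, N-1 .
\]
Augmenting the state with an accumulated-cost coordinate $z_k$, where $z_0 = 0$ and $z_{k+1} = z_k + L_k(x_k, u_k)$, recasts the total cost as the purely terminal cost $J = \phi(x_N) + z_N$ on the augmented state $(x_N, z_N)$. Along a fixed nominal control sequence $\{u_k\}$, the nominal cost-to-go functions obey the stage-wise recurrence
\[
\hat J_k(x_k) \;=\; L_k(x_k, u_k) \;+\; \hat J_{k+1}\!\bigl(f_k(x_k, u_k)\bigr),
\qquad
\hat J_N(x_N) = \phi(x_N),
\]
and BP obtains the gradients $\partial \hat J_0 / \partial u_k$ by applying the chain rule backward through this recurrence.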