Abstract

In this paper we present in a unified framework the gradient algorithms employed in the adaptation of linear time filters (TF) and in the supervised training of (non-linear) neural networks (NN). The optimality criteria used to optimize the parameters H of the filter or network are least squares (LS) and least mean squares (LMS) in both contexts. They respectively minimize the total or the mean squared error e(k) between an (output) reference sequence d(k) and the actual system output y(k) corresponding to the input X(k). Minimization is performed iteratively by a gradient algorithm. The index k in (TF) is time and it runs indefinitely; iterations therefore start as soon as reception of X(k) begins. The recursive algorithm for the adaptation H(k - 1) → H(k) of the parameters is applied each time a new input X(k) is observed. When training a (NN) with a finite number of examples, the index k denotes the example and it is upper-bounded. Iterative (block) algorithms wait until all K examples have been received before updating the network. However, since K is frequently very large, recursive algorithms are also often preferred in (NN) training, but they raise the question of how to order the examples X(k).

Except in the specific case of a transversal filter, there is no general recursive technique for optimizing the LS criterion. However, X(k) is normally a stationary random sequence, so LS and LMS become equivalent when k is large. Moreover, the LMS criterion can always be minimized recursively with the help of the stochastic LMS gradient algorithm, which has low computational complexity.

In (TF), X(k) is a sliding window of (time) samples, whereas in the supervised training of (NN) with arbitrarily ordered examples, X(k - 1) and X(k) have nothing to do with each other. When this (major) difference is removed by feeding a time signal to the network input, the recursive algorithms recently developed for (NN) training become similar to those of adaptive filtering. In this context the present paper displays the similarities between adaptive cascaded linear filters and trained multilayer networks. It is also shown that there is a close similarity between adaptive recursive filters and neural networks that include feedback loops.

The classical filtering approach is to evaluate the gradient by ‘forward propagation’, whereas the most popular (NN) training method uses gradient backward propagation. We show that when a linear (TF) problem is implemented by an (NN), the two approaches are equivalent. However, the backward method can be used for more general (non-linear) filtering problems. Conversely, new insights can be drawn in the (NN) context by the use of a forward gradient computation.

The advantage of the (NN) framework, and in particular of the gradient backward propagation approach, is evidently its much larger spectrum of applications than (TF), since (i) the inputs are arbitrary and (ii) the (NN) can perform non-linear (TF).
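As an illustration of the recursive adaptation H(k - 1) → H(k) and of the stochastic LMS gradient algorithm mentioned above, the sketch below implements the standard LMS update for an adaptive transversal (FIR) filter. It is a minimal example, not code from the paper; the step size mu, the filter length n_taps, and the signal names are assumptions chosen for illustration.

```python
import numpy as np

def lms_adapt(x, d, n_taps=8, mu=0.01):
    """Stochastic LMS gradient adaptation of a transversal (FIR) filter.

    x : input signal (1-D array), d : reference (desired) signal.
    At each time k the parameter vector H is updated from the
    instantaneous error e(k) = d(k) - y(k).
    """
    H = np.zeros(n_taps)               # filter parameters H(k)
    e = np.zeros(len(x))
    for k in range(n_taps, len(x)):
        X_k = x[k - n_taps:k][::-1]    # sliding window of the last n_taps samples
        y_k = H @ X_k                  # filter output y(k)
        e[k] = d[k] - y_k              # error e(k)
        H = H + mu * e[k] * X_k        # H(k) = H(k-1) + mu * e(k) * X(k)
    return H, e
```

Each update costs a number of operations proportional to the filter length, which is the low computational complexity referred to above.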

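The equivalence between the forward (filtering) and backward (backpropagation) gradient computations stated above can be checked directly in the simplest case, a single linear unit trained on a squared-error criterion; the paper treats the general multilayer and recursive cases, which this sketch does not attempt. All variable names are illustrative.

```python
import numpy as np

# For a single linear unit y = H . X with cost J = 0.5 * (d - y)**2,
# the gradient obtained by backward propagation coincides with the
# "forward" gradient -e(k) X(k) used in adaptive filtering.
rng = np.random.default_rng(0)
X = rng.normal(size=4)        # input vector X(k)
H = rng.normal(size=4)        # parameters H
d = 1.0                       # reference output d(k)

y = H @ X
e = d - y                     # error e(k)

grad_forward = -e * X         # filtering view: dJ/dH = -e(k) X(k)

delta = (y - d) * 1.0         # backprop view: dJ/dy times the (unit) slope of a linear activation
grad_backward = delta * X     # dJ/dH = delta * X

assert np.allclose(grad_forward, grad_backward)
```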