Abstract

The behavior of a linear computing unit is analyzed during learning by gradient descent on a cost function equal to the sum of a variance-maximization term and a weight-normalization term. The landscape of this cost function is shown to consist of one local maximum, a set of saddle points, and one global minimum aligned with the principal components of the input patterns. The landscape can be described in terms of the hyperspheres, hypercrests, and hypervalleys associated with each of these principal components. Using this description, the learning trajectory is shown to converge to the global minimum under certain conditions on the starting weights and on the learning rate of the descent procedure, and a precise description of the trajectory in the cost landscape is given. Extensions and implications of the algorithm for networks of such cells are discussed.
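For intuition, the following is a minimal numerical sketch of the kind of setup the abstract describes, not the paper's actual formulation. It assumes a cost of the form J(w) = -(1/2) wᵀCw + (μ/4)(wᵀw - 1)², where C is the input covariance: the first term rewards output variance and the second penalizes deviation of ‖w‖ from unit length. All constants, data, and variable names here are illustrative.

```python
import numpy as np

# Hypothetical single-unit cost (a sketch, not the paper's exact form):
#   J(w) = -0.5 * w^T C w + (mu / 4) * (w^T w - 1)^2
# Its critical points are w = 0 (a local maximum) and scaled eigenvectors
# of C (saddles for minor components, global minimum for the principal one),
# which mirrors the landscape structure the abstract describes.

rng = np.random.default_rng(0)
X = rng.standard_normal((500, 5)) @ np.diag([3.0, 1.5, 1.0, 0.5, 0.2])
C = X.T @ X / len(X)                 # input covariance matrix

mu, lr = 5.0, 0.01                   # normalization strength, learning rate (assumed)
w = 0.1 * rng.standard_normal(5)     # small random start, away from the w = 0 maximum

for _ in range(5000):
    grad = -C @ w + mu * (w @ w - 1.0) * w   # gradient dJ/dw
    w -= lr * grad                           # plain gradient descent step

# Compare the learned direction with the leading eigenvector of C.
eigvals, eigvecs = np.linalg.eigh(C)
pc1 = eigvecs[:, -1]
print("alignment |cos angle|:", abs(w @ pc1) / np.linalg.norm(w))
```

With a starting weight vector away from the maximum at w = 0 and a sufficiently small learning rate, the iterate aligns with the leading eigenvector of C, consistent with the convergence conditions the abstract mentions.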
