A theory of local learning, the learning channel, and the optimality of backpropagation

Pierre Baldi, Peter Sadowski

doi:10.1016/j.neunet.2016.07.006

Open Access

Journal: Neural Networks
Publication Date: Aug 5, 2016
Citations: 74
License type: publisher-specific-oa

Affiliation: University of California, Irvine

Abstract

In a physical neural system, where storage and processing are intimately intertwined, the rules for adjusting the synaptic weights can only depend on variables that are available locally, such as the activity of the pre- and post-synaptic neurons, resulting in local learning rules. A systematic framework for studying the space of local learning rules is obtained by first specifying the nature of the local variables, and then the functional form that ties them together into each learning rule. Such a framework also enables the systematic discovery of new learning rules and exploration of relationships between learning rules and group symmetries. We study polynomial local learning rules stratified by their degree and analyze their behavior and capabilities in both linear and non-linear units and networks. Stacking local learning rules in deep feedforward networks leads to deep local learning. While deep local learning can learn interesting representations, it cannot learn complex input–output functions, even when targets are available for the top layer. Learning complex input–output functions requires local deep learning where target information is communicated to the deep layers through a backward learning channel. The nature of the communicated information about the targets and the structure of the learning channel partition the space of learning algorithms. For any learning algorithm, the capacity of the learning channel can be defined as the number of bits provided about the error gradient per weight, divided by the number of required operations per weight. We estimate the capacity associated with several learning algorithms and show that backpropagation outperforms them by simultaneously maximizing the information rate and minimizing the computational cost. This result is also shown to be true for recurrent networks, by unfolding them in time. The theory clarifies the concept of Hebbian learning, establishes the power and limitations of local learning rules, introduces the learning channel which enables a formal analysis of the optimality of backpropagation, and explains the sparsity of the space of learning rules discovered so far.
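
To make the notion of a polynomial local learning rule concrete, the following is a minimal sketch for a single linear unit, not the authors' code. It assumes the local variables are the pre-synaptic activity, the post-synaptic activity, and the weight itself. The simple Hebbian rule (a degree-2 polynomial in those variables) and Oja's rule (degree 3) are chosen here as familiar examples; the abstract does not name specific rules. The function names, learning rate, and toy data are all hypothetical.

```python
import random

def simple_hebb_update(w, pre, post, lr):
    # Degree-2 local rule: dw_j = lr * post * pre_j.
    # Uses only the pre- and post-synaptic activities.
    return [wj + lr * post * xj for wj, xj in zip(w, pre)]

def oja_update(w, pre, post, lr):
    # Degree-3 local rule (Oja): dw_j = lr * post * (pre_j - post * w_j).
    # The extra decay term also depends on the weight itself.
    return [wj + lr * post * (xj - post * wj) for wj, xj in zip(w, pre)]

def train_oja(n_steps=5000, lr=0.001, seed=0):
    # Toy setup (assumption): 2-D zero-mean Gaussian inputs with
    # standard deviation 3 on the first axis and 1 on the second.
    rng = random.Random(seed)
    w = [0.1, 0.1]
    for _ in range(n_steps):
        pre = [rng.gauss(0.0, 3.0), rng.gauss(0.0, 1.0)]
        post = sum(wj * xj for wj, xj in zip(w, pre))  # linear unit
        w = oja_update(w, pre, post, lr)
    return w

w = train_oja()
norm = (w[0] ** 2 + w[1] ** 2) ** 0.5
print("weights:", w, "norm:", norm)
```

In this toy setting the weight vector trained with Oja's rule should settle near unit length and align with the high-variance input axis, whereas repeated application of the simple Hebbian rule on the same data would grow without bound. Both updates use only quantities available at the synapse, which is the sense of "local" in the abstract.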

Full Text