Abstract

Contrastive Divergence Learning May Diverge When Training Restricted Boltzmann Machines

Asja Fischer 1,2 and Christian Igel 1,2*

1 Ruhr-Universität Bochum, Bernstein Center for Computational Neuroscience, Germany
2 Ruhr-Universität Bochum, Institut für Neuroinformatik, Germany

Understanding and modeling how brains learn higher-level representations from sensory input is one of the key challenges in computational neuroscience and machine learning. Layered generative models such as deep belief networks (DBNs) are promising for unsupervised learning of such representations, and new algorithms that operate in a layer-wise fashion make learning these models computationally tractable [1-5]. Restricted Boltzmann Machines (RBMs) are the typical building blocks for DBN layers. They are undirected graphical models whose structure is a bipartite graph connecting input (visible) and hidden neurons.

Training large undirected graphical models by likelihood maximization in general involves averages over an exponential number of terms, and obtaining unbiased estimates of these averages by Markov chain Monte Carlo methods typically requires many sampling steps. However, it was recently shown that estimates obtained after running the chain for just a few steps can be sufficient for model training [3]. In particular, gradient ascent on the k-step Contrastive Divergence (CD-k), a biased estimator of the log-likelihood gradient based on k steps of Gibbs sampling, has become the most common way to train RBMs [1-5].

Contrastive Divergence learning does not necessarily reach the maximum likelihood estimate of the parameters (e.g., because of the bias). However, we show that the situation is much worse. We demonstrate empirically that for some benchmark problems taken from the literature [6], CD learning systematically leads to a steady decrease of the log-likelihood after an initial increase (see supplementary Figure 1). This seems to happen especially when trying to learn more complex distributions, which are the targets if RBMs are used within DBNs.

The reason for the decreasing log-likelihood is an increase in the magnitude of the model parameters. The estimation bias depends on the mixing rate of the Markov chain, and it is well known that mixing slows down as the parameter magnitudes grow [1,3]. Weight decay can therefore solve the problem if the strength of the regularization term is adjusted correctly: if chosen too large, learning is not accurate enough; if chosen too small, learning still diverges. For large k, the effect is less pronounced. Increasing k, as suggested in [1] for finding parameters with higher likelihood, may therefore prevent divergence. However, divergence occurs even for values of k too large to be computationally tractable for large models. Thus, a dynamic schedule to control k is needed.
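Since the abstract describes the CD-k procedure and the weight-decay remedy only in words, the following minimal sketch illustrates the idea for a binary RBM in NumPy. It is not the authors' implementation or the experimental setup from [6]; the function names, hyperparameter defaults (learning rate, weight decay, k), and the brute-force likelihood routine are assumptions chosen for illustration only.

```python
# Illustrative sketch only: one CD-k update for a binary RBM, plus exact
# log-likelihood evaluation for small models. All names and default values
# are assumptions for illustration, not taken from the abstract or its references.
import numpy as np

rng = np.random.default_rng(0)


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def sample_bernoulli(p):
    # Draw binary samples with success probabilities p.
    return (rng.random(p.shape) < p).astype(float)


def cd_k_update(W, b, c, v0, k=1, lr=0.05, weight_decay=0.0):
    """Apply one CD-k gradient step in place.

    W: (n_visible, n_hidden) weights, b: visible biases, c: hidden biases,
    v0: mini-batch of binary visible vectors, shape (batch, n_visible).
    """
    # Positive phase: hidden probabilities driven by the data.
    ph0 = sigmoid(v0 @ W + c)
    vk = v0
    # Negative phase: k steps of block Gibbs sampling starting from the data.
    for _ in range(k):
        hk = sample_bernoulli(sigmoid(vk @ W + c))
        vk = sample_bernoulli(sigmoid(hk @ W.T + b))
    phk = sigmoid(vk @ W + c)
    batch = v0.shape[0]
    # CD-k estimate: data statistics minus k-step sample statistics,
    # with an optional weight-decay term as discussed above.
    W += lr * ((v0.T @ ph0 - vk.T @ phk) / batch - weight_decay * W)
    b += lr * (v0 - vk).mean(axis=0)
    c += lr * (ph0 - phk).mean(axis=0)


def exact_log_likelihood(W, b, c, data):
    """Average log-likelihood of the data; tractable only for small n_visible,
    since the partition function is computed by enumerating all visible states."""
    n_visible = W.shape[0]
    all_v = np.array([[(i >> j) & 1 for j in range(n_visible)]
                      for i in range(2 ** n_visible)], dtype=float)

    def free_energy(v):
        # F(v) = -b.v - sum_j softplus(c_j + v.W_j)
        return -v @ b - np.logaddexp(0.0, v @ W + c).sum(axis=1)

    log_z = np.logaddexp.reduce(-free_energy(all_v))
    return float(-free_energy(data).mean() - log_z)
```

On a small toy data set one could track exact_log_likelihood over repeated cd_k_update calls to check for the qualitative behavior described above: an initial increase of the log-likelihood followed by a steady decrease as the weight magnitudes grow, mitigated or not depending on the chosen weight_decay and k.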
