Abstract
Restricted Boltzmann machines (RBMs) and their extensions, often called “deep-belief networks”, are powerful neural networks that have found applications in machine learning and artificial intelligence. The standard way to train these models relies on an iterative unsupervised procedure based on Gibbs sampling, called “contrastive divergence”, followed by additional supervised tuning via back-propagation. However, this procedure has been shown not to follow any gradient and can lead to suboptimal solutions. In this paper, we show an efficient alternative to contrastive divergence by means of simulations of digital memcomputing machines (DMMs) that compute the gradient of the log-likelihood involved in unsupervised training. We test our approach on pattern recognition using a modified version of the MNIST data set of handwritten digits. DMMs effectively sample the vast phase space defined by the probability distribution of RBMs, and provide a good approximation close to the optimum. This efficient search significantly reduces the number of generative pretraining iterations necessary to achieve a given level of accuracy on the MNIST data set, and also yields a total performance gain over the traditional approaches. In fact, the acceleration of the pretraining achieved by simulating DMMs is comparable, in number of iterations, to the recently reported hardware application of the quantum annealing method on the same network and data set. Notably, however, DMMs perform far better than the reported quantum annealing results in terms of quality of the training. Finally, we also compare our method to recent advances in supervised training, like batch-normalization and rectifiers, that seem to reduce the advantage of pretraining. We find that the memcomputing method still maintains a quality advantage (>1% in accuracy, corresponding to a 20% reduction in error rate) over these approaches, even though the network pretrained with memcomputing defines a more non-convex landscape, using sigmoidal activation functions without batch-normalization. Our approach is agnostic about the connectivity of the network. Therefore, it can be extended to train full Boltzmann machines, and even deep networks at once.
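To make concrete what "the gradient of the log-likelihood involved in unsupervised training" refers to, here is a minimal sketch that assumes a toy binary RBM small enough for every state to be enumerated. It computes the exact weight gradient as a data term minus a model term. This is neither contrastive divergence nor the memcomputing approach; it only shows the quantity that such methods approximate when enumeration is infeasible. All function names and numerical values are illustrative assumptions, not taken from the paper.

```python
import itertools
import math

def states(n):
    """All binary configurations of n units."""
    return list(itertools.product((0, 1), repeat=n))

def energy(v, h, W, a, b):
    """Standard RBM energy: E(v, h) = -a.v - b.h - v^T W h."""
    e = -sum(ai * vi for ai, vi in zip(a, v))
    e -= sum(bj * hj for bj, hj in zip(b, h))
    e -= sum(W[i][j] * v[i] * h[j]
             for i in range(len(v)) for j in range(len(h)))
    return e

def log_likelihood(data, W, a, b):
    """Average log p(v) over the data, with the partition function summed exactly."""
    nv, nh = len(a), len(b)
    log_z = math.log(sum(math.exp(-energy(v, h, W, a, b))
                         for v in states(nv) for h in states(nh)))
    total = 0.0
    for v in data:
        unnorm = sum(math.exp(-energy(v, h, W, a, b)) for h in states(nh))
        total += math.log(unnorm) - log_z
    return total / len(data)

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def exact_weight_gradient(data, W, a, b):
    """d(log-likelihood)/dW[i][j] = <v_i h_j>_data - <v_i h_j>_model."""
    nv, nh = len(a), len(b)
    # Data term: visible units clamped to the data, hidden units averaged analytically.
    pos = [[0.0] * nh for _ in range(nv)]
    for v in data:
        for j in range(nh):
            p_hj = sigmoid(b[j] + sum(W[i][j] * v[i] for i in range(nv)))
            for i in range(nv):
                pos[i][j] += v[i] * p_hj / len(data)
    # Model term: expectation under the full joint distribution (intractable for large RBMs).
    boltz = {(v, h): math.exp(-energy(v, h, W, a, b))
             for v in states(nv) for h in states(nh)}
    z = sum(boltz.values())
    neg = [[0.0] * nh for _ in range(nv)]
    for (v, h), w in boltz.items():
        for i in range(nv):
            for j in range(nh):
                neg[i][j] += v[i] * h[j] * w / z
    return [[pos[i][j] - neg[i][j] for j in range(nh)] for i in range(nv)]

# Hypothetical toy model: 3 visible units, 2 hidden units, 2 training patterns.
W = [[0.1, -0.2], [0.3, 0.05], [-0.4, 0.2]]
a = [0.0, 0.1, -0.1]
b = [0.2, -0.3]
data = [(1, 0, 1), (0, 1, 1)]
grad = exact_weight_gradient(data, W, a, b)
```

The model term requires a sum over all 2^(visible + hidden) joint states, which is why practical training replaces it with a sampled or otherwise approximated estimate.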