This study describes multilayer perceptrons (MLPs), Hopfield's associative memories (HAMs), and restricted Boltzmann machines (RBMs) from a unified point of view. The three models are mutually related; for example, RBMs have been used to construct architectures deeper than shallow MLPs. The energy function in a HAM is analogous to that of the Ising model in statistical mechanics, which connects microscopic physics to thermodynamics. The canonical partition function Z of the Boltzmann distribution is also used in RBMs. Asynchronous updating and contrastive divergence (CD), which is based on Gibbs sampling, are also related. It therefore seems worthwhile to consider the three models within a common framework. Such an attempt might lead to the "one algorithm hypothesis," which holds that our brains might obey a single but universal rule: an algorithm found in one region may be applicable to other regions.

Multilayer perceptrons (MLPs) are feedforward models for pattern recognition and classification. Hopfield proposed another kind of neural network model, for associative memory and optimization (HAM). Hinton adopted restricted Boltzmann machines (RBMs) in "Deep Learning" in order to construct deeper layered neural networks. The energy employed in RBMs is derived from the generalized EM algorithm and is closely related to the energy employed in HAMs. Despite various other differences (see Table 1), the three models are worth comparing, or at least worth explaining in a unified terminology. HAMs and RBMs have symmetrically weighted connections, wij = wji, although generalized Boltzmann machines do not necessarily satisfy this constraint. In an MLP there are, in general, no feedback connections: if wij ∈ R denotes the connection weight from the j-th unit to the i-th unit, then wji = 0. Once a merged weight matrix W is considered, all three models can be regarded as identical.
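The role of the energy function and of asynchronous updating in a HAM can be illustrated with a minimal sketch. It assumes bipolar states in {-1, +1}, a Hebbian outer-product rule for the weights, and no bias terms; none of these choices is specified in the text above, and the function names are hypothetical. Under these assumptions the weights are symmetric (wij = wji) with a zero diagonal, so updating one unit at a time never increases the energy.

```python
import numpy as np

def hopfield_energy(W, s):
    """Energy E(s) = -1/2 * s^T W s for a state s in {-1, +1}^n (no bias, assumed)."""
    return -0.5 * s @ W @ s

def hebbian_weights(patterns):
    """Hebbian outer-product rule (assumed); gives a symmetric W with zero diagonal."""
    n = patterns.shape[1]
    W = patterns.T @ patterns / n
    np.fill_diagonal(W, 0.0)
    return W

def async_recall(W, s, sweeps=5, seed=0):
    """Asynchronous updating: visit one unit at a time, in random order."""
    rng = np.random.default_rng(seed)
    s = s.copy()
    energies = [hopfield_energy(W, s)]
    for _ in range(sweeps):
        for i in rng.permutation(len(s)):
            # Set unit i to the sign of its local field (ties go to +1).
            s[i] = 1 if W[i] @ s >= 0 else -1
            energies.append(hopfield_energy(W, s))
    return s, energies

# Two orthogonal toy patterns (illustrative data, not from the study).
patterns = np.array([[1, -1, 1, -1, 1, -1, 1, -1],
                     [1, 1, 1, 1, -1, -1, -1, -1]])
W = hebbian_weights(patterns)

probe = patterns[0].copy()
probe[0] = -probe[0]  # corrupt one bit of the first pattern

recalled, energies = async_recall(W, probe)
print("recalled first pattern:", np.array_equal(recalled, patterns[0]))
print("energy never increased:",
      all(b <= a + 1e-12 for a, b in zip(energies, energies[1:])))
```

The same symmetric-weight energy, placed inside a Boltzmann distribution and normalized by Z, is what links this deterministic update to Gibbs sampling; the sketch covers only the deterministic case.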
The construction methods adopted in Deep Learning are based on RBMs. One of the key concepts behind the successful construction of deep multilayer architectures is non-linearity, because the hidden-layer units in RBMs are binary. Non-linearity seems to play an important role in constructing deep architectures. If CD and the binary features were abandoned, the multilayer architecture could be replaced by a single weight matrix W = W1W2...Wp. We can also consider a thought experiment with only one hidden unit in an RBM. If h = 0, the unit carries no meaning at all. If h = 1, it must be an identity mapping or, at least, it might extract the eigenvector corresponding to the maximum eigenvalue of the data matrix X. This
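The remark that a stack of layers without non-linearity reduces to one matrix W = W1W2...Wp can be checked numerically. The following is a minimal sketch with hypothetical layer sizes and random weights, using a row-vector convention (inputs multiply the weights from the left) so that the product order matches the formula; the binary threshold stands in for binary hidden units and is an assumption, not the sampling procedure of an RBM.

```python
import numpy as np
from functools import reduce

rng = np.random.default_rng(0)

# Hypothetical layer sizes 6 -> 5 -> 4 -> 3, with random weights.
sizes = [6, 5, 4, 3]
weights = [rng.standard_normal((m, n)) for m, n in zip(sizes[:-1], sizes[1:])]
X = rng.standard_normal((10, sizes[0]))  # 10 illustrative input rows

def forward(X, weights, activation=None):
    """Propagate X through the layers, optionally applying an activation."""
    H = X
    for W_k in weights:
        H = H @ W_k
        if activation is not None:
            H = activation(H)
    return H

# Merged matrix W = W1 W2 ... Wp.
W_merged = reduce(np.matmul, weights)

linear_out = forward(X, weights)   # purely linear stack
collapsed_out = X @ W_merged       # single equivalent layer
binary_out = forward(X, weights, activation=lambda z: (z > 0).astype(float))

print("linear stack equals single matrix:", np.allclose(linear_out, collapsed_out))
print("binary stack equals single matrix:", np.allclose(binary_out, collapsed_out))
```

With the threshold in place the layers no longer compose into a single matrix product, which is one way to read the claim that non-linearity is what makes depth meaningful.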