Biasing restricted Boltzmann machines using Gaussian filters to learn invariant visual features

Arjun Yogeswaran, Pierre Payeur

doi:10.1109/ssci.2016.7850122

Abstract

Advances in unsupervised learning have made it possible to learn feature representations efficiently and directly from large sets of unlabeled data, instead of relying on traditional handcrafted features. However, improving algorithms to increase the quality of these representations in the absence of labeled data remains an area of active research. This paper evaluates visual features learned through unsupervised learning, specifically comparing regularization and preprocessing methods based on Gaussian filters in a single-layer network. Using the restricted Boltzmann machine (RBM) as the unsupervised learning mechanism, features that emerge from training on natural videos with different Gaussian-filter-based biasing and preprocessing are compared using metrics that measure invariance, as well as by classification performance on standard datasets. When Gaussian filters are convolved with the activations of adjacent hidden nodes for a single training example, topographies emerge in which neighboring features become tuned to slightly different stimuli; 1D, 2D, and 3D topographies are compared. When a Gaussian low-pass filter is applied to the activations of a single hidden node across frames drawn from video, the resulting features are more invariant to transformations. Finally, when Gaussian filters are applied to the visible nodes, the images become blurrier, and learning from these blurred images also leads to invariant features. The networks are trained on the Hollywood2 video dataset and tested on image classification using the static CIFAR-10 and STL-10 datasets. To show that the improvements do not depend on the training dataset, the networks are also trained on CIFAR-10 and are shown to produce similar results. Inducing topography, or simply blurring the images, via Gaussian filters during training produces more discriminative features, as evidenced by the consistent and notable increases in classification accuracy that these methods yield.
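The three Gaussian filtering schemes described above (spatial smoothing across neighboring hidden nodes, temporal low-pass filtering of one hidden node across frames, and blurring of the visible input) can be sketched in NumPy as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the helper names (`gaussian_kernel`, `smooth_along_axis`, `topographic_bias`, `temporal_bias`, `blur_visible`, `biased_cd1_step`) are hypothetical, as are the edge-padding mode, the 3σ kernel radius, the use of binary visible units, and above all the blending weight `lam` that mixes the smoothed hidden probabilities into the positive-phase statistics of a CD-1 update. The abstract does not specify how the filtered activations enter the learning rule.

```python
import numpy as np

# Hypothetical helpers illustrating the three Gaussian filtering schemes.
# Edge padding, a 3-sigma kernel radius and binary visible units are assumptions.


def gaussian_kernel(sigma):
    """Normalized 1-D Gaussian kernel with radius round(3 * sigma) (at least 1)."""
    radius = max(1, int(round(3 * sigma)))
    x = np.arange(-radius, radius + 1, dtype=float)
    k = np.exp(-0.5 * (x / sigma) ** 2)
    return k / k.sum()


def smooth_along_axis(a, sigma, axis):
    """Convolve `a` with a Gaussian along one axis, repeating edge values as padding."""
    k = gaussian_kernel(sigma)
    r = len(k) // 2
    a = np.moveaxis(np.asarray(a, dtype=float), axis, -1)
    pad = [(0, 0)] * (a.ndim - 1) + [(r, r)]
    padded = np.pad(a, pad, mode="edge")
    out = np.apply_along_axis(lambda v: np.convolve(v, k, mode="valid"), -1, padded)
    return np.moveaxis(out, -1, axis)


def topographic_bias(h, grid_shape, sigma):
    """Spatial scheme: smooth ONE example's hidden activations over a 1-D/2-D/3-D unit grid."""
    g = np.asarray(h, dtype=float).reshape(grid_shape)
    for ax in range(g.ndim):  # an n-D Gaussian is separable into 1-D passes
        g = smooth_along_axis(g, sigma, ax)
    return g.reshape(-1)


def temporal_bias(H, sigma):
    """Temporal scheme: low-pass each hidden node across frames; H has shape (frames, hidden)."""
    return smooth_along_axis(H, sigma, axis=0)


def blur_visible(img, sigma):
    """Preprocessing scheme: Gaussian-blur a 2-D grayscale patch before the visible layer."""
    return smooth_along_axis(smooth_along_axis(img, sigma, 0), sigma, 1)


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def biased_cd1_step(W, b_vis, b_hid, v0, grid_shape, sigma, lam=0.5, lr=0.01, rng=None):
    """One CD-1 update on a batch v0 (batch, visible) with topographically biased targets.

    ASSUMPTION: smoothed hidden probabilities are blended into the positive-phase
    statistics with weight `lam`; this rule is illustrative, not taken from the paper.
    """
    rng = np.random.default_rng() if rng is None else rng
    p_h0 = sigmoid(v0 @ W + b_hid)                      # positive-phase probabilities
    smoothed = np.stack([topographic_bias(p, grid_shape, sigma) for p in p_h0])
    target = (1 - lam) * p_h0 + lam * smoothed          # biased hidden statistics
    h0 = (rng.random(p_h0.shape) < p_h0).astype(float)  # sample hidden states
    p_v1 = sigmoid(h0 @ W.T + b_vis)                    # reconstruction
    p_h1 = sigmoid(p_v1 @ W + b_hid)                    # negative-phase probabilities
    n = v0.shape[0]
    W = W + lr * (v0.T @ target - p_v1.T @ p_h1) / n
    b_vis = b_vis + lr * (v0 - p_v1).mean(axis=0)
    b_hid = b_hid + lr * (target - p_h1).mean(axis=0)
    return W, b_vis, b_hid
```

Because a multidimensional Gaussian is separable, `topographic_bias` covers 1D, 2D and 3D topographies with the same code; for 256 hidden units, for example, one might pass a `grid_shape` of `(256,)`, `(16, 16)` or `(4, 8, 8)` (illustrative values only).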
In the visual domain, invariant features are also desirable because they allow objects to be classified accurately despite transformations. Most of the compared methods are found to produce more invariant features; however, classification accuracy does not correlate with invariance.
