Abstract

This paper presents the development of several models of a deep convolutional auto-encoder in the Caffe deep learning framework and their experimental evaluation using the MNIST dataset as an example. We have created five models of a convolutional auto-encoder which differ architecturally in the presence or absence of pooling and unpooling layers in the auto-encoder’s encoder and decoder parts. Our results show that the developed models perform well in dimensionality reduction and unsupervised clustering tasks, and yield small classification errors when the learned internal code is used as the input to a supervised linear classifier and a multi-layer perceptron. The best results were provided by a model in which the encoder part contains convolutional and pooling layers, followed by an analogous decoder part with deconvolution and unpooling layers, without the use of switch variables in the decoder part. The paper also discusses practical details of creating a deep convolutional auto-encoder in the popular Caffe deep learning framework. We believe that the approach and results presented in this paper could help other researchers build efficient deep neural network architectures in the future.
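
As an illustration of the evaluation protocol mentioned above (feeding the learned internal code to a supervised linear classifier), the following hypothetical sketch extracts a code blob from a trained CAE with pycaffe and fits a linear classifier on it. The prototxt/caffemodel file names, the blob name 'code', and the use of scikit-learn logistic regression as the linear classifier are assumptions for this sketch, not details taken from the paper.

```python
# Hypothetical sketch: use the learned internal code of a trained CAE as the
# input to a supervised linear classifier. File names, the 'code' blob name,
# and logistic regression as the classifier are illustrative assumptions.
import numpy as np
import caffe
from sklearn.datasets import fetch_openml
from sklearn.linear_model import LogisticRegression

caffe.set_mode_cpu()
net = caffe.Net('cae_deploy.prototxt', 'cae_trained.caffemodel', caffe.TEST)
net.blobs['data'].reshape(1, 1, 28, 28)   # assume a single-image deploy batch
net.reshape()

def extract_codes(images):
    """Forward N x 1 x 28 x 28 images (scaled to [0, 1]) through the encoder
    and collect the low-dimensional code vector for each image."""
    codes = []
    for img in images:
        net.blobs['data'].data[...] = img
        net.forward()
        codes.append(net.blobs['code'].data.copy().ravel())
    return np.vstack(codes)

# Load MNIST and keep the usual 60k/10k train/test split.
X, y = fetch_openml('mnist_784', version=1, return_X_y=True, as_frame=False)
X = X.reshape(-1, 1, 28, 28) / 255.0
X_train, y_train, X_test, y_test = X[:60000], y[:60000], X[60000:], y[60000:]

# Train a supervised linear classifier on the learned codes.
clf = LogisticRegression(max_iter=1000)
clf.fit(extract_codes(X_train), y_train)
print('test accuracy:', clf.score(extract_codes(X_test), y_test))
```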

Highlights

  • An auto-encoder (AE) model is based on an encoder-decoder paradigm, where an encoder first transforms an input into a typically lower-dimensional representation, and a decoder is tuned to reconstruct the initial input from this representation through the minimization of a cost function [1,2,3,4]

  • We have created five deep convolutional auto-encoder (CAE) models in Caffe, as denoted in Table 1. Model 1 (Fig. 2) contains two convolutional layers followed by two fully-connected layers in the encoder part and, inversely, one fully-connected layer followed by two deconvolution layers in the decoder part (a minimal sketch of this architecture follows this list)

  • We have included it because a similar idea was implemented in the Neon deep learning framework [41]; Model 3 (Fig. 3), notation (conv, pool, sw deconv, sw, unpool), contains two pairs of convolutional and pooling layers followed by two fully-connected layers in the encoder part and, inversely, one fully-connected layer followed by two pairs of deconvolution and unpooling layers WITH the use of switch variables in the decoder part
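
To make the Model 1 architecture above concrete, the following is a minimal, hypothetical pycaffe NetSpec sketch of such a network: a two-convolution encoder with two fully-connected layers, mirrored by one fully-connected layer and two deconvolution layers in the decoder. Filter counts, kernel sizes, the code dimensionality, layer names and the Euclidean reconstruction loss are assumptions made for this illustration, not the exact settings reported in the paper; the pooling/unpooling variants (such as Model 3) would additionally require pooling layers and an unpooling layer, which is not part of the standard Caffe layer catalogue and typically requires a custom layer.

```python
# Hypothetical Model-1-style CAE definition using pycaffe's NetSpec.
# All layer names, filter counts, kernel sizes, and the code size are
# illustrative assumptions, not the paper's exact configuration.
import caffe
from caffe import layers as L, params as P

def cae_model1(lmdb_path, batch_size=100):
    n = caffe.NetSpec()
    # MNIST images scaled to [0, 1]; labels are not needed for reconstruction.
    n.data, n.label = L.Data(source=lmdb_path, backend=P.Data.LMDB,
                             batch_size=batch_size, ntop=2,
                             transform_param=dict(scale=1.0 / 255))
    n.silence = L.Silence(n.label, ntop=0)

    # Encoder: two convolutional layers followed by two fully-connected layers.
    n.conv1 = L.Convolution(n.data, num_output=8, kernel_size=9,
                            weight_filler=dict(type='xavier'))
    n.relu1 = L.ReLU(n.conv1, in_place=True)
    n.conv2 = L.Convolution(n.relu1, num_output=4, kernel_size=9,
                            weight_filler=dict(type='xavier'))
    n.relu2 = L.ReLU(n.conv2, in_place=True)
    n.encode1 = L.InnerProduct(n.relu2, num_output=250,
                               weight_filler=dict(type='xavier'))
    n.code = L.InnerProduct(n.encode1, num_output=10,
                            weight_filler=dict(type='xavier'))

    # Decoder: one fully-connected layer followed by two deconvolution layers.
    n.decode1 = L.InnerProduct(n.code, num_output=4 * 12 * 12,
                               weight_filler=dict(type='xavier'))
    # Restore the 4D blob shape expected by the deconvolution layers
    # (28 - 9 + 1 = 20 and 20 - 9 + 1 = 12 for the assumed 9x9 kernels).
    n.reshape = L.Reshape(n.decode1,
                          reshape_param=dict(shape=dict(dim=[0, 4, 12, 12])))
    n.deconv1 = L.Deconvolution(n.reshape,
                                convolution_param=dict(num_output=8, kernel_size=9,
                                                       weight_filler=dict(type='xavier')))
    n.derelu1 = L.ReLU(n.deconv1, in_place=True)
    n.deconv2 = L.Deconvolution(n.derelu1,
                                convolution_param=dict(num_output=1, kernel_size=9,
                                                       weight_filler=dict(type='xavier')))
    n.out = L.Sigmoid(n.deconv2)
    # A plain Euclidean reconstruction loss is assumed here for brevity.
    n.loss = L.EuclideanLoss(n.out, n.data)
    return n.to_proto()

with open('cae_model1_train.prototxt', 'w') as f:
    f.write(str(cae_model1('mnist_train_lmdb')))
```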

Introduction

An auto-encoder (AE) model is based on an encoder-decoder paradigm, where an encoder first transforms an input into a typically lower-dimensional representation, and a decoder is tuned to reconstruct the initial input from this representation through the minimization of a cost function [1,2,3,4]. AEs and unsupervised learning methods have been widely used in many scientific and industrial applications, mainly for tasks such as network pre-training, feature extraction, dimensionality reduction, and clustering. A classic or shallow AE has only one hidden layer, which holds a lower-dimensional representation of the input. The revolutionary success of deep neural network (NN) architectures has shown that deep AEs, with many hidden layers in the encoder and decoder parts, are the state-of-the-art models in unsupervised learning. A deep AE can extract hierarchical features through its hidden layers and substantially improve the quality of solving a specific task. Deep CAEs may be better suited to image processing tasks because they fully utilize the properties of convolutional neural networks (CNNs), which have been proven to provide better results on noisy, shifted (translated) and corrupted image data [6].
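
As a minimal formalization of the encoder-decoder objective described above (the notation below is ours, not the paper's, and a squared-error reconstruction cost is assumed purely for illustration):

$$\min_{\theta,\varphi}\;\frac{1}{N}\sum_{i=1}^{N}\bigl\|x_i - g_{\varphi}\bigl(f_{\theta}(x_i)\bigr)\bigr\|_2^{2},$$

where $f_{\theta}$ is the encoder mapping an input image $x_i$ to its lower-dimensional code, $g_{\varphi}$ is the decoder reconstructing the input from that code, and $N$ is the number of training samples.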
