Abstract

Objectives: To analyze the layers of a Convolutional Neural Network in the context of text recognition, looking for interpretations. Methods/Analysis: A deep Convolutional Neural Network is trained and applied to the recognition of numerical characters from the MNIST dataset, and the characteristics of deep architectures are studied and analyzed. The behavior of the different weights and their significance are examined in detail throughout training, using the images, error values, and gradient values that characterize each layer. Findings: After training, it is observed that the convolution layers admit a possible interpretation. Results were obtained from the images of the MNIST dataset after passing through the convolution layers with random filters. However, the most representative results are achieved by viewing a single image under random filters. Improvement: Recommendations for design and implementation based on this example and other references are presented. Keywords: Artificial Neural Network, Convolutional Neural Network, Text recognition

Highlights

  • Models based on Convolutional Neural Networks have applied different types of deep network arrangements to perform different types of classification using different techniques: designing and training a complete deep network on a set of inputs specific to the classification task, or taking an architecture already trained on standard datasets available to deep-learning research groups and retraining it on the new set of images, under the assumption that the pre-trained weights approximate those that would be obtained by the retraining or final tuning

  • Related applications of deep convolutional neural network architectures are described: CifarNet, AlexNet, and GoogLeNet [1]. AlexNet [2] takes the ImageNet image set as input and uses five convolution layers with Rectified Linear Unit (ReLU) activation layers and pooling, followed by three fully connected layers, the last of which has 1000 units matching the number of classes; this arrangement totals 60 million parameters and 650,000 neurons, and test errors on the order of 15% are obtained

  • Results were obtained from the images of the MNIST dataset after passing through the convolution layers with random filters


Introduction

Models based on Convolutional Neural Networks have applied different types of deep network arrangements to perform different types of classification using different techniques: designing and training a complete deep network on a set of inputs specific to the classification task, or taking an architecture already trained on standard datasets available to deep-learning research groups and retraining it on the new set of images, under the assumption that the pre-trained weights approximate those that would be obtained by the retraining or final tuning. Related applications of deep convolutional neural network architectures are described: CifarNet, AlexNet, and GoogLeNet [1]. AlexNet [2] takes the ImageNet image set as input and uses five convolution layers with Rectified Linear Unit (ReLU) activation layers and pooling, followed by three fully connected layers, the last of which has 1000 units matching the number of classes; this arrangement totals 60 million parameters and 650,000 neurons, and test errors on the order of 15% are obtained.
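The basic building block described above, a convolution layer followed by a ReLU activation and pooling, and the study's method of passing an image through such a layer with random (untrained) filters, can be sketched as follows. This is a minimal NumPy sketch, not the paper's implementation: the 5x5 filter size, the count of 8 filters, and the random 28x28 input standing in for one MNIST digit are illustrative assumptions.

```python
import numpy as np

def conv2d(image, kernel):
    """Valid 2-D convolution of a single-channel image with one filter."""
    kh, kw = kernel.shape
    h, w = image.shape
    out = np.zeros((h - kh + 1, w - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

def relu(x):
    """Rectified Linear Unit activation."""
    return np.maximum(x, 0)

def max_pool(x, size=2):
    """Non-overlapping max-pooling over size x size windows."""
    h, w = (x.shape[0] // size) * size, (x.shape[1] // size) * size
    x = x[:h, :w].reshape(h // size, size, w // size, size)
    return x.max(axis=(1, 3))

rng = np.random.default_rng(0)
image = rng.random((28, 28))              # stands in for one 28x28 MNIST digit
filters = rng.standard_normal((8, 5, 5))  # 8 random, untrained 5x5 filters

# One conv + ReLU + pool stage per random filter, as used for visualization
feature_maps = [max_pool(relu(conv2d(image, f))) for f in filters]
print(feature_maps[0].shape)  # (12, 12): 24x24 valid convolution, pooled by 2
```

Each element of `feature_maps` is one feature map the paper-style visualization would render as a grayscale image; even with random weights, the maps reveal which spatial structures each filter responds to.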
