Abstract

Deep learning has proven to be an important element of modern data processing technology, finding applications in many areas such as multimodal sensor data processing and understanding, data generation and anomaly detection. While the use of deep learning is booming in many real-world tasks, the internal processes by which it arrives at its results remain poorly understood. Understanding the data processing pathways within a deep neural network is important for transparency and better resource utilisation. In this paper, a method based on information-theoretic measures is used to reveal the typical learning patterns of convolutional neural networks (CNNs), which are commonly used for image processing tasks. For this purpose, the training samples, the true labels and the estimated labels are treated as random variables, and the mutual information and conditional entropy between these variables are studied. The paper shows that adding convolutional layers to a network improves its learning, but beyond a certain point additional convolutional layers yield no further improvement. The number of convolutional layers that need to be added to reach a desired learning level can be determined with the help of information-theoretic quantities, including entropy, inequality and mutual information, among the inputs to the network. The kernel size of the convolutional layers affects only the learning speed of the network. The study also shows that the placement of the dropout layer has no significant effect on learning at lower dropout rates, while at higher dropout rates it is best placed immediately after the last convolutional layer.
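As a rough illustration of the label-based quantities described above (a minimal sketch, not the authors' code; the function and variable names are our own), the entropy H(Y), conditional entropy H(Y | Ŷ) and mutual information I(Y; Ŷ) between true and predicted labels can be estimated from their empirical joint distribution:

```python
# Sketch: estimating information quantities between true labels Y and
# predicted labels Y_hat, treated as discrete random variables.
import numpy as np

def entropy(p):
    """Shannon entropy (in bits) of a probability vector."""
    p = p[p > 0]
    return -np.sum(p * np.log2(p))

def label_information(y_true, y_pred, n_classes):
    """Return H(Y), H(Y | Y_hat) and I(Y; Y_hat) from label samples."""
    # Empirical joint distribution over (true, predicted) label pairs.
    joint = np.zeros((n_classes, n_classes))
    for t, p in zip(y_true, y_pred):
        joint[t, p] += 1
    joint /= joint.sum()

    p_true = joint.sum(axis=1)   # marginal P(Y)
    p_pred = joint.sum(axis=0)   # marginal P(Y_hat)

    h_true = entropy(p_true)
    h_pred = entropy(p_pred)
    h_joint = entropy(joint.ravel())

    h_cond = h_joint - h_pred          # H(Y | Y_hat)
    mi = h_true + h_pred - h_joint     # I(Y; Y_hat)
    return h_true, h_cond, mi
```

Tracked over training epochs for networks with different depths, kernel sizes and dropout placements, quantities of this kind give the learning curves that the paper analyses.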

Highlights

  • The fields of Artificial Intelligence (AI) and Machine Learning (ML) have been developing rapidly over recent years

  • The results further demonstrate that information theoretic tools can provide insight into the training procedure of the Convolutional Neural Networks (CNNs)

  • The MNIST and Fashion-MNIST datasets were used in training different model setups



Introduction

The fields of Artificial Intelligence (AI) and Machine Learning (ML) have been developing rapidly over recent years. As a result of this success, deep learning models have been used in various application areas such as criminal justice, medicine and finance [6]. Deep learning models usually contain millions of parameters and functions; humans cannot comprehend this representation or the relations among the parameters, and cannot physically interpret the results of such models. The quantification aspect of information theory, which is utilised in this paper, concerns measuring the information associated with probability distributions, and was initially proposed and developed by Claude Shannon for communication system design [31,32,33]. Entropy is an average quantity that describes how much information an event or random variable contains.
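For reference, the standard Shannon definitions (not reproduced from the paper) of entropy, conditional entropy and mutual information for discrete random variables X and Y are:

```latex
H(X) = -\sum_{x} p(x)\,\log_2 p(x)
\qquad
H(X \mid Y) = -\sum_{x,y} p(x,y)\,\log_2 p(x \mid y)
\qquad
I(X;Y) = H(X) - H(X \mid Y)
```

The entropy H(X) is the average information content of X, the conditional entropy H(X | Y) is the uncertainty remaining about X once Y is known, and the mutual information I(X;Y) is the reduction in uncertainty about X obtained by observing Y.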
