Evolving Optimal Convolutional Neural Networks

Subhashis Banerjee, Sushmita Mitra

doi:10.1109/ssci47803.2020.9308201

Abstract

Among the different Deep Learning (DL) models, deep Convolutional Neural Networks (CNNs) have demonstrated impressive performance in a variety of image recognition and classification tasks. Although CNNs do not require feature engineering or manual extraction of features at the input level, designing a suitable CNN architecture requires considerable expert knowledge and an enormous amount of trial and error. In this paper we attempt to automatically design a competitive CNN architecture for a given problem, while consuming reasonable machine resources, using a modified version of Cartesian Genetic Programming (CGP). Because standard CGP uses only the mutation operator to generate offspring, it typically evolves slowly. We develop a new algorithm that introduces crossover into standard CGP to generate an optimal CNN architecture; for this purpose, the genotype encoding is changed from an integer to a floating-point representation. The function genes in the nodes of the CGP are chosen to be highly functional CNN modules. CNNs typically use convolution and pooling, followed by activation. Rather than using each of these separately as the function gene of a node, we combine them in a novel way to construct highly functional modules. Five types of functions are considered: ConvBlock, average pooling, max pooling, summation, and concatenation. We test our method on the image classification dataset CIFAR10, since it is used as a benchmark for many similar problems. Experiments demonstrate that the proposed scheme converges quickly and automatically finds a CNN architecture that is competitive with state-of-the-art solutions, which require thousands of generations or GPUs and impose a huge computational burden.
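To make the idea of a floating-point CGP genotype concrete, the sketch below shows one way float genes in [0, 1) could be decoded into a graph whose nodes use the five function types named above, and how two such genotypes could be combined by crossover. It is a minimal illustration under stated assumptions, not the authors' implementation: the single-row layout, the `levels_back` limit, two connection genes per node, the "scale and floor" decoding rule, the per-gene arithmetic (blend) crossover, and the point mutation are all assumptions, and every name (`CGPConfig`, `pick`, `decode`, `crossover`, `mutate`) is hypothetical. The sketch only builds the graph description and does not construct actual CNN layers.

```python
import random
from dataclasses import dataclass

# Function set named in the abstract; what each module computes is not modelled here.
FUNCTIONS = ["ConvBlock", "AvgPool", "MaxPool", "Sum", "Concat"]


@dataclass
class CGPConfig:
    n_inputs: int = 1      # hypothetical: node 0 is the input image
    n_nodes: int = 10      # hypothetical: function nodes laid out in one row
    levels_back: int = 5   # hypothetical: how many earlier nodes a node may read from
    arity: int = 2         # connection genes per node (assumed)


def genome_length(cfg: CGPConfig) -> int:
    # One function gene plus `arity` connection genes per node, and one output gene.
    return cfg.n_nodes * (1 + cfg.arity) + 1


def pick(gene: float, n: int) -> int:
    # Assumed decoding rule: map a float gene in [0, 1] to an integer index in [0, n - 1].
    return min(int(gene * n), n - 1)


def decode(genome, cfg: CGPConfig):
    """Return a list of (node_id, function, input_ids) and the output node id."""
    step = 1 + cfg.arity
    nodes = []
    for i in range(cfg.n_nodes):
        node_id = cfg.n_inputs + i
        genes = genome[i * step:(i + 1) * step]
        # Admissible sources: the network inputs plus up to `levels_back` preceding nodes.
        sources = list(range(cfg.n_inputs)) + list(
            range(max(cfg.n_inputs, node_id - cfg.levels_back), node_id))
        func = FUNCTIONS[pick(genes[0], len(FUNCTIONS))]
        inputs = [sources[pick(g, len(sources))] for g in genes[1:]]
        nodes.append((node_id, func, inputs))
    output = cfg.n_inputs + pick(genome[-1], cfg.n_nodes)
    return nodes, output


def crossover(p1, p2, rng: random.Random):
    # Assumed operator: per-gene arithmetic (blend) crossover. Each child gene lies
    # between the two parent genes (up to rounding), so it still decodes validly.
    child = []
    for a, b in zip(p1, p2):
        w = rng.random()
        child.append(w * a + (1 - w) * b)
    return child


def mutate(genome, rate: float, rng: random.Random):
    # Point mutation: each gene is redrawn uniformly from [0, 1) with probability `rate`.
    return [rng.random() if rng.random() < rate else g for g in genome]


if __name__ == "__main__":
    rng = random.Random(0)
    cfg = CGPConfig()
    p1 = [rng.random() for _ in range(genome_length(cfg))]
    p2 = [rng.random() for _ in range(genome_length(cfg))]
    child = mutate(crossover(p1, p2, rng), rate=0.1, rng=rng)
    nodes, out = decode(child, cfg)
    for node in nodes:
        print(node)
    print("output node:", out)
```

In this sketch, connection genes are decoded relative to the set of admissible source nodes, so any float in [0, 1) (including a blend of two parents' genes) maps to a legal connection. This is one way a floating-point encoding can make crossover straightforward; the paper's actual decoding and crossover rules may differ.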
