Abstract
The performance of a convolutional neural network (CNN) depends heavily on its hyperparameters. However, finding a suitable hyperparameter configuration is challenging and computationally expensive because of three issues: 1) the mixed-variable problem arising from different types of hyperparameters; 2) the large-scale search space; and 3) the expensive computational cost of evaluating each candidate hyperparameter configuration. This article therefore focuses on these three issues and proposes a novel estimation of distribution algorithm (EDA) for efficient hyperparameter optimization, with three major contributions in the algorithm design. First, a hybrid-model EDA is proposed to deal with the mixed-variable difficulty. The proposed algorithm uses a mixed-variable encoding scheme to encode the hyperparameters and adopts an adaptive hybrid-model learning (AHL) strategy to optimize the mixed variables efficiently. Second, an orthogonal initialization (OI) strategy is proposed to deal with the large-scale search space. Third, a surrogate-assisted multi-level evaluation (SME) method is proposed to reduce the expensive computational cost. Based on the above, the proposed algorithm is named surrogate-assisted hybrid-model EDA (SHEDA). In the experimental studies, SHEDA is verified on widely used classification benchmark problems and compared with various state-of-the-art methods. Moreover, a case study on aortic dissection (AD) diagnosis is carried out to evaluate its performance. Experimental results show that SHEDA is effective and efficient for hyperparameter optimization: it finds a satisfactory hyperparameter configuration for CIFAR10, CIFAR100, and AD diagnosis with only 0.58, 0.97, and 1.18 GPU days, respectively.
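To make the idea of an EDA over mixed variables more concrete, the following is a minimal sketch and not the SHEDA algorithm itself. It assumes a hypothetical three-variable search space (a continuous log learning rate, an integer filter count, and a categorical activation), models the continuous variable with a Gaussian and the discrete variables with smoothed histograms, and replaces CNN training with a toy scoring function. The AHL, OI, and SME strategies are not reproduced here; all names and constants are illustrative assumptions.

```python
import math
import random
import statistics

# Hypothetical search space (an assumption, not taken from the paper).
FILTERS = [16, 32, 64, 128]
ACTIVATIONS = ["relu", "tanh", "elu"]


def toy_score(cfg):
    """Stand-in for validation accuracy of a trained CNN; higher is better."""
    log_lr, filters, act = cfg
    score = -(log_lr + 3.0) ** 2           # peaks at learning rate 1e-3
    score -= abs(math.log2(filters) - 6)   # peaks at 64 filters
    score += 0.5 if act == "relu" else 0.0
    return score


def fit_histogram(values, choices, smoothing=0.1):
    """Estimate a smoothed categorical distribution from selected samples."""
    counts = [smoothing + sum(v == c for v in values) for c in choices]
    total = sum(counts)
    return [c / total for c in counts]


def run_eda(generations=20, pop_size=30, elite=10, seed=0):
    rng = random.Random(seed)
    # Continuous model: Gaussian over log10(learning rate), clamped to [-5, -1].
    mu, sigma = -2.0, 1.0
    # Discrete models: start from uniform histograms.
    p_filters = [1.0 / len(FILTERS)] * len(FILTERS)
    p_act = [1.0 / len(ACTIVATIONS)] * len(ACTIVATIONS)
    best = None
    for _ in range(generations):
        # 1) Sample a population from the current hybrid model.
        population = []
        for _ in range(pop_size):
            log_lr = min(-1.0, max(-5.0, rng.gauss(mu, sigma)))
            filters = rng.choices(FILTERS, weights=p_filters, k=1)[0]
            act = rng.choices(ACTIVATIONS, weights=p_act, k=1)[0]
            population.append((log_lr, filters, act))
        # 2) Select the best individuals.
        population.sort(key=toy_score, reverse=True)
        top = population[:elite]
        if best is None or toy_score(top[0]) > toy_score(best):
            best = top[0]
        # 3) Re-estimate each model from the selected individuals.
        log_lrs = [c[0] for c in top]
        mu = statistics.mean(log_lrs)
        sigma = max(statistics.stdev(log_lrs), 1e-3)
        p_filters = fit_histogram([c[1] for c in top], FILTERS)
        p_act = fit_histogram([c[2] for c in top], ACTIVATIONS)
    return best


if __name__ == "__main__":
    print(run_eda())
```

In a real setting, `toy_score` would be replaced by training and validating a CNN, which is exactly the expensive step that a surrogate model is meant to reduce.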
Highlights
Convolutional neural network (CNN), as one of the most efficient deep learning models [1], plays a vastly important role in various artificial intelligence applications like Go playing [2]
As the resolution of computerized tomography (CT) images and various aortic dissection (AD) shapes can bring in great classification difficulties, this classification problem is suitable for testing the CNNs obtained by surrogate-assisted hybrid-model EDA (SHEDA)
The total data are randomly split into the training dataset with 3486 images and the test dataset with 387 images, which are about 90% and 10% of the total data size, respectively
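The split sizes above imply 3486 + 387 = 3873 images in total. A minimal sketch of such a random split is shown below; the seed, the index-shuffling procedure, and the function name `split_indices` are assumptions for illustration, not the procedure reported by the authors.

```python
import random

N_TRAIN = 3486
N_TEST = 387
N_TOTAL = N_TRAIN + N_TEST  # 3873 images, derived from the reported sizes


def split_indices(n_total, n_test, seed=0):
    """Shuffle image indices and hold out n_test of them for testing."""
    rng = random.Random(seed)  # the seed is an arbitrary assumption
    indices = list(range(n_total))
    rng.shuffle(indices)
    return indices[n_test:], indices[:n_test]


train_idx, test_idx = split_indices(N_TOTAL, N_TEST)
train_share = len(train_idx) / N_TOTAL
test_share = len(test_idx) / N_TOTAL
print(len(train_idx), len(test_idx), round(train_share, 3), round(test_share, 3))
```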
Summary
The convolutional neural network (CNN), one of the most efficient deep learning models [1], plays an important role in various artificial intelligence applications such as Go playing [2]. Recent studies have started to consider more intelligent, automatic, and efficient ways of obtaining better CNN models, which has given rise to research on CNN optimization [11], i.e., treating the search for the best CNN hyperparameters as an optimization problem and designing powerful algorithms to solve it. In this direction, many algorithms have been proposed and have obtained promising results [12], such as those using reinforcement learning [11] and Bayesian optimization [12], among others. Nevertheless, solving the CNN hyperparameter optimization problem remains challenging because of three difficulties: the mixed-variable problem, the large-scale search space, and the expensive computational cost.