Abstract
In any convolutional neural network (CNN) there are hyperparameters: parameters that are not learned during training but are set when the CNN model is built. Their choice affects the quality of the neural network, yet to date there are no universal rules for setting them. Hyperparameters can be tuned fairly accurately by hand, but there are also automatic methods for optimizing them. Automatic methods reduce the effort of tuning a neural network and do not require prior experience with hyperparameter optimization. The purpose of this article is to analyze automatic methods for selecting hyperparameters in order to reduce the effort of tuning a CNN.

Several automatic methods for selecting hyperparameters are considered: grid search, random search, and model-based optimization (Bayesian and evolutionary). The most promising are the model-based methods. They are applied when no closed-form expression for the objective function is available, but its values can be observed (possibly with noise) for chosen hyperparameter settings. Bayesian optimization seeks a trade-off between exploration (proposing hyperparameters with high uncertainty that may yield a noticeable improvement) and exploitation (proposing hyperparameters that are likely to perform as well as those already observed, usually values very close to them). Evolutionary optimization is based on the principle of genetic algorithms: a combination of hyperparameter values is treated as an individual of a population, and recognition accuracy on a test sample serves as the fitness function; crossover, mutation, and selection then drive the search toward optimal values of the neural network hyperparameters. The authors propose a hybrid method whose algorithm combines Bayesian and evolutionary optimization: the neural network is first tuned with the Bayesian method, and the N best hyperparameter sets found then form the first generation of the evolutionary method, which continues the tuning.

An experimental study of CNN hyperparameter optimization by the Bayesian, evolutionary, and hybrid methods was carried out. During Bayesian optimization, 112 different CNN architectures were considered, with root-mean-square error (RMSE) on the validation set ranging from 1629 to 11503; the CNN with the smallest error was selected, and its RMSE on the test data was 55. At the start of evolutionary optimization, 8 different CNN architectures were generated at random, with validation RMSE from 2587 to 3684. Over 14 generations, CNNs with new sets of hyperparameters were obtained whose validation error decreased to values from 1424 to 1812; the CNN with the smallest error was selected, and its RMSE on the test data was 48. Optimization with the hybrid method produced the optimal CNN architecture (the one with the smallest RMSE on the validation data), whose RMSE on the test data was 49.
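To make the hybrid scheme concrete, the following minimal Python sketch illustrates its general flow; it is not the authors' implementation. It assumes a hypothetical train_and_validate() function that builds a CNN from a set of hyperparameters and returns its validation RMSE, and an illustrative three-parameter search space. The Bayesian phase uses Optuna's default TPE sampler (a sequential model-based method); the evolutionary phase is a hand-rolled genetic algorithm seeded with the N best Bayesian trials. The population size of 8 and the 14 generations mirror the experiment above; the other values are placeholders.

    import random
    import optuna  # pip install optuna

    # Illustrative search space (not the ranges from the paper).
    SPACE = {
        "lr":      (1e-4, 1e-1),  # learning rate (log scale)
        "filters": (8, 128),      # filters in the first convolutional layer
        "kernel":  (2, 7),        # convolution kernel size
    }

    def train_and_validate(params):
        """Hypothetical stand-in: train a CNN with `params`, return validation RMSE."""
        raise NotImplementedError

    # Phase 1: Bayesian (model-based) optimization.
    def objective(trial):
        params = {
            "lr":      trial.suggest_float("lr", *SPACE["lr"], log=True),
            "filters": trial.suggest_int("filters", *SPACE["filters"]),
            "kernel":  trial.suggest_int("kernel", *SPACE["kernel"]),
        }
        return train_and_validate(params)

    study = optuna.create_study(direction="minimize")
    study.optimize(objective, n_trials=50)

    # Seed the evolutionary phase with the N best hyperparameter sets found so far.
    N = 8
    finished = [t for t in study.trials if t.value is not None]
    population = [dict(t.params) for t in sorted(finished, key=lambda t: t.value)[:N]]

    # Phase 2: evolutionary refinement (a minimal genetic algorithm).
    def crossover(a, b):
        # Uniform crossover: each gene is taken from one of the two parents.
        return {k: random.choice((a[k], b[k])) for k in a}

    def mutate(individual, rate=0.2):
        # With probability `rate`, resample a gene uniformly from its range.
        out = dict(individual)
        for key, (lo, hi) in SPACE.items():
            if random.random() < rate:
                out[key] = random.uniform(lo, hi) if isinstance(lo, float) else random.randint(lo, hi)
        return out

    for generation in range(14):
        ranked = sorted(population, key=train_and_validate)  # lower RMSE is better
        parents = ranked[: N // 2]                           # selection: keep the best half
        children = [mutate(crossover(*random.sample(parents, 2)))
                    for _ in range(N - len(parents))]
        population = parents + children

    best = min(population, key=train_and_validate)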
The results show that all three methods achieve approximately the same optimization quality. The Bayesian approach considers the entire hyperparameter space; obtaining higher accuracy with it requires allotting more optimization time. The evolutionary algorithm selects the best combinations of hyperparameters from the initial population, so the randomly generated starting population strongly affects the outcome, and, owing to the nature of the algorithm, the method is prone to converging to a local extremum. However, the algorithm parallelizes well, so optimization with this method can be accelerated (see the sketch below). The hybrid method combines the advantages of both and finds an architecture no worse than either the Bayesian or the evolutionary method alone. The experiments show that, on problems similar to the one considered, these optimization methods reach approximately the same quality of neural network tuning for a relatively small CNN. The presented results make it possible to choose among the considered hyperparameter optimization methods when developing a CNN, based on the specifics of the problem being solved and the available resources.
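The parallelization remark can be illustrated with a short sketch: the individuals of one generation are independent, so their fitness evaluations (full train-and-validate runs) can proceed concurrently. The sketch reuses the hypothetical train_and_validate() from the previous example; in practice each worker would also need its own training budget (CPU cores or a GPU).

    from concurrent.futures import ProcessPoolExecutor

    def evaluate_generation(population):
        """Evaluate every individual of a generation in parallel processes."""
        with ProcessPoolExecutor() as pool:
            errors = list(pool.map(train_and_validate, population))
        # Pair each individual with its validation RMSE, best (lowest) first.
        return sorted(zip(errors, population), key=lambda pair: pair[0])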