Abstract

An efficient convolution kernel transforms images into more expressive representations, which largely determines the quality of an image classification model. Algorithms from hyper-parameter optimization and neural architecture search (NAS) can be applied to mine optimal kernels, but hyper-parameter optimization algorithms typically require multiple training sessions to evaluate candidate configurations, which is prohibitively time-consuming. Recent gradient-based architecture search algorithms address this problem within a single training session; however, no existing gradient-based NAS algorithm efficiently optimizes convolution kernels. This paper proposes a novel gradient-based algorithm, SoftStep, which precisely and efficiently fine-tunes kernel hyper-parameters. SoftStep is applied to the key hyper-parameters of convolution kernels and generalized to further hyper-parameters in deep learning. In experiments, architecture search tasks on multiple challenging datasets are used to compare SoftStep with various state-of-the-art hyper-parameter optimization and architecture search algorithms. The reported results show that SoftStep mines optimal kernels far more efficiently than the other state-of-the-art methods.
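
To make the idea of gradient-based kernel hyper-parameter tuning concrete, the following is a minimal, hypothetical sketch of one way a discrete kernel hyper-parameter (the effective kernel size) can be relaxed into a trainable, differentiable quantity via a smooth step mask. It is an illustration under assumed design choices (the `SoftKernelConv2d` class, the learnable `radius`, and the temperature `tau` are all invented for this sketch), not the paper's actual SoftStep implementation.

```python
# Hypothetical sketch: a differentiable relaxation of kernel size.
# NOTE: this is NOT the paper's SoftStep algorithm; it only illustrates how a
# kernel hyper-parameter (the effective radius) can receive gradients from the
# task loss by masking kernel weights with a smooth step function.
import torch
import torch.nn as nn
import torch.nn.functional as F


class SoftKernelConv2d(nn.Module):
    def __init__(self, in_ch, out_ch, max_kernel_size=7, tau=0.25):
        super().__init__()
        assert max_kernel_size % 2 == 1, "use an odd maximum kernel size"
        self.weight = nn.Parameter(
            torch.randn(out_ch, in_ch, max_kernel_size, max_kernel_size) * 0.01
        )
        self.bias = nn.Parameter(torch.zeros(out_ch))
        # Trainable effective radius: a continuous surrogate for kernel size.
        self.radius = nn.Parameter(torch.tensor(float(max_kernel_size // 2)))
        self.tau = tau  # temperature of the smooth step

        # Chebyshev distance of every kernel tap from the kernel centre.
        k = max_kernel_size
        coords = torch.arange(k) - k // 2
        dist = torch.max(coords.abs().view(-1, 1), coords.abs().view(1, -1))
        self.register_buffer("dist", dist.float())

    def forward(self, x):
        # Smooth step: taps within the learned radius get weight ~1, outside ~0.
        mask = torch.sigmoid((self.radius - self.dist) / self.tau)
        w = self.weight * mask  # broadcast over (out_ch, in_ch, k, k)
        pad = self.weight.shape[-1] // 2
        return F.conv2d(x, w, self.bias, padding=pad)


# Usage: the radius is updated by ordinary back-propagation, so the effective
# kernel size is tuned within a single training session.
layer = SoftKernelConv2d(3, 16)
out = layer(torch.randn(2, 3, 32, 32))
loss = out.pow(2).mean()
loss.backward()
print(layer.radius.grad)  # non-zero gradient on the kernel-size surrogate
```

In this sketch the discrete choice "kernel size" never has to be re-trained from scratch for each candidate value; the mask makes the choice continuous, which is the general property that single-session, gradient-based search methods rely on.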
