Abstract
An efficient convolution kernel transforms images into more expressive representations, which largely determines the quality of image classification models. Algorithms from hyper-parameter optimization and neural architecture search (NAS) can be applied to mine optimal kernels, but hyper-parameter optimization algorithms typically require multiple training sessions to evaluate each configuration, resulting in high time consumption. Recent gradient-based architecture search algorithms address this problem within a single training session; however, no existing gradient-based NAS algorithm can efficiently optimize convolution kernels. This paper proposes a novel gradient-based algorithm, SoftStep, which precisely and efficiently fine-tunes the hyper-parameters of kernels. SoftStep is applied to optimize the key hyper-parameters of convolution kernels and is generalized to further hyper-parameters in deep learning. In experiments, architecture search tasks on multiple challenging datasets are used to compare SoftStep against various state-of-the-art hyper-parameter optimization and architecture search algorithms. The results show that SoftStep mines optimal kernels far more efficiently than other state-of-the-art methods.